Google-Extended is a control you can use in your robots.txt file to tell Google not to use your content for training its future Gemini models. It also stops Google from using your content for grounding answers in Gemini. However, it’s important to know that a Google-Extended disallow in your robots.txt does not block Google from using your content in AI Overviews.

a website robot denying access to a robot with Gemini written on it, in Google colours (made by ChatGPT)

To use it, you can add something like this to your robots.txt file:

User-agent: Google-Extended
Disallow: /

This tells Google not to use any of your pages to train its Gemini models or to ground Gemini answers.

Or,

User-agent: Google-Extended
Disallow: /directory1

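This tells Google not to use a specific directory for training its models or for grounding. Robots.txt rules match by path prefix, so /directory1 also covers paths such as /directory1-archive. Add a trailing slash (Disallow: /directory1/) if you want to match only that directory.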
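If you want to sanity-check a rule like this before publishing it, here is a minimal sketch using Python's standard-library robots.txt parser. It is only an approximation: Google's own matching may differ in edge cases, and the example.com URLs and the `is_allowed` helper are hypothetical names used for illustration.

```python
from urllib.robotparser import RobotFileParser

# The directory-level example from above.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /directory1
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may use `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Google-Extended", "https://example.com/directory1/page.html"),
        ("Google-Extended", "https://example.com/directory1-archive/page.html"),
        ("Google-Extended", "https://example.com/blog/post"),
        ("Googlebot", "https://example.com/directory1/page.html"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The Googlebot check is the important one: the Google-Extended group should not change what Googlebot is allowed to crawl for Search.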

Will using Google-Extended in robots.txt stop my content from being used in AI Overviews?

Surprisingly, the answer is no. AI Overviews are considered part of the main Google Search experience, so Google-Extended will not stop Google from using your site in AI Overviews.

You can block your site, or parts of it, from being used in AI Overviews with the “nosnippet” robots meta tag. However, it is important to know that this also stops Google from showing a text snippet for that content in regular Search results.

Will using Google-Extended hurt my presence in Search?

Google says, “Google-Extended does not impact a site's inclusion in Google Search nor is it used as a ranking signal in Google Search.” It simply blocks your site from being used for future training of Gemini models and for grounding in the Gemini app and in apps built with Vertex AI that use grounding.

What is grounding?

When a user prompts the Gemini app, sometimes the model pulls pages from Search and reads passages to either fact-check or enrich its reply. Then, those pages are shown as references within the app.

If you use the Google-Extended robots.txt block, your pages will not appear as recommended links within the Gemini app. For example, if Search Engine Land (#3 below) had used the Google-Extended token, Gemini would not have recommended it as a source here:

Google extended blocks you from being used as a reference for grounding Gemini

Grounding with Search can also be used by developers who are building AI products with Google’s Vertex AI. Using Google-Extended will stop you from surfacing in apps built with Vertex AI that use Grounding with Search.

Does Google-Extended remove existing content from Gemini’s training?

No. If Gemini has already trained on your content, then it’s already baked into its parameters. The block will exclude your pages from new training runs of Gemini, so you can stop new use of your content. But it can’t un-teach knowledge already embedded into current Gemini versions.

Does Google-Extended stop your content from being recommended by the new AI Mode?

No. AI Mode is a Search Labs experiment powered by Gemini. It is a Search product, so it follows the same Search preview controls as AI Overviews - not Google-Extended. If you really want to keep your text out of AI Mode (and out of any Search snippet), add <meta name="robots" content="nosnippet"> to the page or set max-snippet:0 in the same robots meta tag. Google-Extended only governs Gemini’s training and grounding. It doesn’t affect how Search surfaces your pages, so it should not affect your inclusion in AI Mode answers.
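To check whether a page already carries one of these snippet controls, a rough sketch along these lines may help. It only looks at a robots meta tag in the HTML you pass in, and the `RobotsMetaParser` class and `snippets_blocked` function are hypothetical names for illustration, not part of any Google tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the rules found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.rules = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            # Normalize "max-snippet: 0" and "NoSnippet" style variations.
            rule = part.strip().lower().replace(" ", "")
            if rule:
                self.rules.add(rule)


def snippets_blocked(html: str) -> bool:
    """True if the page opts out of snippets via nosnippet or max-snippet:0."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "nosnippet" in parser.rules or "max-snippet:0" in parser.rules


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="nosnippet"></head></html>'
    print(snippets_blocked(page))
```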

What sites should use Google-Extended?

Use the Google-Extended token when you don’t want your publicly crawlable pages fed into Gemini, either for future training or for the grounding step that powers Gemini chat citations, but you still want Googlebot to index and serve your content in Search. Again, AI Overviews are a part of Search, so Google-Extended will not stop Google from using snippets of your content in an AI Overview.
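As a quick summary of which control governs which Google surface, as described in this article, here is a small lookup sketch. The surface labels and the `surfaces_affected` helper are made up for illustration; they are not official Google names.

```python
# Which control this article says governs each surface.
# Keys are illustrative labels, not official product identifiers.
GOVERNED_BY = {
    "gemini_training": "Google-Extended",
    "gemini_app_grounding": "Google-Extended",
    "vertex_ai_grounding": "Google-Extended",
    "ai_overviews": "nosnippet",
    "ai_mode": "nosnippet",
    "search_snippets": "nosnippet",
}


def surfaces_affected(control: str) -> list[str]:
    """Return the surfaces that the given control opts you out of."""
    return sorted(s for s, c in GOVERNED_BY.items() if c == control)


if __name__ == "__main__":
    for control in ("Google-Extended", "nosnippet"):
        print(control, "->", surfaces_affected(control))
```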

There are some situations where I think Google-Extended makes sense. For example, if you have licensed, paid or premium content, Search can show a short snippet while you paywall the rest. If you don’t want this content used for training AI or for grounding, you could block it using Google-Extended.

Another reason to use Google-Extended might be if your IP lies in your words. For example, if you sell essays, fiction, or paywalled research, you may not want AI to train on this. 

Who is currently using Google-Extended?

Many large news publishers use it. According to Reuters, by the end of 2023, 24% of the most widely used news websites were blocking Google’s AI crawler. Including:

I randomly checked the robots.txt files of several large, well-known sites.
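If you want to run a similar spot check yourself, something like the following sketch could work. It assumes each site serves its robots.txt over HTTPS at the usual /robots.txt path, and the function names are hypothetical. It reports whether a file names Google-Extended at all and, if so, whether the site root is disallowed for it.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def fetch_robots_txt(site: str, timeout: float = 10.0) -> str:
    """Download https://<site>/robots.txt and return it as text."""
    with urlopen(f"https://{site}/robots.txt", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def names_google_extended(robots_txt: str) -> bool:
    """True if any User-agent line names Google-Extended explicitly."""
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            if value.strip().lower() == "google-extended":
                return True
    return False


def google_extended_status(robots_txt: str) -> str:
    """Classify a robots.txt file by how it treats Google-Extended."""
    if not names_google_extended(robots_txt):
        return "not mentioned"
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if parser.can_fetch("Google-Extended", "/"):
        return "mentioned, root allowed"
    return "blocks entire site"


def survey(sites, fetch=fetch_robots_txt) -> dict:
    """Check each site; `fetch` can be swapped out for offline testing."""
    results = {}
    for site in sites:
        try:
            results[site] = google_extended_status(fetch(site))
        except OSError as exc:  # covers URLError, HTTPError and timeouts
            results[site] = f"fetch failed: {exc}"
    return results
```

Calling `survey(["example.com"])` would fetch that site's live robots.txt. Passing your own `fetch` function lets you try the logic on saved files first.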

These sites do use Google-Extended in their robots.txt, blocking Google from using their content for training AI or grounding Gemini answers:

None of these have included Google-Extended in their robots.txt:

Thoughts from Marie

I know a lot of site owners whose first instinct is to say, “I don’t want Google to use my content to train their AI! I will block them!” I definitely understand this reaction. However, if your site’s livelihood depends on being found, blocking Google-Extended may do more harm than good. 

More and more people will be using Gemini as a chat interface to get answers. Google-Extended will block your site from being used as a source in Deep Research. Google Assistant in phones, wearables, televisions and autos will be upgrading to use Gemini. I personally believe that for many people, Google Assistant will become the primary way they search as AI becomes our daily assistant. If you block Google-Extended, none of these products will be able to quote you.

I think it’s a tough call for some sites whether or not to block Google’s ability to train on their content. In some cases, if your value is in your unique words and insight, giving that to Gemini could be detrimental.

However, take a business like mine. I wrote this article you are reading now because I had clients asking me questions I found difficult to answer. I have written thoughts on how Agents will change the web, how AI mode works, and the important changes in Google’s Quality Rater Guidelines. For me, I am thrilled when I see my work quoted by Gemini or ChatGPT. It gives me more exposure, emphasizes my expertise to the world, and potentially brings me new newsletter subscribers and clients.

marie referenced by Gemini

In short, I monetize the trust my writing builds, not the writing itself - so I’d rather keep my content open for AI to use.

Where it gets tricky is when we see Google’s AI products quoting content that originated on your site. It’s important to remember that using Google-Extended does not keep them from using your content in AI Overviews or AI Mode answers. They are like very long featured snippets - a part of Search, not blocked by using Google-Extended.

I really think that for most sites, it makes more sense to allow Google’s AI to train on and use your information for grounding than to block with Google-Extended. 

Still not sure whether you should use a Google-Extended block in robots.txt? Try this GPT

Should I use the Google-Extended Robots token to block Google from using my content for training AI and grounding? - A GPT created by Marie Haynes

I did a bunch of research for this article across a number of sites. I have included this research in the knowledge base of a GPT. If you are not sure whether you should use a Google-Extended robots.txt block, this really should help!

Google-Extended GPT

Here’s a sample conversation:

Google extended chart

what google extended does

Google-Extended-decision

I hope this helps!


Marie