Full transcription of John Mueller’s Office Hours Hangout on June 4, 2021.
Q 0:56 – So I wanted to ask, is it really important for Googlebot to be able to see a cookie consent message that we serve to users? Because we kind of decided to serve it upon user interaction, so Googlebot won’t be able to see it. I wonder if that can cause us any problems in terms of Google-friendliness?
A 1:19 – “In general, that should be fine, because Googlebot doesn’t really need to see a cookie banner. Googlebot also doesn’t keep cookies, so it wouldn’t accept cookies anyway, if you gave it to Googlebot. So if you have it set up like that, I think that’s fine. Lots of sites serve the cookie consent banner to Googlebot as well, just because they serve it to everyone. That’s usually also fine. The important part is, essentially, that Googlebot is not blocked from being crawled, or blocked from crawling the website so that you don’t have a kind of an interstitial that blocks the access to the rest of the content.”
Q 2:17 – So I have a directive to use keywords, specifically target keywords, in meta tags, in the H1, use it this many times in a piece of content. And that really just seems outdated to me, especially with all the advances in semantic search and all the cool MUM, and all that other stuff that’s coming down the pipe. I just wanted– basic question– do you think that that’s still a legitimate SEO tactic, or should we not be focused on using this particular keyword this many times on a page?
A 3:09 – “In general, the number of times that you use a keyword on a page, I don’t think that really matters or makes sense. When you’re writing naturally, usually that resolves itself automatically. And also, with regards to the individual keywords, I think that’s something where I wouldn’t disregard it completely, but at the same time, I wouldn’t over-focus on exact keywords. So in particular, things like singular and plural, or kind of like the different ways of writing individual words. That’s something that you probably don’t need to worry about. But mentioning what your site is about and what you want to be found for, that’s something I would still do. So in particular, what we sometimes see when we look at things like news articles, if a news site doesn’t really understand SEO, they might write in a way that is more, I don’t know, almost like literature in that, you read it, and you kind of understand what it means. But the exact words that are used on a page don’t really map to exactly that topic. So that’s something where, from an SEO point of view, if there’s something you want to rank for, I would still mention that on a page. I wouldn’t go overboard with the number of mentions. I wouldn’t go overboard with all of the synonyms and different ways of writing it, but mentioning it at least once definitely makes sense.”
Q 4:47 – I have a question regarding the URL removal tool. So my question is if you use that tool, does it only affect the canonical version of the URL, since I guess this is affecting the index you’re using for publishing your search results. So does it only affect the canonical version, or does it affect the entire duplicate content cluster, which this canonical is a part of? So for instance, if I write in a URL, which, in my opinion, is the one to be excluded, but is basically a noncanonical variant, what happens in that situation?
A 5:44 – “Good question. So we don’t consider the canonical at all when it comes to URL removals, but rather we match it one-to-one exactly as you submitted it. And we include the HTTP, HTTPS, and WWW, non-WWW versions of that URL. And essentially, what happens is we don’t remove it from our index. So the whole indexing side stays the same. The crawling side stays the same. We just don’t show it in the search results. So essentially, you would submit the URL that you see in the search results, and then we would just hide that from the search results.”
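The matching John describes can be sketched as follows: the submitted URL is matched one-to-one, with the path and query exactly as given, but expanded across the http/https and www/non-www host variants. A hypothetical sketch (the `removal_variants` helper is made up for illustration, not an actual Google API):

```python
from urllib.parse import urlsplit, urlunsplit

def removal_variants(submitted_url):
    """Expand a removal request into the http/https and www/non-www
    variants that would all be hidden, per the matching described above.
    The path and query still match one-to-one, exactly as submitted."""
    scheme, netloc, path, query, fragment = urlsplit(submitted_url)
    host = netloc[4:] if netloc.startswith("www.") else netloc
    variants = set()
    for s in ("http", "https"):
        for h in (host, "www." + host):
            variants.add(urlunsplit((s, h, path, query, fragment)))
    return variants
```

So submitting `https://www.example.com/page` would hide four URL variants from the results, while indexing and crawling stay unchanged.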
Q 6:33 – My question relates to templated content above the fold from a mobile-first perspective. And this relates to a blog restructuring we’re working on, where we’re moving from all of our blog articles living in a blog subdirectory to building out topic clusters with specific topic landing pages. And based on initial designs of the topic landing pages, they’re all going to include the same hero banner with the same header and then the same two sentences, and then below that will be a sub-navigation to let the user navigate to a specific topic landing page. And my concern is, from that mobile viewport, each of these topic landing pages, they’re unique URLs, unique titles, but they have that same header and just above the fold content. But right below that is a unique subheader, which would be like the H1 tag of the page. So I just want to get your thoughts there. I know Google in the past has said that they weigh above the fold content, and I think it’s something important for us to look at. But is it detrimental to non-brand rankings for those topic landing pages?
A 7:58 – “So the important part for us is really that there is some amount of unique content in the above the fold area. So if you have a banner on top, and you have a generic hero image on top, that’s totally fine. But some of the above the fold content should be unique for that page. And that could be something as minimal as a heading that’s visible, but at least some of the above the fold content should be unique. So that’s kind of the guidance that we have in that regard.”
Reply 8:34 – OK, cool. Actually, maybe I’ll just leave the header the same and change a sentence or something based on the topic.
John 8:40 – “Yeah. I mean, it’s probably also something where you want to look at how users interact with those pages afterwards. But that’s kind of more from a non-SEO perspective. But it’s always, I think, important for cases like that that you take a look and see what actually happens with the users afterwards.”
Q 9:15 – I have a question regarding domain age, and whether the domain expiry matters. Suppose I have a domain that is a couple of years old, and it will expire in the next six months. So does Google check the age of the domain?
A 9:33 – “No. I don’t think we use that at all.”
Q 9:40 – I have one more question on the domain only. So do you check if the WHOIS is private or public?
A 9:48 – ”I don’t think so, at least not that I know of. Because it’s also something where some of the top level domains and the registrars, they just have everything private by default. So it’s like you wouldn’t be able to compare that anyway or use that for anything useful.”
Q 10:06 – So John, I am a little bit worried right now. Now that both of us are talking, we can clear things up. There is a lot of wrong information and data spread widely across the internet. It is copied, and the content length keeps on increasing. Suppose one SEO expert, in 300 words on on-page SEO, gives the top 10 points that you should do. The next SEO expert comes along; he writes 20 points, and the content length increases to 100,000 words. There are only five or six people who come up if you search on Google, and they are the main culprits spreading all the wrong information. Then the next one comes; he writes 2,000 words of content and lists 20 points. Another person comes; he writes 200 Google algo points in 4,500 words. And he claims, “I have been increasing my word count, and Google has moved my rank from number three to number one.” It is not directly my question, but what about the people using the skyscraper technique to build a bigger building than the next tower? The next person writes, again, useless content with no real information, but his building is bigger. Then another is even bigger. And if I want to see real information, the actual gurus are also spreading misinformation. But still, everyone is competing around these five or six gurus, and many people are taking their courses, and on and on. So suppose I want to write something true about SEO. We are discussing, and in 200, 300, or 500 words, I write the real answers.
So I think, maybe with new Google algorithms, this is my suggestion: should Google also try to read the meaning of the content? The real answer may be something like “domain expiry does not matter,” but someone, maybe an affiliate, explicitly says your domain should expire after three years, so keep renewing it so that the expiry is longer, claiming Google checks whether a genuine website has a longer expiry. And they will have an affiliate link below; they want to sell the psychology and the link, but this is the wrong information. This is my concern, just to share with Google. You can advocate on it.
A 12:59 – “Yeah. I think it’s always tricky. And there is also the aspect of sometimes there is just old and outdated information out there on specific topics. But yeah, I don’t know. People like to write a lot of things and to promote the things that they’ve been working on. So it’s hard for me to say.”
Q 13:29 – Why do skyscraper techniques still seem to work? The [INAUDIBLE] example: three months ago, I had a piece of content with 500 words. Suppose, on on-page SEO, [INAUDIBLE] take one example. On on-page SEO, I have 500 words, and I used to rank at number six. Then I increased it to 1,000 words and moved to number five. Then, as I slowly increase it, it naturally climbs the ladder up to number one.
A 13:56 – “I don’t think that there is really a correlation there. Sometimes these things happen that you rank higher when you make changes when you improve things. But just purely from increasing the number of words on a page, that’s absolutely not something that we take into account.”
Q 14:14 – Sometimes quality matters. So in the next algo, will page experience look at the number of DOM nodes? Suppose you write a very good, relevant article with 2,000 words, and I write again with 2,000 words; my DOM size will increase.
Q 15:24 – My question is related to Google Search Console, and it has more to do with the sampling of data. So what I’m stuck with and am confused with, so in the main property of Search Console, if I create a subproperty, what I see is that the subproperty’s numbers, the clicks, if I take from the main property, it is only 4,000 clicks. But when I create a subproperty, I see 40,000 clicks for that particular subfolder. So when I’m seeing and analyzing the numbers at the main property level, I might skip out that portion of that subfolder because I see only 4,000 clicks, and maybe these other site sections are doing great and all. But when I create a subproperty, it gives me a huge number, which is a 10 times difference. So then what I did eventually, I created many subproperties inside Google Search Console to get more numbers, which gives me a better picture that, OK, this subfolder is doing great. It’s actually not only 4,000 or 2,000. It’s actually 20,000 or 30,000. So what should I do? Should I continue pulling numbers from subproperties, or should I go to the main property and consider the numbers from there? Because if I add up all the subproperties, the total goes beyond the numbers which I see at the main property.
A 16:47 – “Yeah. I think it depends on the size of your site and the way that the traffic comes to your site, because there is a limit of the number of data points that we collect per site per day, and perhaps that’s what you’re seeing there. So from my point of view, if you’re seeing more useful numbers or more actionable numbers that way, I think that’s totally fine. I don’t know if that’s something that the average site needs to do. My guess is most sites don’t need to do it, but maybe it makes sense for your site.”
Reply 17:36 – Yeah, the only concerning part is that if I continue pulling the numbers from the subproperty, that’s fine, but when I look at the main property and the overall numbers, it gets a little confusing, because the subproperties are telling a different story, and the main property, the total overall website numbers, are saying a different thing.
John 17:48 – “Yeah. I mean, you’re looking at a different database there. So from that point of view, comparing those different subproperties, that probably doesn’t make sense. But if you need to look at more detailed information within one of those subsections of your website, then maybe that does make sense to look at it on that level.”
Reply 18:11 – OK. So you would recommend that it’s case to case?
John 18:15 – “Yeah. I mean, I think for most sites– like, we chose those numbers to make sure that, for most sites, the data was sufficient and useful and that the totals kind of line up. But there are definitely individual cases where it makes sense to dig more into detail and kind of verify subsections of a site.”
Q 18:39 – Would we see anything in the future for a premium version of Google Search Console, where we can pay and get more data?
A 18:50 – “I have no idea. I mean, it’s something where we tend not to talk about what is lined up for the future. I don’t know. It feels weird to have a premium version of Search Console for something like that, but who knows. I think seeing what people are doing with Search Console and seeing where they’re running into limitations, that’s always useful for us.”
Q 19:29 – So my question is about Core Web Vitals. My team and I have been preparing and making optimizations, whether theme-related or in CSS. What we have witnessed is that the data in Google Search Console is completely different from what we see when we open the website in incognito mode, go to Inspect, go to Lighthouse, and generate a report from there. There, I can see a performance score of 91, and everything is good: Largest Contentful Paint, Time to Interactive, everything is green. However, if I go to Google Search Console, I can see 100 poor URLs. So we are in a bit of a situation where we don’t know where to go from here. So if you could help with that.
A 20:15 – “OK. So I think the important part is these are different ways of looking at the data. And so we differentiate between the lab data, which is kind of like, if you test it directly, which is what you would see in Page Speed Insights or in Lighthouse when you look at it in incognito in your browser. And the other variation of the data is the field data, which is what users actually see when they go to your website. And that’s the data that’s shown in Search Console. And from there, the differences are essentially that the lab tests that you do, they make a lot of assumptions with regards to, probably users will have this kind of device, or this kind of connectivity, or have this kind of configuration. And they will try to kind of use those assumptions to figure out what it might be. But what they actually see in practice could be very different. And that’s probably what you’re seeing there.”
Q 21:22 – OK. So having said that, Google would consider the practical data over the lab data to run or to rank the pages?
A 21:29 – “Yes. Yeah. So in Search, we use the field data from real users. There’s one tricky thing with the field data in that it takes 28 days to update in Search Console for various reasons. What you can do is use something like Google Analytics together with an extra script on your site to track the field data yourself, as well. So then you would have the lab data, your own field data, and the search field data, and usually, you would see kind of the differences between the lab and the other field data.”
Reply 22:12 – Right. So our aim should be optimizing for the field data?
John 22:16 – “Yes.”
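If you track that field data yourself, the usual summary statistic is the 75th percentile across real page loads, which is how the Core Web Vitals assessment is framed (for example, LCP counts as good when the 75th percentile is at or under 2.5 seconds). A minimal sketch; the `p75` helper and the sample values are made up:

```python
import math

def p75(samples):
    """75th percentile of collected field measurements (nearest-rank)."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))  # 1-based nearest-rank
    return ordered[rank - 1]

# Hypothetical LCP samples in milliseconds, beaconed from real users.
lcp_ms = [1200, 1800, 2100, 2600, 4100, 1500, 1900, 2300]

# LCP is assessed as good when the 75th percentile is <= 2500 ms.
passes_lcp = p75(lcp_ms) <= 2500
```

A single slow lab run, or a single fast one, will not match this aggregate, which is the lab-versus-field gap described above.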
Q 22:27 – I know Googlebot doesn’t like cloaking on a site, but that’s essentially what happens when you use Google Analytics A/B testing, for example, because it replaces the content underneath your web page when it does the A/B testing. So anything that’s optimized for SEO cannot use Google Analytics A/B testing. Is that right?
A 23:04 – “Kind of. So the important part for us with A/B testing is that the A/B testing is not a permanent situation and that the A/B testing is something where Googlebot essentially also falls into that A/B test. And essentially, we can kind of see what users are seeing. And when it comes to A/B tests, the important part for us is also that the purpose of the page remains equivalent. So if you’re A/B testing a landing page and you’re selling one product, then it shouldn’t be that, instead of selling a car, suddenly you’re selling a vacation, or a flight, or something like that. It should be something where the purpose of the page is the same so that, if we understand this is a page that has a car for sale, for example, then we can send users there. And if the page looks slightly different when users go there, that happens. Sometimes you personalize. Sometimes you do A/B testing. All of that is essentially fine. So the A/B testing that you would do with Analytics usually is in that area where you say, ‘oh, I am changing my call to action. I’m changing the colors, the layout slightly.’ But you’re not changing the purpose of the page.”
Reply 24:28 – Yeah, totally. We are not changing the purpose of the page, but we do plan to run some UX enhancements through the experiment. But it’s kind of risky, you know.
John 24:42 – “No, it shouldn’t be. Especially if the purpose of the page remains the same. Then that’s something where, even if Googlebot were to see both of the A/B versions, then we would be able to index a page normally. It wouldn’t change anything else. So that’s perfectly fine. The cloaking side that is more problematic is if I don’t know, you’re selling a car, and then when Googlebot looks at it, it’s showing a car. When a user looks at it, it’s going to a pharmacy. That’s usually kind of the more spammy cloaking that we worry about, where the webspam team would get involved. And all of these subtle changes, also if you have device-specific changes on a page, that’s perfectly fine for us.”
Reply 25:30 – Maybe for mobile? We use AdSense, for example, and maybe for mobile, we would stop showing some ads, as well.
John 25:38 – “Sure. Exactly, yeah.”
Q 25:42 – Regarding this same thing, if ranking goes down because of A/B testing, is that a sign that Google does not like the different version? And if we roll back to the older page, should the ranking come back?
A 26:04 – “I don’t know. My assumption is the ranking would not change with normal A/B testing, because you have the same content. Essentially, the purpose of the page remains the same. If you were to significantly change the page, like remove all of the textual content in the B version and Googlebot sees that version, then that’s something where the ranking might change. But for the normal A/B testing, I don’t see that as being problematic. And if the ranking did drop, then I would, as a first assumption, assume that it’s not related to the A/B testing, but rather may be related to something else.”
Reply 26:47 – Got it. So ranking will not have any fluctuation?
John 26:51 – “I mean, there are always fluctuations in ranking, but it shouldn’t be just because of the A/B testing.”
Q 27:07 – My question is, we are a WordPress site, so we have a blog section. We have also a services section. Our services sections are labeled as pages on the WordPress back-end, whereas our blog sections are labeled as posts. Our services section gets a lot of traffic, but our blog section does not get comparable traffic. So is it because Google treats pages more favorably than blogs, or maybe we are missing out on other fronts, like blog marketing?
A 27:38 – “I don’t think Googlebot would recognize that there’s a difference. So usually that difference between kind of posts and pages is something that is more within your back-end, within the CMS that you’re using, within WordPress, in that case. And it wouldn’t be something that would be visible to us. So we would look at these as it’s an HTML page, and there’s lots of content here, and it’s linked within your website in this way. And based on that, we would rank this HTML page. We would not say, oh, it’s a blog post, or it’s a page, or it’s an informational article. We would essentially say it’s an HTML page, and there’s this content here, and it’s interlinked within your website in this specific way.”
Q 28:28 – So my other question is that the blog section has a longer URL: the root URL, then /blog/, then the category, then the article name. So is it because the URL is longer that it is harder to rank the blog?
A 28:44 – “It shouldn’t be, no. I think, I mean I don’t know your website, so it’s hard to say. But what might be happening is that the internal linking of your website is different for the blog section as for the services section or the other part of your website. And if the internal linking is very different, then it’s possible that we would not be able to understand that this is an important part of the website. But it’s not tied to the URLs. It’s not tied to the type of page. It’s really kind of like, we don’t understand how important this part of the website is.”
Q 29:22 – OK. So how important is the length of the blog post? We follow a guideline of 300-plus words, and recently I’ve read in many places that Google favors long-form content. So maybe we are missing out on that front by writing shorter blogs?
A 29:35 – “We don’t use the word count at all. So the number of words in your articles, that’s totally up to you. I think some people like to have a guideline with regards to the number of words, but that’s really an internal guideline for you, for your authors. It’s not something that we would use for SEO purposes.”
Q 30:08 – I have a question about Google Search Console. My website has around 120 external links, and about 40% of them are non-working Japanese domains. I have no idea where they came from. What should I do with them?
A 30:33 – You probably don’t need to do anything with them. If these are just random links from the internet, I would just ignore them. It’s not specific to Japanese links, but sometimes spammers include normal URLs within the URLs that they use to promote spam, and that means on random forums and blogs, they will drop these URLs as well. Sometimes that ends up with a lot of links that are posted in non-English or in foreign language content. I’ve seen that a lot with Japanese, Chinese, Arabic, all kinds of languages.
Reply 31:20 – And the almost 50% count doesn’t matter?
John 31:25 – It doesn’t matter.
Reply 31:27 – So I shouldn’t be worried that I get some kind of penalty for that?
John 31:31 – Yeah, no. I mean, if these are not links that you placed, things that you bought where you put these links there, then I would just ignore them.
Reply 31:41 – Yeah, what if my competition was the one buying those links for me?
John 31:46 – I wouldn’t worry about it.
Reply 31:47 – I mean, they are not, but you wouldn’t worry?
John 31:50 – No. Those are the kind of links where, on the one hand, if they don’t exist anymore, then we will ignore them over time anyway. So it doesn’t matter whether you disavow them or not; if they don’t exist, they don’t have any value. On the other hand, these are the kind of links that we have seen so often for 10, 20 years, and we just ignore them.
Reply 32:19 – But why do you show them in Google Search Console?
John 32:23 – Just because we want to show you everything that we know about. In a case like this, it’s like “oh, if Google knows these are stupid links, then maybe they shouldn’t show them.” but in Search Console, we try not to make a judgment with regards to the links. We also include things that are nofollow and all of that. We show them to you, but if you look at them and say “oh, these don’t matter for me” then you can just ignore them.
Q 33:17 – Let’s say I have 500 physical shops that are selling my products, and I want to create, for each of them, a specific landing page. Would this be considered doorway pages?
A 33:27 – No, that would be essentially fine. It’d be like having different products, because these are unique locations, these are physical locations. Having individual pages for them is perfectly fine. Sometimes it might make sense to combine these and put them on a shared page, such as if you have a lot of shops in specific countries. Maybe just list the shops there instead of individual pages per shop but that’s totally up to you.
Q 33:56 – From a ranking point of view, does Google treat nofollow UGC and sponsored rel attributes any differently?
A 34:05 – We do try to understand these and try to treat them appropriately. I could imagine in our systems that we might learn over time to treat them slightly differently. In general, they’re all with the same theme in that you’re telling us these are links that I’m placing because of this reason, and Google doesn’t need to take them into account.
Q 34:31 – How might a poor site structure affect indexing? Some of my older articles are getting de-indexed, but these were articles on page eight to nine of my blog. I’ve since created static category pages that make it easier for Google to keep track and to find these pages. What could be the indexing issues here?
A 34:59 – It’s certainly the case that we don’t index all content of all websites on the web, so at some point, it can happen that we say, “oh, these pages are not critical for the web.” Maybe we’re not showing them in the search results, maybe just generally we don’t think these are critical for the web, so maybe we will drop them from our index so that we can focus more strongly on the important pages within your website. That’s something where we use the internal site structure quite a bit to try to figure out that, if we understand that your site is important and we see, within your website, that you’re telling us this part of my website is important for you, then we will try to focus on that. Whereas if you’re saying these pages are also on my website and they’re linked, like you mentioned here, like eight to nine pages away from essentially your home page, then we might say “well, you don’t think these are important, so maybe we won’t focus so much on them. We’ll focus on the other ones.” That’s something where your internal site structure can help us a little bit to understand that, but it’s definitely also the case that we just don’t index everything on all websites.
Q 36:14 – How much of an impact does disavowing links in bulk make? We recently disavowed many backlinks to our site. They were all HTTP sites with low domain authority, and many of them were comments with a link back to our site. However, we haven’t seen any positive improvement. There are currently no manual actions against us. Is disavowing not necessary these days?
A 36:38 – You could probably save yourself the effort of disavowing those links. They probably wouldn’t have any effect at all. I would mostly use the disavow links tool for cases where you either have a manual action, or where you look at what you’ve done in the past and realize you would probably get a manual action if anyone from Google saw it. In those cases, I would go off, try to disavow that, and try to clean that up. But generally, links from low domain authority sites, or links that have essentially been part of the web for years and years, I would not disavow. I don’t think that makes any difference at all.
Q 37:37 – Some questions around the sources behind Google News.
A 37:33 – I don’t really have much insight into the Google News side. If you have questions about Google News, I’d recommend going to the News publisher forum and posting there. I don’t really have much insight on the labeling and how kind of things have to be set up for that.
Q 37:55 – Does Googlebot still heed pagination and breadcrumbs, or does it affect the ranking? What’s the best practice?
A 38:04 – We use pagination and breadcrumbs as a way of understanding the site’s internal structure a little bit better. That’s something that definitely still plays a role with regards to crawling and indexing a site. If you have pagination and breadcrumbs on your site and you remove that, then that does mean that the internal structure of the site is now different. A good way to kind of double check how those factors play a role within your website is to use an external crawler, some kind of a tool that you can run across your website to kind of crawl the website the way that it is now, and then you could analyze from there, are these breadcrumb links the reason why these pages are being crawled or not? Then based on that, you can make decisions on whether or not to remove them, or to change them, or however you want to kind of modify things within your website.
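The kind of check John describes, running an external crawler and asking whether breadcrumb or pagination links are what makes deeper pages reachable, can be sketched as a small breadth-first crawl. Here an in-memory dict of paths to HTML stands in for real HTTP fetches, and `crawl_depths` is a made-up helper for illustration:

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl_depths(pages, start):
    """Breadth-first crawl returning each page's click depth from `start`.
    `pages` maps URL path -> HTML, standing in for real HTTP fetches."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for href in parser.links:
            if href in pages and href not in depths:
                depths[href] = depths[url] + 1
                queue.append(href)
    return depths

# A toy site where pagination links are what reaches the deeper posts.
site = {
    "/": '<a href="/blog/">Blog</a>',
    "/blog/": '<a href="/blog/page-2/">Next</a> <a href="/blog/post-a/">A</a>',
    "/blog/page-2/": '<a href="/blog/post-b/">B</a>',
    "/blog/post-a/": "",
    "/blog/post-b/": "",
}
```

Re-running the crawl with the pagination links removed would show which pages become unreachable or much deeper, which is the analysis suggested above.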
Q 39:05 – In a scenario where a group of URLs have been changed, but for some reason the 301 redirects have not been set up right away, roughly how long is the time frame that you have to implement redirects to transfer the ranking authority from the old to the new pages and prevent ranking drops?
A 39:25 – It’s tricky, because there is no specific time for this, especially because there are different variations of this kind of problem situation that you have here. In particular, if the old content still exists and you’ve created a copy of that on a new URL, then in a case like that, we will treat those two URLs as being part of the same cluster, and we’ll try to pick a canonical URL between those two URLs. It can happen that we switch over to your new URL for that. If that’s the case, then, essentially, we will forward all of the signals from the old URL to the new URL automatically, even without a redirect in place. In that scenario, you probably will not see a big difference if at some point later on, you add a redirect. The main difference you would see is that it would be a lot clearer for us that you want the new URLs to be indexed and not the old URLs. In that setup, you probably wouldn’t see a ranking change, but you would probably see that we would switch over to the new URLs a little bit more consistently. In a situation where you delete the old URLs and just add the same content somewhere else on your website, then that’s something where we would essentially, as a first step, lose all of the information we have about this page, because suddenly it’s a 404, and we would treat the new page as being something new. We would essentially say, “well, there’s a new page here,” and we would not have any connection between the old page and the new page. That’s something where, at some point, we will drop the old page from our index and lose all of those signals. If you wait too long and add a redirect much later, then those signals are already gone, and that redirect is not going to forward anything anymore. So in that situation, where you delete things and just move them somewhere else, then probably after a certain period of time, depending on the website, you would not see any improvement from adding redirects.
In a case like that, it would, from my point of view, still make sense to start adding redirects there, just so that you’re sure that, if there is any small value still associated with those old URLs, then at least that is still forwarded over. Those are kind of the main scenarios there.
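Adding those late redirects can be as simple as an old-to-new lookup consulted before normal routing. A minimal sketch, not tied to any particular server; the paths and the `resolve` helper are made up for illustration:

```python
# Hypothetical old-URL to new-URL map for the moved pages.
REDIRECTS = {
    "/old/widgets.html": "/products/widgets/",
    "/old/gadgets.html": "/products/gadgets/",
}

def resolve(path):
    """Return the (status, location) the server should answer with.
    A 301 marks the move as permanent, so any remaining signals on the
    old URL can still be forwarded to the new one."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    return 200, path
```

Even when most signals have already expired, answering 301 instead of 404 for the old paths forwards whatever small value is still attached to them.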
Q 42:02 – Can you tell us something about the new Lighthouse Treemap?
A 42:06 – I haven’t actually taken a look at the Lighthouse Treemap, and probably for questions around Lighthouse, it would make more sense to check in with the folks from the Chrome side and kind of pick their brains on that, on what you should be focusing on there. I saw some screenshots, and they look pretty cool, but it’s like all of these speed-related topics, they can get pretty complicated.
Q 42:36 – Can you give some insight into the optimization of free listings on Google Shopping? If we manually edit a description and title to match keywords that we’re interested in positioning our listings for, will we be penalized if these keywords do not appear on our site? For example, editing the free listing to include the words “low cost” or “cheap” in the description of a product like a gold-plated ring, where those words are not referenced on the domain for that listing.
A 43:07 – I don’t know. My feeling, or my understanding, is that we do try to map the landing page to the products that you have in your Merchant Center feed. If they don’t align, then we have to make a call about which of these versions we actually use. That’s something where I would generally recommend making sure that these two versions align as much as possible, so that you don’t put Google in the situation of saying, “this data doesn’t really match; which version should we take into account, or should we ignore half of your data completely?” Essentially, if you can give the data in a clear way, such that Google can take it into account immediately, then that’s usually a lot better in the long run than short-term tweaking of individual keywords, where you probably wouldn’t see much value anyway.
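The alignment John recommends between the feed and the landing page can be checked mechanically. The sketch below is an assumption about how a site owner might audit this themselves, not anything Google provides; the sample title and page text are hypothetical:

```python
import re

def words_missing_from_page(feed_title, page_text):
    """List words from a Merchant Center feed title that never
    appear on the landing page (a rough alignment check)."""
    def tokenize(s):
        return re.findall(r"[a-z0-9-]+", s.lower())
    page_words = set(tokenize(page_text))
    return [w for w in tokenize(feed_title) if w not in page_words]

page = "Gold-plated ring, handmade, low cost, free shipping"
# "cheap" appears in the feed title but nowhere on the page.
print(words_missing_from_page("cheap gold-plated ring", page))  # → ['cheap']
```

A non-empty result flags exactly the mismatch from the question: keywords added to the listing that the landing page never mentions.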
Q 44:12 – Which anchor link does Google value? Does Google value the size of the anchor text or the size that the anchor text takes up on the screen?
A 44:28 – I don’t think we have that defined at all. Those are the kinds of things where, in our systems, it can happen that we pick one of these, and we try to understand which one is the most relevant. I wouldn’t make the assumption that we just naively take the first one on a page and only use that, or only take the one with the longest anchor text and ignore the others. We essentially try to understand the site structure the way a user might understand it and take that into account, and the way this ambiguous situation is handled can also vary over time.
Q 45:12 – We’re a news site, and we’re thinking about implementing AMP, but Google announced that AMP is not necessary to rank in the Top Stories carousel, and the AMP badge will be removed. So my understanding is that we need to focus on Core Web Vitals and fine tune our website to make it fast and create high-quality content. Can you give us some more information on getting into the Top Stories carousel?
A 45:36 – Yes, we did announce that AMP is no longer required for the Top Stories carousel, and instead, we will be focusing on things like the Core Web Vitals and the page experience factors to try to understand which pages we should be showing there. I think the important part with AMP is that it’s a really easy way to make pages extremely fast and to make sure that, almost by default, you’re achieving the metrics for the Core Web Vitals. That’s something where, if you’re trying to make your pages fast and you don’t know which framework to use, then maybe AMP is a good approach there. Maybe that’s also something where you can take individual elements out of AMP and just reuse those on your pages, and then, over time, migrate more of your pages to the AMP framework. I would see it more as a framework, rather than a feature that you have to turn on or turn off. With that in mind, there are other ways you can make your pages really fast. You don’t have to use AMP, but sometimes using AMP is an easy way to do that, especially if you have something like WordPress. If your site is built on WordPress and you can just enable the AMP plugin, then sometimes that can automatically shift your site over to the good side of the Core Web Vitals.
Q 47:36 – I was talking to an SEO, and he was saying that with this kind of website (a job website), there are so many similar websites. One website has a structure where, in India, it just shows one generic page, forces the user to give their city, and then redirects that user to their respective city’s jobs. The SEO I was speaking to was discussing whether this would be cloaking for Google, because for Google, they had a structure where, in the USA, they serve the page as it is, rather than forcing a country/city selection. From my understanding, with different country-specific domains, we can redirect users to their respective countries, and for Google, we allow crawling. I was thinking that this might also not be cloaking from Google’s side, if we provide one page to Google, but for users in the same country, we just ask for the city and redirect them to that respective city. Please give me your ideas on this.
A 49:10 – The important part for cloaking in a case like this is that Googlebot sees the same content as other users from the same country Googlebot is crawling from would see. For most websites, Googlebot would crawl from the US, so if a user in the US were to access this website, they would see the same thing as Googlebot would see. That’s kind of the general assumption. The extra situation, like in your case, where maybe in India or in some other country, they’re doing something slightly different to personalize the experience for users in that region, from our point of view, that’s perfectly fine. It’s really just a matter of Googlebot seeing the content that other users in the US would see, and if you’re doing something more unique in a country where Googlebot is currently not crawling from, that’s perfectly fine and up to you. The important part here is, I think, also to keep in mind that Googlebot will be crawling from the US. So if, for example, you have different jobs in different countries, and when a user from the US comes to your website, they only see the listings for the US, but most of your users are based in India, then we will index your homepage, at least, as looking like a jobs website for the US. If someone is looking for a job in San Francisco, then we would say, “oh yeah, here’s one on this website,” but if someone is looking for a job in Mumbai or some other place in India, then we might not necessarily know that the homepage of this website would be relevant, because we never see jobs from India listed on the homepage. On the lower-level pages of the site, there’s usually less personalization, but especially with the homepage, that’s one area that you need to watch out for if you’re doing a lot of personalization.
My recommendation there would be to have one part of the homepage that is very personalized, matching the user’s location, and another part of the homepage that is essentially very static, something that all users would see. Then you can kind of guide Google to index the homepage and say, “there is some US jobs information here, but there’s also some worldwide jobs information here,” and the homepage is relevant for all of that.
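The homepage split John recommends can be sketched as follows. Everything here is hypothetical (the job data, the country codes, the two-section return shape); it only illustrates serving a personalized section alongside a static worldwide section that every visitor, including Googlebot, sees:

```python
# Hypothetical job listings per country.
JOBS = {
    "US": ["Engineer - San Francisco", "Designer - New York"],
    "IN": ["Engineer - Mumbai", "Analyst - Bangalore"],
}

def render_homepage(country):
    """The featured section varies by visitor location; the
    worldwide section is identical for everyone, including
    Googlebot crawling from the US."""
    featured = JOBS.get(country, JOBS["US"])
    worldwide = [job for jobs in JOBS.values() for job in jobs]
    return {"featured": featured, "worldwide": worldwide}
```

Because the worldwide section never changes with location, Google crawling from the US still sees that the site also covers India, which addresses the indexing gap John describes.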
Reply 51:47 – John, regarding this cloaking, I was also going through Google’s documentation, and I personally feel that there should be more content available, because on cloaking there is very little content available, where you say that if you have different behavior for users and different behavior for Googlebot, then it could be cloaking.
John 52:14 – That’s good feedback to have. What I would recommend doing in a case like this is trying to find the most related page within our documentation and using the feedback link there to let us know that it would be useful to have more examples or more details there, but I can also pass that on to the team. The feedback link is kind of the best way to get the information directly to the team.
Q 53:30 – In a previous question about keywords, you mentioned how it’s important that the writing is good, that people use plurals correctly. It made me think about a headline I saw a few days ago from a major outlet that read so-and-so announces they’re engaged to so-and-so. I read it, and I said, good for that outlet for correctly referring to that particular non-binary individual as “they.” Then of course, my mind immediately went to SEO and wondered whether Google understands that this isn’t poor grammar, but rather inclusive English. With Google’s natural language processes, does it understand a plural followed by a singular form of verb, like “they is doing this” or “they is doing that,” that that’s grammatically correct?
A 54:14 – I don’t know. Probably. I mean, it’s something where usually, our systems would learn this automatically. We would not manually define English grammar to be like this. I could imagine that especially these kinds of shifts in language, where over the years, this becomes more and more common, that’s something that probably takes a bit more time for our systems to learn automatically. Probably if we were to run into situations where we obviously get it wrong, and we receive feedback about that, then I could imagine that our researchers would say, “oh, then it’s like, on the web overall, this is kind of rare still but it’s important, so we will try to tune our models to also deal with that properly.”
Reply 55:13 – Yeah, I was wondering whether that was affecting the way that that particular story ranked, because it was a subject about one person. Other people just used the person’s first and last name to sort of get around it.
John 55:25 – Yeah, I don’t think that would be affecting the ranking there, because if we’re taking that headline apart, we would pick up the two individuals, and we would focus on, like, “oh, these two are now related, or mentioned in the same headline,” kind of thing. But the individual words there are probably less critical for us.
Q 56:20 – A question about links, but not ranking related. Don’t worry. The scenario is: Googlebot crawls a page and gets links. Those links point to somewhere else on the internet, or internally. The question is, is there a span of time after which, if you haven’t crawled a specific page, you delete the links contained in that page from, let’s say, the link graph, or do you keep those links until Googlebot crawls that page again? Just to clarify, I mean the links that are contained on the page, not the links that point to the page.
A 57:03 – OK, so kind of like if we would ignore the link if we haven’t recrawled the page in a period of time, or–
Reply 57:11 – Yeah. If you crawled a page, for example, a year ago, after one year, do you stop using the links on that page because you didn’t crawl the page again?
John 57:24 – Yeah, I don’t think so. I think we would continue to use those links. What would probably happen is that the rest of the web, and that website, would evolve over time, and it could happen that those links are less relevant after some period of time, but we would still keep them. For example, if you have a link in an article on “The New York Times” and it’s currently an article on the homepage, then that link might be seen as very relevant for us. But five years later, that link is within an archive section of “The New York Times,” somewhere hidden away in the cellar. The link would still be there, but it’s a lot less relevant within the context of “The New York Times.” Those are the changes that tend to happen, but it’s not that we would say, “oh, we haven’t looked at this link in five years, therefore we will ignore it.”