Did you know that it is possible to tell whether or not Google has disavowed a link that is in your disavow file?  This is HUGE news.  John Mueller from Google has told us several times that some links can take up to six months to be recrawled and subsequently disavowed.  What this means is that even if you have done a thorough disavow, if Penguin refreshes before most of the links in your disavow file have been recrawled, the Penguin algorithm will still see your site as untrustworthy.  This is why some sites need to see two Penguin refreshes in order to escape the Penguin algorithm.  But if we can tell whether our links have actually been disavowed, then perhaps we can know whether we’re ready for a Penguin refresh.

In this article I’ll explain how you can tell if a link has been disavowed, and I’ll also share some very interesting information from testing that I did.  I tested to see whether it really does take six months for links to be disavowed, and in the process I made a discovery that might explain why some sites never escape from Penguin or other link-related penalties and algorithm changes.

 

Update: A lot has changed since 2014! In 2016, Google launched Penguin 4.0. With this update, Google started to ignore super spammy links instead of counting them as a negative against your site. We do far less disavowing now than we did when this article was written.

We do feel that there is still valuable information in here regarding how Google processes a disavow, but we would like to emphasize that most sites with an onslaught of spammy links do not need to disavow. If you are not sure whether a disavow is necessary for your site, you may be interested in our link overview plan, in which an MHC link auditor will assess your risk level.

A cached link is a disavowed link

I asked John Mueller a question in a Webmaster Central Forum hangout recently.  What I wanted to know was this:

If a site is in my disavow file and I see that that site has been recached since I disavowed it, can I assume it’s been disavowed?

His answer surprised me.  I figured he was going to say that the cache date was not at all connected to the date when the link was disavowed, but it actually is!

John said that every time Google crawls a url that is in your disavow file, the disavow gets applied. (Applying a disavow is essentially the same as Google adding an invisible nofollow tag to the link that points to you.) So, if you have disavowed the home page of a site and you check that home page and see that it has been recached, you can assume your disavow has taken effect. The same applies to individual pages. John did say that it’s possible for the Google index to be updated without the page being cached. But if you do see that the page has been recached since your disavow, your link really should be disavowed.

In other words:

a) If you see that the page has been cached on a date that is AFTER the date that you disavowed the url or domain then your link is disavowed.

b) If you don’t see a new cache since your disavow was filed then there’s still a chance that the page has been revisited and the disavow applied, but it’s possible that it has not yet been disavowed.
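
Rules a) and b) come down to a single date comparison. Below is a minimal Python sketch, assuming you know the date you uploaded your disavow file and the cache date of the linking page (None if no cache was found). The function name `disavow_status` and its two labels are hypothetical, for illustration only.

```python
from datetime import date
from typing import Optional


def disavow_status(disavow_filed: date, last_cached: Optional[date]) -> str:
    """Classify a disavowed url using the cache-date rule of thumb.

    disavow_filed: the date the disavow file containing the url was uploaded.
    last_cached: the cache date shown for the linking page, or None if
        no cached copy was found.
    """
    if last_cached is not None and last_cached > disavow_filed:
        # Rule a): recached AFTER the disavow, so the disavow should be applied.
        return "disavowed"
    # Rule b): no newer cache. Google may or may not have revisited the page yet.
    return "unknown"
```

For example, a page cached on June 21, 2014 checked against a disavow filed on June 1, 2014 comes back as "disavowed", while a page with no newer cache comes back as "unknown" rather than "not disavowed", because per rule b) Google may already have revisited it.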

It’s important to note that if you disavowed the entire domain, it’s not just the home page that has to be recached. Rather, the actual url that contains your link(s) needs to be recached before you can know that your link has been disavowed.
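
Because a `domain:` entry can cover many linking pages, it helps to know which line of your disavow file covers a given link before you look up that link’s own cache date. This is a minimal sketch assuming the standard disavow file layout (one url or `domain:` entry per line, with `#` marking comments); the function name `matching_entry` is hypothetical, and treating subdomains as covered by a `domain:` entry is an assumption made here.

```python
from typing import Optional
from urllib.parse import urlsplit


def matching_entry(link_url: str, disavow_lines: list[str]) -> Optional[str]:
    """Return the disavow-file line that covers link_url, or None."""
    host = (urlsplit(link_url).hostname or "").lower()
    for raw in disavow_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("domain:"):
            domain = line[len("domain:"):].strip().lower()
            # Assumption: a domain entry also covers its subdomains (e.g. www.).
            if host == domain or host.endswith("." + domain):
                return line
        elif line == link_url:
            return line  # exact url entry
    return None
```

Whatever entry matches, the cache date to check is still the one for `link_url` itself, not the home page of the domain.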

John was also asked whether there is any other way to check if our link has been disavowed. He said that the cached page is a reasonable way to check. Another option: if the linking pages have a date stamp on them, you can do a site: query and search by date to see whether Google has updated the index.

How can you tell if a page has been cached?

To tell if a page has been cached, search for the url (in quotes) in Google and then click on the little green arrow that appears after the url in the Google search results.  You can then click on “Cached”.

Then, you’ll see the cache date in the grey box at the top of the page.

In my example, the page was last cached on June 21, 2014.  For our disavow file, this means that if this were a link that I had disavowed, and I had filed my disavow before June 21, I could consider this link disavowed.

You can also check cached urls in bulk using a tool such as Scrapebox’s Google Cache Extractor Addon.  I did just that as a little experiment.

I did some testing – were my clients’ links actually disavowed or were we still waiting for the links to be revisited and cached?

This next part of the article is interesting.  I ran a number of tests to see how I could benefit from knowing the cache dates of urls that I had disavowed.

Does it really take 6 months to revisit a link?

John Mueller has said in the past that it can take up to six months for a link to be revisited and disavowed.  To test this, I looked at five of my past penalty removal clients who had very spammy link profiles.  My assumption was that the spammiest links would be the ones that rarely got recrawled.  I ran all of their urls through Scrapebox’s Google Cache Extractor Addon.  For a large number of the links, no cache of the page was found; I’ll write more on this later in this article.  For the remaining links, I looked at the cache dates and extracted some interesting information:

Oldest Cache Date

The oldest cache date I found was March 27, 2014, which is just under 3 months ago (it is June 23 as I write this).  This means that if this link had been disavowed in April or May, Google’s algorithms would still consider it a live link, because Google has not yet revisited this url to apply my disavow.

Age Distribution of Cache Dates

  • 0% of the urls had a cache date more than 3 months old.
  • 14% of the urls had a cache date between 2 and 3 months ago.
  • 23% of the urls had a cache date between 1 and 2 months ago.
  • 63% of the urls had a cache date less than a month ago.
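
Bucketing cache dates like this is simple to reproduce on your own export. A minimal sketch, assuming you already have the cache dates as Python `date` objects (for example, parsed from a bulk cache check) and approximating a month as 30 days; the function name `bucket_cache_ages` and the bucket labels are hypothetical.

```python
from collections import Counter
from datetime import date

LABELS = ["under 1 month", "1-2 months", "2-3 months", "3+ months"]


def bucket_cache_ages(cache_dates: list[date], today: date) -> dict[str, float]:
    """Percentage of urls whose cache date falls in each age bucket.

    Assumption: a "month" is approximated as 30 days.
    """
    counts = Counter()
    for cached in cache_dates:
        # Clamp at 0 so a cache date after "today" counts as under 1 month.
        months_old = max(0, (today - cached).days // 30)
        # Ages of 90 days or more all go into the last bucket.
        counts[LABELS[min(months_old, 3)]] += 1
    total = len(cache_dates)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100 * counts[label] / total for label in LABELS}
```

With June 23, 2014 as "today", the oldest cache date above (March 27, 2014) is 88 days old and lands in the 2-3 month bucket.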

Now, we can’t make any set-in-stone predictions from this data, as it comes from a relatively small sample: the backlinks of only 5 sites.  But what I can see is that while the majority of my unnatural links that are in the Google cache get revisited (and subsequently disavowed) within a month, a good percentage (14% in this sample) take 2 to 3 months to get recrawled and disavowed.  What this means is that if you file a disavow today and Penguin refreshes two weeks from now, there’s a good chance that Penguin will still affect your site and that you will need to wait until the next refresh for your disavow to be fully recognized.

What about links that are NOT in the Google cache?

When I first ran my disavowed links through Scrapebox’s Google Cache checker, I saw a lot of results that surprised me.

Scrapebox reported “No Cache” for a large number of links.  At first I thought this was a problem with Scrapebox or perhaps with the proxies I was using, but after manually checking a number of these I realized that they really were not in the Google cache.  In fact, a good number of the bad links pointing to most of these sites are no longer in the Google index.  Of course, that doesn’t surprise me, as Google works hard to deindex spammy sites.  But it got me thinking.

What happens to links in my disavow file that are no longer crawled and indexed by Google?  What if I have disavowed a link and Google never revisits it again?  Can it be disavowed? Or is it always going to remain as a bad link to my site?

Do these deindexed/no longer cached pages just drop out of a site’s backlink profile?

I figured that perhaps most of these deindexed pages would just drop out of my link graph, so I did another test.  I took one of my clients and determined how many of the links that I had disavowed were no longer in their Webmaster Tools list of links.  I did the following:

  • I made a list of all of the links that were in my disavow and that Scrapebox was reporting “no cache” for.
  • I pulled out the Google Webmaster Tools (GWT) links that I had downloaded for this client back in October of 2013.
  • Next, I used a VLOOKUP to determine how many of my disavowed and no longer cached links were in the list of links from GWT back in October of 2013.
  • This showed me that there were links from 363 domains that were originally in our GWT list but are no longer in Google’s cache.
  • I then downloaded the most recent set of links from GWT for this client and did another VLOOKUP to see how many of my disavowed, no longer cached links were in our current list of backlinks.
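
The VLOOKUP steps above amount to two set intersections on linking domains. A minimal Python sketch, assuming each list is a plain list of full urls (the no-cache urls from the disavow, the October 2013 GWT export, and the current GWT export); the function names are hypothetical, and it matches on hostname, so www and non-www versions are treated as different domains.

```python
from urllib.parse import urlsplit


def linking_domains(urls: list[str]) -> set[str]:
    """Reduce a list of urls to their lowercase hostnames."""
    hosts = {(urlsplit(u.strip()).hostname or "").lower() for u in urls}
    hosts.discard("")  # blank lines and urls without a scheme have no hostname
    return hosts


def still_linking(no_cache_urls: list[str], gwt_old_urls: list[str],
                  gwt_new_urls: list[str]) -> tuple[set[str], set[str], float]:
    """Return (domains in the old GWT list, domains still listed, percent still listed)."""
    no_cache = linking_domains(no_cache_urls)
    # Disavowed, no-cache domains that appeared in the old GWT export.
    originally = no_cache & linking_domains(gwt_old_urls)
    # Which of those still appear in the current GWT export.
    remaining = originally & linking_domains(gwt_new_urls)
    share = 100 * len(remaining) / len(originally) if originally else 0.0
    return originally, remaining, share
```

With the counts from this test, 206 remaining out of 363 works out to about 57%.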

My assumption was that the majority of the links that came from sites that had been deindexed and no longer existed in Google’s cache would have just dropped out of the link profile and would no longer be counting towards my client’s site.

However, it turns out that out of those 363 linking domains, 206 still remained as live links according to GWT!  That’s 57% of these domains still showing up in Google’s list of backlinks in Webmaster Tools!  Have these links been disavowed?  If there are no pages linking to these spammy linking pages, how will Google ever revisit them to apply my disavow directive?

Added later: I asked John Mueller about this and he told me that pages that are not in the Google cache are not used for algorithmic calculations.

Can you force Google to revisit an old page?

Many people have asked me if it is possible to speed up the disavow process.  John Mueller was asked in a hangout a while back whether you could use the submit to Google feature to get Google to revisit a url, and John said that it wouldn’t work.  Another possible solution is to build links to the pages that you want Google to revisit.  The idea is that as Google crawls the pages carrying these new links, it would follow them and end up recrawling the spammy pages.  To do this you’d likely want to set up the links on sites that you don’t care about and that aren’t connected with your own sites.  There is no specific penalty that I am aware of that Google would give for this type of action, but my gut tells me that it’s something that might be frowned upon by the webspam team.  Still…I’m in the process of doing some experiments.  I’ll let you know if they work out.

Conclusions

My hope is that Google has accounted for this in their disavow calculations, but who knows.  John Mueller was asked in a hangout in March of 2014 whether we should disavow links from deindexed pages, and he said that we should.  The reasons he gave were that these pages could pop back into the index and that deindexed pages can sometimes still pass PageRank.  This means that these deindexed, no longer cached pages can still be passing bad link signals to your site, and I think it is possible that they will never get recrawled and never get disavowed!

Who knows…perhaps this is one of the reasons why Penguin has not refreshed in so long.  Perhaps Google has realized that too many sites that have done good, thorough cleanups are not going to escape Penguin because Google can’t recognize that they have disavowed so many sites that are no longer in the Google index.  Maybe they’re still trying to find ways to recognize ALL of a site’s disavowed links.

What do you think?