There is so much confusion about how the disavow tool works. Does it only start working when there is either a reconsideration request filed or an update of the Penguin algorithm, or does it really start working right away? In this article I want to put forward a theory I have that could explain some of the unusual things that people have seen when using this tool. A lot of my discussion centers around Cyrus Shepard's interesting experiment where he disavowed all of his site's links.
Warning: This post contains a lot of theory. I debated on whether or not to publish it because it is confusing. Yet, my hope is that it will generate some good discussion and help us to understand more about the Penguin algorithm.
How the disavow tool is supposed to work.
If you're reading this article you probably have a fairly good idea of how the disavow tool is supposed to work. The tool allows you to upload a text file containing either urls or entire domains that you would like Google to ignore when calculating PageRank for your site. According to the official documentation for the disavow tool, once you have uploaded the file, the disavow is applied the next time a link listed in that file is crawled. When Google disavows a link, essentially what they do is add an invisible nofollow tag to that link. If you have disavowed on the domain level, then whenever Google crawls that domain and finds any links pointing to your site, they add an invisible nofollow tag to those links.
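To make the file itself concrete, here is a minimal sketch in Python of how such a text file could be put together. It assumes the plain-text layout Google documents for the tool: one entry per line, `domain:` in front of entries that cover a whole domain, and `#` in front of comment lines. The function name `build_disavow_file` and the example hosts are made up for illustration.

```python
from urllib.parse import urlparse


def build_disavow_file(urls=(), domains=(), note=None):
    """Return the text of a disavow file, one entry per line.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    lines = []
    if note:
        # Lines starting with "#" are comments.
        lines.append(f"# {note}")

    domain_set = {d.lower() for d in domains}
    for domain in sorted(domain_set):
        # A "domain:" entry covers every link from that domain.
        lines.append(f"domain:{domain}")

    for url in sorted(set(urls)):
        host = (urlparse(url).hostname or "").lower()
        # Skip individual urls that a domain-level entry already covers.
        covered = any(host == d or host.endswith("." + d) for d in domain_set)
        if not covered:
            lines.append(url)

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_disavow_file(
        urls=["http://spam.example.com/page.html",
              "http://other.example.org/links.html"],
        domains=["example.com"],
        note="Owner could not get these links removed",
    ))
```

In this example the first url is dropped from the output because the domain-level entry for example.com already covers it.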
Does the disavow tool only work when there is a major change like a reconsideration request or a Penguin update?
There are many people who have extensive experience with disavowing links who believe that the disavow only starts working when a major change happens such as the filing of a reconsideration request or a Penguin algorithm update. Tim Grice and Cyrus Shepard had this discussion on Twitter:
@CyrusShepard looks like you need to activate the file through a recon or wait for an update
— Tim Grice (@Tim_Grice) May 30, 2013
But, this theory contradicts what Google says. In the disavow tool documentation, they say:
...this information will be incorporated into our index as we recrawl the web and reprocess the pages that we see
To me that sounds like it happens fairly immediately.
Also, Google employee John Mueller has mentioned several times things that imply that the disavow tool starts working right away. Here is John in a Webmaster Central Hangout on March 15, 2013. Start watching at 31:09:
Site owner: You said that the disavow file is not being crawled until someone submits a reconsideration request. Is that correct?
John: No. As soon as you submit it, we use that when we recrawl the pages.
Here's another. Start watching at 8:27:
Site owner: Does the disavow tool only work after it is switched on manually by somebody?
John: That's not the case. It essentially runs automatically and granularly as we reprocess those urls. It's not something where someone has to manually click a button.
This question was asked about Cyrus' experiment. Start watching at 13:44:
Site owner: Does Google really use data from the disavow tool only by major updates?
John: No. Essentially this is something that's always used on an ongoing basis. So, if you add links or domains to your disavow file and you upload that then the next time we crawl those urls we'll essentially treat those links kind of like a nofollow link. So, it's not something that is only run periodically. It's essentially a part of our normal websearch systems that run all the time.
But why am I seeing things like this:
Google says that the disavow starts working immediately and not at the time of a major update. If this is the case, then why am I seeing things like this? Here is a site that disavowed a number of links on May 14. Nothing happened until May 22, which is the date of the Penguin 2.0 update:
Here is another site that did a fairly large disavow on May 3, and once again you can see that nothing really changed in site impressions until Penguin updated on May 22:
It really looks like the disavow tool accomplished nothing until Penguin hit.
To understand my theory of why the disavow tool appears to start working only when there is a major change like a Penguin update, we first need to understand Penguin. I wrote that and started laughing, because really, no one understands Penguin. But, let's start by explaining some things that we think we know about Penguin.
Trying to understand Penguin:
When Google introduced the Penguin algorithm update on April 24, 2012, what happened was that a lot of sites that had created large numbers of self made backlinks suddenly started to lose their rankings. Each time the algorithm updates or refreshes there are more reports of sites dropping out of their high ranking positions.
Some people believe that all that Penguin does is devalue the links that are perceived as unnatural. Others believe that it actually penalizes a site that has too many unnatural links. Although we don't have a direct answer from Google on this, here are some comments from John Mueller that are relevant:
Site owner: Does Penguin just dampen/reduce the power/effect of unnatural links, rendering them useless, thus causing one's rankings to drop? Or, does it also add an additional penalty drop to this existing reduced rank?
John: Essentially the two are kind of similar in that when our Penguin algorithm recognizes webspam that it has to react to we try not to show that site in the search results so frequently. So, essentially, why you are not showing up so high in search results, if that's because we are demoting the site in general it doesn't really matter that much to the webmaster. It's more the case that you just see that the site is kind of dropping in ranking.
Site owner: Is Penguin a filter that is placed on a site that makes it difficult for that site to rank well and then can that filter be lifted when Penguin refreshes, provided that the site owner has done the work to clean the backlink profile?
(In other words - Is Penguin a switch that is turned off or on to say whether or not a site is penalized?)
John: [Penguin] isn't something that is either turned on or turned off. It's really something that is more of a granular algorithmic change. Sometimes it may be that a site is lightly affected by algorithms like this and sometimes very strongly. If you work on it you can slowly move that bar forwards with the Penguin algorithm. That's similar to all of our algorithms in that they're not something that is either on or off but that they're trying to be very granular based on what it finds.
And one more:
Site owner: When a site is suffering under the Penguin algorithm, are new good links to that site treated with suspicion (or perhaps pass less value) until the bad links are cleaned up? Is this why you said it's like having an anchor that is pulling you down?
John: We don't specifically treat [links] in a bad way, but essentially, if our algorithms determine that your web [...cuts out...] problematic then we're looking at your whole website and kind of treating it as problematic. It's not tied to specific links like that. It's more something that is being done on a website basis there. So, what I said there about having an anchor that is pulling you down is essentially that you're trying to move forward with your website but you still have the handbrake on.
Are you still with me? I am rambling a little with this post and the reason is that part of why I am writing it is to get my ideas on "paper" to help me try to understand Penguin a little better. Don't worry...we're getting to my theory on what happened with Cyrus' site after he used the disavow tool. I'm just taking the long road in getting there!
Here is what I have concluded after listening to these three hangout answers:
Penguin looks at the overall quality of a site's backlinks and can cause the overall ranking of that site to be affected depending on how unnatural the backlink profile is perceived to be.
How does Penguin determine the overall quality of a site's backlinks?
Obviously no one outside of Google knows the exact answer to this. It is probably a very complicated process. There are some people who believe it is something as simple as looking at the percentage of links with exact match keyword anchor text. Some believe that if you have xx% brand anchored links and xx% url anchored links that you will avoid Penguin. Personally, I think it is much more complicated. I have a theory that Google places links in one of three categories:
- Links that are almost certainly natural
- Links that are almost certainly unnatural
- Links that are somewhere in the middle
I also think that the vast majority of a typical site's links fall into the third category: somewhere in the middle. As Penguin evolves, Google will likely get better at classifying these links, but for now, in most cases the algorithm simply doesn't know.
Why is this important to our discussion? How does this relate to what happened with Cyrus' site? Remember, Cyrus disavowed all of his links and absolutely nothing happened. But, when Penguin hit, BOOM. It appeared that that's when the disavow tool kicked in.
It's possible that you can't disavow a natural link!
Here's where we get to the exciting part of my theory. I think it's possible that Google doesn't allow you to disavow a link that they believe is natural. When Google first announced the disavow tool, here is a quote that makes me think that it is possible that some links can't be disavowed:
Q: If I disavow links, what exactly does that do? Does Google definitely ignore them?
A: This tool allows you to indicate to Google which links you would like to disavow, and Google will typically ignore those links. Much like with rel=”canonical”, this is a strong suggestion rather than a directive—Google reserves the right to trust our own judgment for corner cases, for example—but we will typically use that indication from you when we assess links.
If I add a rel canonical tag to my site and I mistype my domain name in my coding, Google may recognize that I have made a mistake and just decide not to pay any attention to that tag. Similarly, there are cases where I may decide to disavow a link (or domain) and Google ignores that decision. It's a "strong suggestion rather than a directive". I believe that when we try to disavow a link that Google has not flagged as unnatural then Google ignores that suggestion.
Going back to Cyrus' experiment where he disavowed all of the links to his site, if Google viewed these links as natural then it's possible that they ignored his suggestion to disavow those links. This is why nothing happened immediately after he filed the disavow file.
Perhaps if we have told Google we want to disavow a link, it is just one more piece in the puzzle in helping Google determine whether a link is unnatural?
Here's where this article starts to get confusing. I'll be impressed if you can follow along with me. 🙂
We don't know how Google decides that a link is unnatural (and therefore amenable to being disavowed). The algorithm may look at things like this:
- whether or not the site on which that link is hosted is deindexed or being penalized for link selling.
- whether there are a large number of other links on that site that are suspicious.
- whether that site has a large number of links that are pointing to sites that are in really competitive niches like casino, porn or payday loans sites.
- whether the link contains exact match keyword anchor text.
- whether the url of the page linking out contains certain keywords such as "links", "seo", "bookmark", etc.
- whether the links from this site are ever actually clicked on by real users.
There are probably MANY factors that are weighted when deciding how to classify a link. It's probably a super complicated process. But I wonder if one of the factors is:
- whether or not the site owner has elected to disavow this link.
If Google is on the fence about whether or not a link is natural, and the site owner then decides to disavow it, this may be the final straw that pushes the link over into the "almost certainly unnatural" category.
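Here is a toy sketch in Python of what I mean by a "final straw". To be clear, this is not how Google does it. The signal names, the weights and the thresholds are all numbers I made up to illustrate the theory. The only point is that a link sitting in the middle bucket can tip into the unnatural bucket once the owner's disavow is counted as one more signal.

```python
from dataclasses import dataclass, replace


@dataclass
class LinkSignals:
    """Hypothetical signals, loosely following the list above."""
    host_penalized: bool = False
    host_has_many_suspicious_links: bool = False
    host_links_to_spammy_niches: bool = False
    exact_match_anchor: bool = False
    url_has_seo_keywords: bool = False
    never_clicked: bool = False
    disavowed_by_owner: bool = False


# Made-up weights: nobody outside Google knows the real factors or values.
WEIGHTS = {
    "host_penalized": 4,
    "host_has_many_suspicious_links": 2,
    "host_links_to_spammy_niches": 2,
    "exact_match_anchor": 2,
    "url_has_seo_keywords": 1,
    "never_clicked": 1,
    "disavowed_by_owner": 2,
}


def suspicion_score(signals):
    """Add up the weights of every signal that is present."""
    return sum(WEIGHTS[name] for name, present in vars(signals).items() if present)


def classify(signals):
    """Place a link in one of the three categories of the theory."""
    score = suspicion_score(signals)
    if score >= 6:  # assumed threshold
        return "almost certainly unnatural"
    if score <= 1:  # assumed threshold
        return "almost certainly natural"
    return "somewhere in the middle"


if __name__ == "__main__":
    on_the_fence = LinkSignals(exact_match_anchor=True,
                               host_has_many_suspicious_links=True)
    print(classify(on_the_fence))
    print(classify(replace(on_the_fence, disavowed_by_owner=True)))
```

With these made-up numbers, the on-the-fence link scores 4 and stays in the middle. Adding the owner's disavow brings it to 6, which tips it into the unnatural category.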
This decision making process likely happens during a Penguin update:
It would be a very time-consuming process for Google to granularly assess each link on the web and decide whether or not it is natural. If my theory is right, then this decision is made at the time of a Penguin update. When a Penguin update happens, I believe that Google re-evaluates how much of a site's backlink profile is unnatural. In Cyrus' case, the first time that Penguin ran, the majority of his links were probably in the "almost certainly natural" or "somewhere in the middle" categories. But when he told Google that he wanted to disavow all of his links, that decision may have pushed many links into the "almost certainly unnatural" category, which means they could now be disavowed. Thus, when Penguin refreshed, Cyrus had a pile of links that were formerly untouchable by the disavow tool because they were seen as natural, but that could now be disavowed.
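The timing this theory predicts can be sketched as a tiny simulation in Python. Everything in it is an assumption of the theory rather than known Google behaviour: a disavow is only honoured for links classed as unnatural, and that classification is only refreshed when Penguin runs. The event names and the link count of 100 are invented for the example.

```python
def simulate(events, total_links):
    """Return (label, links still counted) after each event, under the theory.

    Hypothetical model: the event names below are illustrative only.
    """
    requested = False    # the owner currently has a disavow file uploaded
    disavowable = False  # links are classed as unnatural enough to disavow
    timeline = []
    for label, event in events:
        if event == "upload_disavow_all":
            requested = True
        elif event == "remove_disavow":
            requested = False
        elif event == "penguin_refresh":
            # Assumed: the owner's disavow is used as a signal, but only
            # when Penguin re-evaluates the backlink profile.
            disavowable = requested
        else:
            raise ValueError(f"unknown event: {event}")
        # A disavow only takes effect on links that are allowed to be disavowed.
        counted = 0 if (requested and disavowable) else total_links
        timeline.append((label, counted))
    return timeline


if __name__ == "__main__":
    events = [
        ("disavow file uploaded", "upload_disavow_all"),
        ("Penguin 2.0 (May 22, 2013)", "penguin_refresh"),
        ("disavow file removed", "remove_disavow"),
    ]
    for label, counted in simulate(events, total_links=100):
        print(f"{label}: {counted} links counted")
```

In this model nothing changes when the file is uploaded, and all of the links stop counting at the Penguin refresh, which matches the timing of the drop. It also predicts that the links count again as soon as the file is removed, so it says nothing about a site that fails to improve after that.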
Will Cyrus recover?
Cyrus has now removed his disavow file. This means that as links get crawled again, they should start counting towards his site's PageRank again. However, he has not seen any improvement in his rankings at all since removing his disavow file. *If* he was not affected by Penguin, then, as the links that were in his disavow file get recrawled, he really should see a gradual improvement in his site's rankings. The Penguin algorithm must be thinking that his links are untrustworthy.
Now, here's the part where my brain gets fuzzy. Penguin should not be taking into account links that are disavowed. In the eyes of Penguin, Google should be seeing that Cyrus has zero links. What this means is that as he gets new links, he should see an improvement in rankings. If there was absolutely no improvement then there are only two possible reasons for this:
1. Cyrus did not obtain enough new links to make a difference.
2. Penguin is distrusting the site despite the fact that links were disavowed and is making it so that it is very difficult to rank.
Looking at Cyrus' new links gained on ahrefs, I can't see that #1 is possible.
I have no explanation for what is happening here. In fact, this troubling point has kept me from publishing this post for a while. I believe that Cyrus' site dropped at the same time as Penguin 2.0 because that is when his links became eligible to be disavowed. But I can't understand why the Penguin algorithm would still be affecting his site. If his bad links are disavowed, they should not be counting towards Penguin. In fact, here is what John Mueller says (13:48):
John: In regards to algorithms that are looking at those links, obviously cleaning up those links is a good thing because we don't have to take a look at them, but if they're in the disavow file and we've recrawled them then obviously they won't be used for that algorithm.
Site owner: Do those two things equate though? Removing the link or adding it to the disavow file...or is there any difference?
John: That's pretty much the same with regards to an algorithm....Essentially if you can't have a link removed, then putting it in the disavow file is pretty much equivalent.
If Cyrus has disavowed all of his links, he should not currently be affected by the Penguin algorithm. Only the disavowing of links should have affected him, which means that the new links Cyrus has obtained in the last few months since Penguin refreshed really should cause him to see some improvement. If he hasn't seen any improvement then it means that he is somehow still under the effects of Penguin. When Penguin runs again, the request to disavow the links will be gone and, most likely, the links will go back into the "almost certainly natural" category. I predict that the next time Penguin runs Cyrus will see an improvement. He will probably be doing even better than before he applied the disavow file because he has since gained even more natural links.
If you didn't read this article I don't blame you. It's confusing and I even debated whether to publish it. Here are the main points:
- Cyrus disavowed every single link to his site.
- Nothing happened until May 22, 2013 when Penguin 2.0 updated and he saw a dramatic drop in rankings.
- It was postulated that the disavow tool did not take effect until Penguin refreshed, but Google says that is not true.
- It is possible that the disavow tool will not allow you to disavow a natural link.
- It is possible that Google uses several criteria to determine whether a link is natural. If a link is in a debatable area, then our telling Google that we want to disavow it could push it into the realm of "almost certainly unnatural".
- When Penguin refreshed, the algorithm now saw that most of Cyrus' links were unnatural.
- As they were disavowed they should not have counted towards Penguin. However, the drop may have been because the links were now allowed to be disavowed because Google was no longer completely certain that they were natural links.
- I think that the drop in rankings was completely due to the disavow file. But, I can't explain why new links have not had any effect on his rankings. This makes me think that for some reason Penguin is causing Google to distrust his links.
- I predict that now that Cyrus has removed his disavow file, the next time Penguin refreshes he will see a complete recovery.
This has been a long, rambling post. If you made it to the end then kudos to you! Please note that this is just a theory. I am stupidly obsessed with trying to understand Penguin. There are many days where it completely consumes my thoughts. Yet, I still feel that I have only scratched the surface in understanding it. I wrote this post to try to reconcile the differences between what Google was saying in regards to the disavow tool and what people in the SEO world were seeing in real life.
There are still a number of questions left unanswered after I have written this article. I have explained why it appears that the disavow tool kicks in after Penguin refreshes, but not why some site owners are only seeing a change once a reconsideration request is filed. (I have not experienced this happening myself so I can't comment here.) I am also still really confused about how Penguin can affect a site where every link is disavowed if disavowing means that those links are not factored into the algorithm.
I am still really cautious in my use of the disavow tool. I use it for getting manual penalties for unnatural links removed.
At this point though I still do not condone using the tool as a method to recover from Penguin unless you have a site with which you are willing to experiment. Edited on May 20, 2014: I have changed my mind on this statement now. If you have bad links, then yes, you should be disavowing them. The only sites that I have seen recover from Penguin are ones that have done a very thorough disavow.
I'd love to hear your thoughts below.
Added October 18, 2013: Penguin refreshed on October 4, 2013 and today Cyrus is reporting that he has not seen any improvement at all. There are a few possible reasons for this. It's possible that he just needs more time for Google to recrawl his previously disavowed links and start to attribute value to them again. It's also possible that there were other issues with the site that caused Penguin to hit it. I have not seen this, but some people have said that Cyrus' site previously had some pharma links pointed at it. That would certainly complicate the issue.