Added March 2013: See the end of this post for two updates. John Mueller stated that he may not have spoken clearly about Penguin.
I feel that I need to make an apology, and no, it's not just because I am Canadian. I have been telling people for months now that their websites could not recover until Penguin refreshes. We've been waiting for that refresh since October 5...or have we?
I was shocked to hear John Mueller's words during a webmaster hangout. Listen to what he says at the 4:11 mark!
Here is a transcription of the relevant parts about Penguin. It starts with someone asking John about a site that had been affected by Penguin:
Hangout member: John, can I ask a quick question on that?
John Mueller: Sure.
Hangout member: Obviously that's not my site, but just, you mentioned there that you can see that they did some work to clean stuff up, but you mentioned that Penguin is obviously still seeing things that it doesn't like. Can I just...can you clarify there, they said that they submitted a disavow file. Could it be that some of the links haven't been disavowed because they haven't been crawled, or is that something separate?
John Mueller: Uh, that could be possible, yeah. So, with the disavow file we have to recrawl those individual URLs before we can drop those links, so that's something that can take weeks or even months to kind of get processed. Uh, they said that they started working on this in November so that would be a couple of months already so I think a lot of those that they submitted should have been recrawled in the meantime. But it's possible that there are some that are still out there that are problematic but more likely after such a long time they...maybe they just, like, missed some significant portion of those links so, maybe they were doing things like article submissions or directory submissions to low quality directories that they thought "Oh, these are kind of natural-looking links", that they just kind of didn't disavow or didn't have removed.
Hangout member: OK, now, Penguin is still running on the refresh so they still have to wait for that anyway, even if there is still Penguin artifacts.
John Mueller: Yeah, that's something that we re-run that regularly. It's not quite weekly or daily, but uh, usually it's enough, especially when we have to recrawl all of these links anyway to kind of see their results over time.
Penguin is re-run regularly?
What? I was shocked to hear John say that Penguin was re-run regularly. So, we have been telling everyone that October 5 was the last time Penguin refreshed, but is this true?
I have a few thoughts on what John meant by this:
1. It's possible that Penguin is refreshing very often, both penalizing sites and allowing sites to recover. If this is the case, why are we not seeing recoveries being reported? It may be that most affected sites are not supported by enough good links, so once all of the bad links are disavowed, there is nothing left to make the site rank well. Penguin essentially discounts all of your bad links, so if bad links are all you have, you are still not going to see an increase in rankings when the penalty is lifted.
2. Perhaps what John meant when he said, "Yeah, that's something that we re-run that regularly" is that Google regularly checks a site to see whether its owners have cleaned up their links, and if so, lifts the Penguin "flag". I have always believed that Penguin was a flag on a site, just like Panda. Perhaps sites can only be demoted by Penguin on the large announced updates, but can recover on one of these unannounced re-runs if the site warrants recovery.
Today, on Google Plus, John Mueller made the following comment:
I was probably a bit too fast there :). While we do rerun link analysis regularly, the "Penguin" algorithm is in the stage we don't do regular pushes of all of the data in the same way that we update Panda once a month or so, for example. We're always working on improving that, of course.
I've asked John in the thread whether it is possible for a Penguin-hit site to recover on a day other than a Penguin refresh. We'll see what he says.
John responded to my query on Google+ with the following:
+Marie Haynes theoretically, in an artificial situation where there's only one algorithm (which is, in practice, never the case), if a site is affected by a specific algorithm, then the data for that algorithm needs to be updated before it would see changes. In practice, while some elements might be very strong depending on what was done in the past, there are always a lot of factors involved, so significantly improving the site will result in noticeable changes over time, as we recrawl & reindex the site and it's dependencies, as well as reprocess the associated signals. So yes, you'd need to wait for the algorithm to update if it were the only thing involved, but in practice it's never the only thing involved so you're not limited to waiting.
Also keep in mind that for long-running processes (be it algorithm updates like this, or other higher-level elements in our algorithms), it's never a good idea to limit yourself to small, incremental improvements; waiting to see if "it's enough" can take a while, so I'd recommend working to take a very good look at the issues you've run across, and working to make very significant improvements that will be more than enough (which users will appreciate as well, so there's that win too :)).
So what does that mean? My interpretation is that when Penguin affects a site, the only way to recover from the specific issues Penguin flagged is to wait until Penguin refreshes again. At the same time, it's not impossible for a site to gain ground while it is waiting. In other words, if you've been affected by Penguin, clean up what you can (i.e. disavow or remove the spammy links) and continue to improve your site. Those improvements may include improving on-site SEO, reducing keyword stuffing, adding new pages with value, and getting good, natural links to your site.
I really hope Penguin refreshes soon so that we can get some data on recovery!