If you have been working diligently to clean up your backlink profile in order to recover from the Penguin algorithm, you may also want to take a thorough look at your on-page factors.  John Mueller from Google has just said in a Webmaster Central hangout that Penguin is not just about links.

I asked John this question because he previously had hinted in a Webmaster Central hangout that the Penguin algorithm could take into account more than just bad links.  During the December 16th hangout, I asked the following question:

Does the Penguin algorithm only take into account links, or are there other factors that could be contributing?

You can hear John's answer (also transcribed below) at the 27:52 mark here:


I think, with the Penguin algorithm, when we rolled it out we did a blog post about that that kind of shows the other issues that we look at there.  The Penguin algorithm is a webspam algorithm and we try to take a variety of webspam issues into account and use them with this algorithm.  It does also take into account links from spammy sites or unnatural links in general, so that's something that we would take into account there, but I wouldn't only focus on links.  Also a lot of times what we see is that when a website has been spamming links for quite a bit maybe they're also doing some other things that are kind of borderline or even against our webmaster guidelines.  So I wouldn't only focus on links there.  I would make sure that you are cleaning up all of the webspam issues as completely as possible.

On Page Factors that Could Contribute to Penguin

The article that John mentioned, which Google published when Penguin rolled out, is here.  Here are some quotes that are key to our discussion:

  • "a few sites use techniques that don’t benefit users, where the intent is to look for shortcuts or loopholes that would rank pages higher than they deserve to be ranked. We see all sorts of webspam techniques every day, from keyword stuffing to link schemes that attempt to propel sites higher in rankings."
  • "The change will decrease rankings for sites that we believe are violating Google’s existing quality guidelines."

The article shows an example of keyword stuffing that could be affected, but also states that many sites affected by Penguin would not have as obvious an issue.

The article also references the quality guidelines.  The guidelines are fairly long, so I have summarized here the parts that I believe describe webspam infractions that could contribute to a site being affected by Penguin:

  • Keep the links on a given page to a reasonable number. We have recently seen Matt Cutts say that the "100 links per page" rule is not so much an issue now.  However, if a site has pages with hundreds of links that are obviously just there for search engines and not readers then this could be a potential Penguin factor.
  • Make pages primarily for users, not for search engines. This likely falls under the same category as keyword stuffing, but if you have page after page of articles that are written for search engines and not users then this could possibly be affecting you in the eyes of Penguin.
  • Don't deceive your users. 
  • Avoid tricks intended to improve search engine rankings. A good rule of thumb is whether you'd feel comfortable explaining what you've done to a website that competes with you, or to a Google employee. Another useful test is to ask, "Does this help my users? Would I do this if search engines didn't exist?"

Google then gives an informative list of things that could be considered webspam:

  • Automatically generated content
  • Cloaking
  • Sneaky redirects
  • Hidden text or links
  • Doorway pages
  • Scraped content
  • Participating in affiliate programs without adding sufficient value
  • Loading pages with irrelevant keywords
  • Creating pages with malicious behavior, such as phishing or installing viruses, trojans, or other badware
  • Abusing rich snippets markup
  • Sending automated queries to Google
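
To make a couple of these items concrete, here is a minimal sketch in Python of a self-audit you could run on one of your own pages.  It counts links, collects text hidden with inline styles, and reports the most repeated words.  This is my own illustration and not how Penguin or any Google system works: the names `PageAudit` and `audit_page` are hypothetical, the word-length cutoff is an arbitrary assumption, and it only catches inline `display:none` or `visibility:hidden`, not text hidden through CSS classes or JavaScript.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Text inside these tags never renders, so it is ignored.
NON_VISIBLE_TAGS = {"script", "style"}
# Void elements have no closing tag, so they must not affect nesting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Assumption: only inline styles are checked for hidden content.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)


class PageAudit(HTMLParser):
    """Hypothetical helper: gathers rough on-page signals from one HTML page."""

    def __init__(self):
        super().__init__()
        self.link_count = 0
        self.visible_words = []
        self.hidden_text = []
        self._stack = []  # one (tag, is_hidden) entry per open element

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            self.link_count += 1
        if tag in VOID_TAGS:
            return
        parent_hidden = self._stack[-1][1] if self._stack else False
        style = attr_map.get("style") or ""
        hidden = parent_hidden or bool(HIDDEN_STYLE.search(style))
        self._stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate sloppy markup.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._stack and self._stack[-1][0] in NON_VISIBLE_TAGS:
            return
        if self._stack and self._stack[-1][1]:
            self.hidden_text.append(text)
        else:
            self.visible_words += re.findall(r"[a-z0-9']+", text.lower())


def audit_page(html, top_n=3):
    """Return link count, hidden text, and the most repeated visible words."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    # Assumption: words of 3 letters or fewer are treated as filler.
    counts = Counter(w for w in parser.visible_words if len(w) > 3)
    return {
        "link_count": parser.link_count,
        "hidden_text": parser.hidden_text,
        "top_terms": counts.most_common(top_n),
    }


sample = """
<html><body>
<p>Cheap widgets! Buy cheap widgets from our cheap widgets store.</p>
<div style="display:none">cheap widgets cheap widgets</div>
<a href="/a">One</a> <a href="/b">Two</a>
</body></html>
"""
report = audit_page(sample)
print(report)
```

A report like this can only point you at pages that deserve a closer manual look.  Google does not publish thresholds for any of these signals, so I have deliberately not included any.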

They also recommend the following, although I don't know for certain whether these could be Penguin factors:

  • Monitoring your site for hacking and removing hacked content as soon as it appears
  • Preventing and removing user-generated spam on your site

How important are these on page factors?

Although it matters that Penguin can take into account more than just links, it is still vitally important to address bad links pointing to your site.  Shortly after Penguin rolled out, Matt Cutts tweeted the following:

Certainly links are a primary area to monitor. Been true all this year; expect to continue.

In my opinion, links are still the most important factor when it comes to dealing with the Penguin algorithm.  However, if you have been affected by Penguin, I think that it is important to look for on page issues that may be considered attempts to manipulate the search engine results or deceive users.

More articles by Marie Haynes on Penguin:

I regularly tweet about Penguin and unnatural links issues.  You can follow me here.

What on page factors do you think are important in the eyes of Penguin?

I'd love to hear your thoughts.  Do you have examples of sites that needed to clean up on page factors before recovering from Penguin?  How important do you think on page issues are with regard to recovery?  Please leave a comment below.