I've just finished reading a PDF I discovered via Barry Schwartz at Search Engine Roundtable. His article briefly discussed a leaked copy of the Google Quality Guidelines, and it is a fascinating document to read. It was originally found by PotPieGirl. According to Barry, the document is copyrighted and likely won't be available online for long, but you may still be able to catch it here.
In case you can't find the document, or you simply don't want to read all 125 pages, I thought I'd summarize some of the paragraphs on webspam that I found interesting. I'm not sure why these guidelines were written, but see the end of my article for a slight conspiracy theory of mine.
What is spam?
The guide states that anything intended to trick search engines and draw in users is webspam. A junky-looking page isn't enough on its own to be called spam; there has to be some deception present. Spam pages generally have very little content that is useful to users.
How to detect spam:
The guidelines state that someone checking for spam can press Ctrl-A (or Command-A on a Mac) to select everything on the page and reveal any hidden text. It's no surprise that a page with hidden text is considered spam. That said, be careful in how you program your pages. I wonder whether there are cases where a page has hidden text without any malicious intent but still gets dinged as a potential spammer.
Reviewers are also instructed to look at the page source for unusual amounts of keyword stuffing. I found this quote fascinating:
URLs may also contain keyword stuffing. These URLs are computer-generated based on the words in the query and are often formatted with many hyphens (dashes) in them. They are a strong spam signal.
I have a site that currently seems to be taking a Panda hit. It has thousands of pages with good content. However, its URLs are long strings of hyphenated words, many of which are the page's keywords. My intent wasn't to spam, but I wonder whether Google's algorithm has treated this as keyword stuffing.
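If you're in the same boat, a quick audit of your own URLs can at least show which pages have the long, hyphen-heavy slugs the guidelines describe. Below is a minimal sketch that counts hyphens in each URL's path and flags the ones over a cutoff. The cutoff of 5 and the function names are my own hypothetical choices; the guidelines give no number, and this is not how Google's algorithm actually scores URLs.

```python
from urllib.parse import urlparse

# Hypothetical cutoff: the guidelines don't give a number, so tune this yourself.
MAX_HYPHENS = 5


def hyphen_count(url: str) -> int:
    """Count hyphens in the URL's path only (the domain and query string are ignored)."""
    return urlparse(url).path.count("-")


def flag_long_slugs(urls, max_hyphens=MAX_HYPHENS):
    """Return the URLs whose path contains more than max_hyphens hyphens."""
    return [u for u in urls if hyphen_count(u) > max_hyphens]


if __name__ == "__main__":
    # Example URLs are made up for illustration.
    pages = [
        "https://example.com/blog/panda-update-notes",
        "https://example.com/cheap-blue-widgets-best-blue-widgets-buy-blue-widgets-online",
    ]
    for url in flag_long_slugs(pages):
        print(hyphen_count(url), url)
```

Only the second example URL gets flagged. A flagged URL isn't necessarily a problem; it's just a list of pages worth a second look.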
A page that redirects to another page with spam intent is considered a sneaky redirect. URLs that redirect through several pages can be a spam signal, and a page that redirects to a well-known site like Amazon can also be considered sneaky. I'm assuming this is because the site may be setting an affiliate cookie along with the redirect.
Copied content is also identified as spam.
A page that contains only RSS feeds and PPC ads is considered spam. So is a page that contains content copied from Wikipedia or DMOZ surrounded by ads, and a page that looks like a search results page but only contains PPC ads. However, if you have lyrics, quotes, or poetry that are duplicated elsewhere on the web, you're likely not going to be considered a spammer unless that content is surrounded by ads.
Conclusion of the article:
The article concluded that if a page looks like it exists mostly to make money, it is likely spam. If you remove all of the PPC ads and duplicate content and there isn't much left, the page is likely spam.
I think the "leaking" of this article is a brilliantly sneaky plan by Google. What Google wants is for webmasters to produce the best content possible. After reading this article, I see that some of my sites have things that could be picked up by a Google algorithm as spammy, and I plan to make some changes ASAP. That way, we both win: Google has found a way to convince me to change, and I end up with a site that gets more visitors.
I'd love to hear your thoughts on this leaked article!