Understanding Google’s Penguin update and what to do about it

By Expert commentator 17 Jul, 2012

A review of Google's key site quality-related algorithm changes in 2012

You will have heard that the Google update known as ‘Penguin’ was rolled out earlier this year, on 24th April 2012 to be precise. It came at a time when Google had rolled out a number of other updates, some more significant than others.

Given the negative effect on rankings for some sites, there has naturally been an awful lot written about the Penguin update. However, I can’t help but feel that there has been some confusion over what Penguin addressed and how to tell whether you were a victim or not.

In this post I will discuss just what Penguin is likely to have affected, the other updates around this time (that some webmasters could have confused with Penguin), and what to do to recover if you have been affected by these updates.


The main 2012 quality-related updates

1. 23rd March – Panda Refresh Bundle

There were a couple of significant updates in March and April. The Panda (3.4) refresh was rolled out around 23rd March and coincided with a large number of Webmaster Tools (WMT) notifications being sent to webmasters worldwide, warning of unnatural linking to their websites. We can only speculate, however, as to whether these messages were a result of Panda or of other updates that coincided with the Panda refresh. Google's Inside Search post about its March changes lists everything that was rolled out, so we can look for the changes that could be linked to this WMT notification.

The two changes below could potentially be algorithm changes that resulted in the notifications (and subsequent hits) to sites with what were deemed to be unnatural links:

  • Site Quality
  • Anchor Text

These are the important disclosures about these changes made by Google on Inside Search:

"Improvements to processing for detection of site quality. [launch codename "Curlup"]. We've made some improvements to a longstanding system we have to detect site quality. This improvement allows us to get greater confidence in our classifications".

"Better interpretation and use of anchor text. We've improved systems we use to interpret and use anchor text, and determine how relevant a given anchor might be for a given query and website".

The point ‘Improvements to processing for detection of site quality’ is likely to be responsible for the downgrading of link networks and the like. The Panda refresh focused on site quality, as the details of this and another related update rolled out in March suggest:

"High-quality sites algorithm data update and freshness improvements. [launch codename "mm", project codename "Panda"] Like many of the changes we make, aspects of our high-quality sites algorithm depend on processing that's done offline and pushed on a periodic cycle.  In the past month, we've pushed updated data for "Panda" as we mentioned in a recent tweet.  We've also made improvements to keep our database fresher overall".

Anchor Text

The second point, ‘Better interpretation and use of anchor text’, could be much-improved detection of unnatural anchor text patterns or anchor text spamming. This particular update meant that many webmasters saw negative movements on a handful of their key terms, inevitably the ones with too much keyword-rich anchor text pointing at them.

Keyword-relevant anchor text has become a minefield for webmasters and should perhaps now be used very sparingly, if at all. Over the last few months Google has been actively making changes to how it handles anchor text.
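One way to see whether a handful of key terms dominate your own link profile is to measure how your anchor text is distributed. The snippet below is only a rough self-audit sketch in Python. It assumes you have exported a list of inbound anchor texts from a backlink tool (the `sample_anchors` list and the `anchor_text_share` function name are made up for illustration), and it says nothing about how Google's own systems score anchor text, which is not public.

```python
from collections import Counter

def anchor_text_share(anchors):
    """Return each distinct anchor text's share of all backlinks, largest first."""
    # Normalise case and whitespace so "Cheap Car Insurance" and
    # "cheap car insurance" are counted as the same anchor.
    normalised = [a.strip().lower() for a in anchors if a.strip()]
    counts = Counter(normalised)
    total = len(normalised)
    return [(text, count / total) for text, count in counts.most_common()]

# Hypothetical backlink export: one anchor text per inbound link.
sample_anchors = [
    "cheap car insurance", "cheap car insurance", "Cheap Car Insurance",
    "Example Ltd", "www.example.com", "click here",
]

for text, share in anchor_text_share(sample_anchors):
    print(f"{share:.0%}  {text}")
```

A profile where one commercial phrase accounts for a large share of all links, ahead of the brand name and plain URLs, is the kind of pattern worth investigating first.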

February updates

"Link evaluation. We often use characteristics of links to help us figure out the topic of a linked page. We have changed the way in which we evaluate links; in particular we are turning off a method of link analysis that we used for several years. We often rearchitect or turn off parts of our scoring in order to keep our system maintainable, clean and understandable".

March updates

Here are Google's alerts to webmasters in March:

"Tweaks to handling of anchor text. [launch codename "PC"] This month we turned off a classifier related to anchor text (the visible text appearing in links). Our experimental data suggested that other methods of anchor processing had greater success, so turning off this component made our scoring cleaner and more robust".

"Better interpretation and use of anchor text. We've improved systems we use to interpret and use anchor text, and determine how relevant a given anchor might be for a given query and website".

April updates

And for April...

"Anchors bug fix. [launch codename "Organochloride", project codename "Anchors"] This change fixed a bug related to our handling of anchors".

The activity around anchor text suggests that Google now relies more on the content or subject of a page or website as a relevance classifier than it does on the anchor text of links. Google still looks at anchor text as a way of detecting spammy and unnatural linking strategies, as we’ve seen with some of the sites hurt in the recent updates.

2. 24th April – Penguin Update

What is Penguin?

Firstly, we need to define, as best we can, what the Penguin update addressed:

Penguin was a group of updates focused on combating ‘webspam’. The official line from Google in the notification was “a decrease in rankings for sites that we believe are violating Google’s existing quality guidelines”. The key point here is Google’s mention of their ‘quality guidelines’.

Google haven’t actually changed anything in the rulebook; they have just got better at enforcing the rules.

Let’s have a look at Google’s existing quality guidelines for webmasters and try to identify which ones they may have just got better at detecting with Penguin:

Don’t have hidden text

  • While a site wouldn’t have got away with having white text on white background pre-Penguin, there were many instances where sites and pages ranked well despite using CSS to hide portions of SEO’d content.
  • Post-Penguin Google is likely to have improved its detection of content for SEO that has been hidden using CSS.
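If you want to check your own pages for this, a quick scan for text sitting inside elements hidden with inline CSS is a reasonable starting point. The following is a minimal sketch, not a picture of what Google does: it assumes well-formed markup, only looks at inline `style` attributes (rules in external stylesheets are not considered), and the `HiddenTextFinder` class and the set of hiding rules it looks for are my own illustrative choices.

```python
import re
from html.parser import HTMLParser

# Inline CSS patterns commonly used to hide text (an illustrative, incomplete set).
HIDING_RULE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|text-indent\s*:\s*-\d{3,}", re.I
)
# Elements that never have a closing tag, so they must not be tracked on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class HiddenTextFinder(HTMLParser):
    """Collect text that sits inside an element hidden via an inline style."""

    def __init__(self):
        super().__init__()
        self.stack = []        # one bool per open element: is it, or an ancestor, hidden?
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        style = dict(attrs).get("style") or ""
        inherited = bool(self.stack and self.stack[-1])
        self.stack.append(inherited or bool(HIDING_RULE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.hidden_text.append(data.strip())

# Hypothetical page fragment for illustration.
page = ('<div><p>Visible copy</p>'
        '<div style="display: none"><p>cheap widgets <b>buy widgets</b></p></div></div>')
finder = HiddenTextFinder()
finder.feed(page)
print(finder.hidden_text)
```

Not every hit is a problem, since tabs, accordions and menus hide content legitimately. What matters is whether the hidden text exists for users or purely for search engines.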

Don’t cloak

  • You wouldn’t have got away with serving different content or URLs to users and search engines (for long at least) prior to Penguin, and it’s unlikely Google’s detection of this has changed.

Don’t keyword stuff

  • This is probably the most obvious thing to pinpoint that has changed with the Penguin update. Google has openly suggested its detection of keyword stuffing has improved. Matt Cutts pointed it out in his announcement of the update and Google mentioned it specifically in their list of monthly updates.
  • Matt Cutts alluded to a very blatant example in his Penguin launch post, but we can now safely say that you need to take extra care here, making sure you don’t spam alt text and title attributes as well as avoiding the more obvious keyword stuffing techniques.
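A simple way to sanity-check your own copy, alt text and title attributes is to measure how much of the text a single phrase accounts for. The function below is a rough sketch of a keyword density check. The name `keyword_density` and the example text are illustrative assumptions, Google has not published how its classifier works, and there is no official density figure that counts as ‘safe’.

```python
import re

def keyword_density(text, phrase):
    """Fraction of the words in `text` accounted for by occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase appears as a run of whole words.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)

# Hypothetical alt text that repeats one phrase far too often.
alt_text = "red shoes red shoes buy red shoes now"
print(f"{keyword_density(alt_text, 'red shoes'):.0%}")
```

Run it separately over body copy, alt text and title attributes. A phrase that makes up most of an attribute, as in the example, is hard to justify as being written for users.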

"Keyword stuffing classifer improvement. [project codename "Spam"] We have classifiers designed to detect a when a website is keyword stuffing. This change made the keyword stuffing classifier better".

Don’t have duplicate content

  • Panda addressed this issue last year and it’s unlikely that Penguin will have concentrated on this point.

Don’t have doorway pages

  • This technique was previously detectable but Google may have improved its methods of detecting less blatant examples.

Don’t participate in link schemes

This is a hard one, and it is where much of the confusion has arisen. Given all the activity regarding site authority in March’s updates list, I would be inclined to say that the Penguin update is less about the authority of your links and more likely to be focused on detecting anchor text spam.

It appears that the site authority updates could be attributed to the ‘unnatural link warnings’ sent via webmaster tools (i.e. part of Panda 3.4, not Penguin).

In the preliminary announcement of the Penguin update, it was referred to as the over-optimisation update – which again seems more in line with detecting unnatural densities of anchor text and ‘over-SEO’ed websites’.

Another important part of the Penguin update that addressed this linking guideline concerned linking out to potential webspam or ‘bad neighbourhoods’. As mentioned in the Matt Cutts post, Google is now also capable of detecting whether links are out of context with the content of the page or site.

One other element of over-optimisation could be ‘sitewide links’. The presence of too many sitewide links from other sites can indicate aggressive link building such as paid links, badges, WordPress themes and/or plugin links.
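To spot candidate sitewide links in your own profile, you can count how many distinct linking pages each referring domain contributes. This is a minimal sketch under stated assumptions: the input is a list of linking page URLs exported from a backlink tool, the function names are illustrative, and the `min_pages` threshold of 50 is an arbitrary figure of mine rather than anything Google has published.

```python
from collections import Counter
from urllib.parse import urlparse

def links_per_domain(linking_urls):
    """Count how many linking pages each referring domain contributes."""
    domains = (urlparse(url).netloc.lower() for url in linking_urls)
    return Counter(d for d in domains if d)

def likely_sitewide(linking_urls, min_pages=50):
    """Domains linking from at least `min_pages` pages (arbitrary threshold)."""
    counts = links_per_domain(linking_urls)
    return sorted(d for d, n in counts.items() if n >= min_pages)

# Hypothetical export: 60 pages on one domain (e.g. a footer link) plus one editorial link.
urls = [f"http://theme-site.example/page{i}" for i in range(60)]
urls.append("http://news.example/article")
print(likely_sitewide(urls))
```

Domains that show up here are not necessarily harmful, but footer, blogroll and theme credit links are the ones to review first, especially if they all carry the same keyword-rich anchor text.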


Now that you can hopefully differentiate the common problems associated with these updates, the next steps are to:

  • Discover which problems are most likely to be affecting your website and its rankings
  • Make the necessary changes to make your site compliant with Google’s quality guidelines
  • Follow the necessary procedures to get re-included in the SERPs

All of these points are addressed in the second part of this post, which can be read on the Search Laboratory blog.

Thanks to Jimmy McCann for sharing his advice and opinions in this post. Jimmy is Head of SEO at Search Laboratory in Leeds. You can follow him on Twitter or connect on LinkedIn.

This is a post we've invited from a digital marketing specialist who has agreed to share their expertise, opinions and case studies. Their details are given at the end of the article.