SEO: Avoiding Duplicate Content

4 techniques to reduce duplicate site content

Although we still hear plenty of recommendations on how to optimise pages and on the importance of links from other sites to improve rankings, we hear far fewer about reducing the amount of duplicate content. This seems to be a neglected area of SEO advice, so this refresher highlights some of the techniques for avoiding duplicate content.

What is duplicate content?

Duplicate content occurs when different pages on a site are interpreted as similar by Google. As a result, pages can be discarded or down-weighted in Google’s index so that they won’t rank. In its Webmaster Tools guidance, Google says:

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include: discussion forums that can generate both regular and stripped-down pages targeted at mobile devices; store items shown or linked via multiple distinct URLs and printer-only versions of web pages.

We would add that, more importantly for site owners, category, product or service pages can be interpreted as identical simply because they are too similar. You may be able to spot this in your analytics if some product pages aren’t attracting any natural search traffic.

To reduce this problem, here are my recommendations for dealing with duplicate content, which you could check with your agency or in-house team. Numbers 1, 3 and 4 are typically the most relevant.

1. Show Google pages have distinct content

This is straightforward, but you have to make sure the page editors creating the content know how to do it. It’s normal on-page optimisation good practice: set unique titles, meta descriptions, headings and body copy for each page you want to rank. Each page should have a focus subject or main keyword you want to target, and briefing your copywriter to stay focussed on that subject is key to ensuring your pages are unique.
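
As a minimal sketch of what this looks like in practice, two similar product pages might each carry their own title, description and heading; the URLs and copy below are hypothetical examples rather than taken from any real site:

<!-- Page 1: /mens-navy-rugby-top.html -->
<title>Men's Navy Rugby Top | Example Store</title>
<meta name="description" content="Long-sleeved navy rugby top with a contrast collar. Free UK delivery from Example Store.">
<h1>Men's Navy Rugby Top</h1>

<!-- Page 2: /mens-striped-rugby-top.html -->
<title>Men's Striped Rugby Top | Example Store</title>
<meta name="description" content="Classic striped cotton rugby top in red and white. Free UK delivery from Example Store.">
<h1>Men's Striped Rugby Top</h1>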

To check that your pages don’t have problems with duplicate titles or descriptions, look at the HTML Improvements section in Google Webmaster Tools, which will flag pages with identical titles or descriptions for you.

Here’s an example of where ASOS are doing a good job of keeping page titles and descriptions unique:

[Image: ASOS search results showing distinct page titles and descriptions]

While the products are very similar, ASOS has kept the page titles and descriptions different. To arrive at these results I searched specifically on Asos.com for “Ralph Lauren Rugby Top”.

2. Robots.txt – exclude crawlers from duplicate content

An established tool in the SEO toolkit, robots.txt is a powerful text file located in the root of your website. By using the robots.txt file, site owners can instruct particular search engines, or all of them, to ignore specific pieces of content. The example below would block any files within the print-versions directory from being indexed by Google.

User-agent: Googlebot
Disallow: /print-versions/
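
If you wanted the same rule to apply to all crawlers rather than just Googlebot, you would use the wildcard user-agent instead; a sketch, reusing the same hypothetical print-versions directory:

User-agent: *
Disallow: /print-versions/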

Alternatively, you can use the robots meta tag to specify whether a particular page should be indexable by search engines. An example of what you would include in the head of your page would be:

<meta name="robots" content="noindex,nofollow">

You can read more about the robots meta tag in Google’s documentation if you need to.

3. 301 permanent redirects

301 redirects are a common way of making sure visitors are directed to the right page. For example, if a page is switched off because a campaign has finished, you can redirect people to your homepage so they don’t see an error page and bounce off. A 301 also tells search engines that the page has permanently moved to a new location, which is of great use if you need to retire older or duplicate versions of web pages. For example, sites sometimes have several similar, duplicate versions of the homepage. A 301 will ensure Google finds the new page and also passes on as much of the old page’s ranking value as it feels is relevant.

Another common use of a 301 redirect is to redirect non “www” traffic as per the example below:

http://example.com > http://www.example.com
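
As a sketch of how this is often set up on an Apache server (assuming mod_rewrite is available and .htaccess overrides are allowed; other servers and content management systems have their own equivalents), the non-www to www redirect might look like this:

# Redirect all requests for the non-www host to the www version with a 301
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]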

4. Use rel=“canonical” or exclude parameters in Google Webmaster Tools

The canonical tag has become a widely used method of telling search engines which pages are duplicates and, more specifically, which URL is the primary one. I wrote previously about implementing canonical tags on ecommerce sites.

The tag is placed in the HEAD section of your HTML page and specifies the preferred version among the duplicates:

<link rel="canonical" href="http://www.example.com/product-1.html" />

The tag alerts search engines that the page is a duplicate and tells them where the primary page is located.

A more recent feature within Google Webmaster Tools is the ability to tell Google which URL parameters on your website produce similar page content. This lets you determine whether Google should index those URLs or not; see the example below:

[Image: URL Parameters settings in Google Webmaster Tools]

This solution is particularly useful on ecommerce sites, where duplicate content can occur when the same content is made available via different URLs, for example through a faceted search for selecting products, or through session IDs or affiliate parameters, like this:

http://www.example.com/products/women/dresses/green.htm

http://www.example.com/products/women?category=dresses&color=green


http://example.com/shop/index.php?product_id=32&highlight=green+dress&cat_id=1&sessionid=123&affid=431

In these cases you would specify “category”, “product_id”, “highlight”, “sessionid” and “affid” as parameters.
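
Alternatively, or as a complement to excluding parameters in Webmaster Tools, the parameterised pages can carry a canonical tag pointing back at the clean URL. A minimal sketch, reusing the hypothetical URLs above:

<!-- Placed in the HEAD of http://www.example.com/products/women?category=dresses&color=green -->
<link rel="canonical" href="http://www.example.com/products/women/dresses/green.htm" />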

From this post you can see there are quite a few ways duplicate content can occur, and a range of tools to help manage it. So next time you speak to whoever is responsible for your SEO, ask about the steps they take to minimise duplicate content.

  • Dina (http://www.fusepump.com/)

    Thanks for a great post. How would you deal with duplicate content as a result of session IDs?

    • Dave Chaffey (http://www.smartinsights.com/)

      Hi Dina, thanks for your comment.

      The easiest way to fix the problem would be to exclude the session ID parameter in Google Webmaster Tools.
      But this still leaves problems in other search engines, so if possible it’s best to exclude them in robots.txt, e.g. Disallow: /*?session-id.
      Another option is to only create session IDs when users are logged in, so the bots won’t see them.

  • Andrew Carr

    Great post, though one would ask why any site chooses to run a separate print version of its content when print styling is more efficient!

    • Dave Chaffey (http://www.smartinsights.com/)

      Hi Andrew, thanks.

      Yes, I think you’re right about print versions – that is rare now. I remember a few years ago ClickZ had a print version like this.

  • Mihaela

    How about when you sell products that are also listed on other sites (let’s say laptops) and the description and features of the product are the same and this leads to chunks of duplicate content?

