How to Avoid the Duplicate Content Issue for Google?

Duplicate content on a page has much in common with a budget overrun. Only in this case, we are talking about a shrinking “trust budget” that search robots assign to your site. This issue concerns a lot of site owners, as it can appear without them even noticing. Once you have read something about the duplicate content issue, your site can start to feel like a time bomb: the clock is ticking and Google sanctions are waiting for you.

Sad but true: according to ex-Googler Matt Cutts, about 25-30% of the content on the Web is duplicate. Although duplicate content can get you into a world of trouble with search engine optimization, it’s not all that dramatic. There is a solution: read this article to learn how to avoid duplicate content issues with Google.

What is duplicate content?

There are 3 main types of duplicate content.
  • Exact duplicate: Two URLs have completely identical content;
  • Content with slight differences: For example, a different sentence order, slightly different images, etc.;
  • Cross-domain duplicates: An exact or slightly changed copy exists on several domains.

Moreover, there are two related concepts that Google doesn’t consider duplicate content, although less experienced publishers and SEO specialists can easily mix them up with it.

  • Thin content: These are pages with very little content. An example is a set of pages built from a list of 6,000 organization addresses, where each page contains only one address: just a few lines.
  • Sliced content: These are pages that differ only slightly from each other. Suppose a site sells Timberland shoes that come in sizes 38, 38.5, 39, 40, 41, 42, etc. If the site has a separate page for each shoe size, there will be only a minor difference between all those pages. Google perceives this as sliced content.

Google dislikes thin and sliced content equally. Either can be detected by Google Panda, which is why publishers should avoid creating these types of pages.

Duplicate content can happen for a number of reasons:
  • licensing of your site’s content;
  • defects in site architecture due to a content management system not optimized for search engines;
  • existence of plagiarism.

Over the last five years, spammers with an extraordinary need for content have started “ripping off” content from legitimate sources. They transpose words using a variety of complex processes and place the resulting text on their own pages to attract “long tail” search queries, show contextual advertising, and pursue other dishonest aims. So nowadays people live in a world of “duplicate content issues” and “duplicate content penalties”.

Facts about duplicate content

Duplicate content location

If all of the content in question is on your own site, is it still duplicate content?

Yes. Duplicate content can occur within a single site as well as across different sites.

Duplicate content percentage

What percentage of a page has to be duplicated for it to fall under a duplicate content filter? Unfortunately, search engines never make this information public, because doing so would undermine their ability to prevent the problem. The percentage also changes constantly for all engines. The bottom line is that pages do not have to be identical to be considered duplicates.

The code to text ratio

What if your code is very large, but there are only a few unique HTML elements on the page? Won’t Google think that all the pages are duplicates of each other?

No. Search engines care about the content of your pages, not about your code. Code size only becomes a problem when it grows out of proportion.

The navigational elements to unique content ratio

Suppose all the pages on your site have a large navigation bar and lots of headers and footers, but very little content. Won’t Google consider all these pages to be duplicates?

No. Google takes navigation elements into account before it even evaluates pages for duplication, so shared menus, headers, and footers do not count against you.

Licensed content

You want to avoid the duplicate content issue. But what should you do if you have licensed content from other web sources to show to your visitors?

Use <meta name="robots" content="noindex, follow">. Put it in the <head> section of your page, and the search engines will know that this content is not for them. Another option is to get exclusive rights to own and publish that content.
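To check that a licensed page really carries the tag, something like the following sketch could be used. It relies only on Python's standard html.parser module; the class name RobotsMetaParser, the helper robots_directives, and the sample page are assumptions made up for this illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # "noindex, follow" becomes {"noindex", "follow"}
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical licensed page used only for this example.
page = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Licensed article text</body></html>"
)
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```

Running a check like this over every page that holds licensed content is one way to catch pages where the tag was forgotten.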

What kind of content is there?

  • Unique content is written by a person. It is completely different from any other combination of letters, symbols, and words on the web and has not been affected by computer text processing algorithms.
  • Fragments are small pieces of content (e.g., quotes) that are copied and used over and over again. They rarely pose a problem for search engines, especially when included in a larger document with a lot of unique content.
  • Shingles are relatively small segments of phrases (five to six words) that search engines look for on other web pages. If two documents share too many shingles, the search engines may interpret those documents as duplicate content.
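The shingle idea can be sketched in a few lines of Python. Search engines do not publish how they actually compare shingles, so the Jaccard ratio used here, the five-word window, and the function names are all assumptions chosen for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def shingle_overlap(a, b, size=5):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Two made-up sentences that differ only in their last two words.
original = "duplicate content can appear on the same site or on different sites"
copied = "duplicate content can appear on the same site or on other domains"
print(shingle_overlap(original, copied))
```

A high ratio suggests that one text was copied from the other with small changes. Where to draw the line is a judgment call, since no engine documents its threshold.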

How to deal with duplicate content?

There are many ways to create duplicate content, which explains why there is more than enough of it on the web. Internal duplicate content needs specific tactics to get the best results in terms of optimization. Frankly speaking, duplicate pages are of no value to either users or search engines, so try to avoid the problem completely. Make sure that only one URL refers to each page, and set up a 301 redirect from each old URL to the URL that remains. This helps search robots see the changes you’ve made as quickly as possible and preserves the “link juice” that the deleted pages had.
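As a rough illustration of the one-URL-per-page rule, the sketch below keeps a table of retired duplicate paths and answers each request with either a 301 redirect or a normal 200 response. The paths, the REDIRECTS table, and the resolve function are hypothetical; a real site would usually configure this in the web server or the CMS instead.

```python
# Hypothetical mapping: retired duplicate paths -> the one URL that remains.
REDIRECTS = {
    "/index.html": "/",
    "/home": "/",
    "/shoes/index.php": "/shoes",
}


def resolve(path):
    """Return (status, location) for a requested path."""
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, target  # permanent redirect to the remaining URL
    return 200, path  # this path is already the single URL for the page


def has_redirect_chains(table):
    """True if any redirect target is itself redirected again."""
    return any(target in table for target in table.values())


print(resolve("/index.html"))  # (301, '/')
print(resolve("/shoes"))       # (200, '/shoes')
print(has_redirect_chains(REDIRECTS))
```

The has_redirect_chains check is an extra design choice: pointing every old URL straight at its final destination avoids sending robots through several hops.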

If this is not possible, there are a lot of other options. Here’s a rundown of the easiest solutions for various scenarios:

  • use a robots.txt file to block search engine spiders from crawling duplicate versions of your site pages;
  • use the rel="canonical" element, which is the second-best solution for removing duplicate pages;
  • use <meta name="robots" content="noindex"> to instruct the search engines not to show duplicate pages.

However, note that if you use robots.txt to block a page from being crawled, applying noindex or nofollow on that page makes no sense: since a spider cannot read the page, it will never see the noindex or nofollow meta tags. With these tools in mind, consider some specific situations of duplicate content.
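The conflict between robots.txt and noindex can be checked with Python's standard urllib.robotparser module. In this sketch the robots.txt rules, the example.com URLs, and the helper is_crawlable are assumptions invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the printer-friendly duplicates.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="Googlebot"):
    """True if the robots.txt rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


for url in (
    "https://www.example.com/print/article-1",
    "https://www.example.com/article-1",
):
    if is_crawlable(url):
        print(url, "-> crawlable, so a noindex tag on the page can be read")
    else:
        print(url, "-> blocked, so a noindex tag on the page is never seen")
```

If a page should disappear from the index through noindex, it has to stay crawlable. Blocking it in robots.txt only stops the crawl.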

HTTPS pages

If you’re using the SSL protocol (encrypted data exchange between the browser and the web server, which is often used for e-commerce), then your site has pages whose URLs start with https:// (instead of http://). The problem turns up when links on your HTTPS pages point at other pages on the site using relative rather than absolute links. For instance, the link to your home page then resolves to the https:// version of the home page instead of the http:// one, so the same page becomes reachable under two URLs.

If your site has this problem, you can use rel="canonical" or 301 redirects to fix it. An alternative solution is to change the links to absolute ones (a full URL with protocol and domain instead of a site-relative path), which also makes life a bit more difficult for those who steal your content.
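How a relative link picks up the protocol of the page it sits on can be seen with urljoin from Python's standard library. The example.com addresses and the helper name resolve_link are placeholders assumed for this sketch.

```python
from urllib.parse import urljoin


def resolve_link(page_url, href):
    """Resolve a link the way a browser or crawler would."""
    return urljoin(page_url, href)


# Hypothetical HTTPS page, for example a shopping cart.
secure_page = "https://www.example.com/checkout/cart.html"

# A relative link inherits https:// from the page it is on.
print(resolve_link(secure_page, "/"))
# prints https://www.example.com/

# An absolute link keeps whatever protocol it spells out.
print(resolve_link(secure_page, "http://www.example.com/"))
# prints http://www.example.com/
```

With relative links, a crawler that lands on one HTTPS page can end up discovering an https:// copy of every page it links to.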

Content management systems creating duplicate content

Sometimes a site can have lots of versions of identical pages. This happens because of limitations in some content management systems, which expose the same content under more than one URL. It is usually entirely unnecessary duplication that is of no value to users. The best decision is to remove the duplicate pages and set up 301 redirects from the removed pages to the remaining ones. If that doesn’t work, try the other methods.

Pages for printing or multiple sorting options

A lot of sites offer pages for printing that give the user the same content in a printer-friendly format. Some e-commerce sites provide lists of their products with multiple sort orders (by size, color, brand, and price). These pages are of value to the user but of no value to the search engines, which therefore treat them as duplicate content. For print versions, you can create a print CSS stylesheet instead of a separate page.

Duplicate content in blogs and archiving systems

Blogs present an interesting variant of the duplicate content issue. A blog post can appear on several different pages:

  • the start page of the blog;
  • the permalink page for that post;
  • the archive pages;
  • the category pages.

Each copy of the post is a duplicate of the other copies. Publishers very rarely try to deal with the fact that a post appears on both the blog home page and its permalink page, and the search engines seem to cope with this quite well. However, it might make sense to show only post snippets on category and archive pages.

User-generated duplicate content (repeated posts, etc.)

A variety of sites use structures such as blogs, forums, or message boards to collect user-generated content. These can be great ways to develop a lot of content at a very low cost. The problem is that a user can publish the same content on your site and on several other sites at the same time, which leads to duplicate content issues. This is difficult to control, but to reduce the problem you can proceed as follows:

  • Have a clear policy that tells users that the content they submit to your site should be unique and must not be posted on other sites. Admittedly, this is difficult to enforce, but it helps make your expectations clear;
  • Set up your forum in a unique way that calls for different content. In addition to the standard data entry fields, add some unique fields (different from those on other sites) that will be useful for your site visitors.


Don’t worry too much about duplicate content. It’s usually not such a big deal. In most cases Google itself knows how to deal with issues such as master pages or content citations. Besides, lots of people face the problem of duplicate content. Sometimes duplicates appear where no one expects them, so you should check your site for them regularly. To prevent them, create unique content for each page.

This article is written by Isabelle Jordan. Isabelle is a business and marketing journalist at an insurance company. She writes for different news portals and thematic blogs, which helps her stay at the heart of travel and insurance news. Such work gives her the opportunity to write articles on the most relevant topics of today.

Published By: Souvik Banerjee

Souvik Banerjee: Web Developer & SEO Specialist with 15+ years of experience in open source web development, specializing in Joomla & WordPress development. He is also the moderator of this blog, "RS Web Solutions".