Duplicate pages

If website pages are available at different addresses, but have the same content, the Yandex robot may consider them duplicates and merge them into a group of duplicates.

Note

Duplicates are pages within the same site. For example, pages with the same content on regional subdomains aren't considered duplicates, because they belong to different sites.

If your website has duplicate pages:

  • The page you need may disappear from search results if the robot has selected another page from the group of duplicates.

    Also, in some cases, pages may not be grouped and instead participate in the search as separate documents that compete with each other. This may affect the website's ranking in search results.

  • Depending on which page remains in the search, the address of the document may change. This can cause difficulties when viewing statistics in web analytics services.

  • It takes the indexing robot longer to crawl the website's pages, which means that information about the pages important to you reaches the search database more slowly. The robot can also place additional load on your website.

How to determine if your website has duplicate pages

Duplicate pages appear for a variety of reasons:

  • Natural. For example, if a page with a product description is available in several categories of an online store.
  • Related to the specific features of the site or its CMS (for example, printer-friendly versions of pages, UTM tags for tracking ads, etc.).

To find out which pages are excluded from the search because of duplication:

  1. In Yandex Webmaster, go to the Searchable pages section and select Excluded pages.
  2. Click the icon and select the “Deleted: Duplicate” status.

You can also download the archive. To do this, choose the file format at the bottom of the page. In the file, duplicate pages have the DUPLICATE status. Learn more about statuses

If the duplicates were created because GET parameters were added to the URL, a notification about this will appear on the Troubleshooting page in Yandex Webmaster.

Note

A duplicate page can be either a regular site page or a fast version of it, such as an AMP page.

How to get rid of duplicate pages

To keep a specific page in search results, you'll need to point Yandex's bot to it. You can do this in a few ways, depending on the page's URL type.

Example for a regular site:

http://example.com/page1/ and http://example.com/page2/

Example for a site with AMP pages:

http://example.com/page/ and http://example.com/AMP/page/

In this case:

  • Set up a 301 redirect from one duplicate page to another. In this case, the target of the redirect will be included in the search results.

  • Specify the preferred (canonical) URL for the page to be included in the search.

  • Add the Disallow directive to the robots.txt file to prevent the indexing of the duplicate page.

    If you cannot restrict such links in robots.txt, prevent their indexing using the noindex meta tag. Then the search bot will be able to exclude pages from the database as they are reindexed.

    You can also use this method to restrict AMP pages that duplicate the content of other page types.

To determine which page should remain in search, focus on what is most convenient for your site's visitors. For example, if the duplicates are similar product pages within one section, you can keep the root or catalog page in search, so that visitors can reach the other pages from it. If a regular HTML page and an AMP page duplicate each other, we recommend keeping the regular HTML page in search.
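For instance, using the AMP example above, the duplicate page can point to the page you want to keep in search. This is only a sketch: the addresses are the sample URLs from this section, and the tags go into the HTML of the duplicate page.

<!-- in the <head> of the duplicate http://example.com/AMP/page/ -->
<link rel="canonical" href="http://example.com/page/">

<!-- or, to have the duplicate excluded from the index as it is reindexed -->
<meta name="robots" content="noindex">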

https://example.com and https://example.com/index.php

In this case:

We recommend setting up a redirect from the internal page to the home page. If you set up a redirect from https://example.com/index.php to https://example.com/, the content of https://example.com/index.php will be displayed at https://example.com/ (in accordance with redirect processing rules).
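A minimal sketch of such a redirect for nginx (this assumes the site runs nginx; other web servers have equivalent settings, and the exact rule depends on how your home page is served):

# inside the server { } block: permanently redirect requests for /index.php to the home page;
# $request_uri is the original client request, so nginx's internal index handling is not affected
if ($request_uri = /index.php) {
    return 301 https://example.com/;
}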

http://example.com/page/ and http://example.com/page

In this case, set up a 301 redirect from one duplicate page to the other. The target of the redirect will then be included in the search.

We do not recommend using the rel=canonical attribute in this case, as it may be ignored. With a redirect, users will be directed immediately to the correct URL of the page.

If the problem is on the home page, no configuration is needed. The search engine recognizes http://example.com and http://example.com/ as identical pages.

Yandex indexes links with and without a trailing slash in the same way. When choosing the URL to keep in search, consider which address the pages are currently indexed under if a redirect has not yet been set up. For example, if pages without a slash are already participating in the search, it is worth setting up a redirect from the URLs with a slash to the ones without it. This will help you avoid additional changes to page addresses in search.
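As a sketch, if the slash-less URLs are the ones already in search, a rule like the following (shown for nginx, which is an assumption about your server) redirects the slashed versions to them; reverse the direction if the slashed URLs are the ones indexed, and exclude any paths that must keep the slash:

# inside the server { } block: permanently redirect /page/ style URLs to /page
# (the pattern does not match the home page "/")
rewrite ^/(.+)/$ /$1 permanent;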

http://example.com/page////something/

In this case, the search engine removes the repeated slashes. The page will be indexed at http://example.com/page/something/.

If the URL contains \ (for example, http://example.com/page/something/\\\\), the search engine treats such a page as separate. It will be indexed at http://example.com/page/something/\\\\.

In this case:

  • Set up an HTTP 301 redirect from one page to another. In this case, the target of the redirect will be included in the search results.

  • Specify the preferred (canonical) URL for the page to be included in the search.

  • Add the Disallow directive to the robots.txt file to prevent the page from being indexed.

    If you cannot restrict such links in robots.txt, prevent their indexing using the noindex meta tag. Then the search bot will be able to exclude the pages from its database as they are reindexed. A sketch of both options follows this list.
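A sketch of these two options, where /duplicate-page/ is a hypothetical address standing in for the URL of the duplicate:

#robots.txt will contain:
User-agent: Yandex
Disallow: /duplicate-page/

#if robots.txt cannot be changed, add this to the <head> of the duplicate page instead:
<meta name="robots" content="noindex">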

If the differences are in URL parameters that do not affect the content, follow the recommendations below. For example, such parameters can be UTM tags:

https://example.com/page?utm_source=instagram&utm_medium=cpc

Note

Some parameters, including most of the UTM parameters, are automatically removed by the search engine. For more information, see Parameters that are removed automatically.

In this case, add the Clean-param directive to the robots.txt file so that the bot does not take the URL parameters into account. If Yandex Webmaster shows a notification about page duplication because of GET parameters, this method will fix the error. The notification will disappear when the robot learns about the changes.

Tip

The Clean-param directive is intersectional, so it can be specified anywhere within the file. If you define other directives specifically for the Yandex bot, list all rules intended for it in a single section. In this case, the User-agent: * string will be ignored.

Example of the Clean-param directive

#for addresses like:
https://example.com/page?utm_source=link&utm_medium=cpc&utm_campaign=new

#robots.txt will contain:
User-agent: Yandex
Clean-param: utm_source&utm_medium&utm_campaign /page
#thus we tell the bot that it should keep the https://example.com/page address in the search

#to apply the directive to parameters on pages at any address, do not specify the address:
User-agent: Yandex
Clean-param: utm_source&utm_medium&utm_campaign

If you cannot change robots.txt, specify the preferred (canonical) page address to be included in the search.

http://example.com/page/ and http://example.com/page?AMP

In this case, add the Clean-param directive to the robots.txt file so that the robot ignores the parameters in the URL.
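For this GET-parameter example, the directive might look like this (a sketch; the parameter name AMP and the /page path are taken from the sample URL above):

#robots.txt will contain:
User-agent: Yandex
Clean-param: AMP /page
#the robot will then ignore the ?AMP parameter for addresses starting with /page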

If AMP pages are formed not by the GET parameter but by using the /AMP/ directory format, they can be viewed as regular content duplicates.

The robot learns about changes the next time it visits your site. As soon as that happens, the page that shouldn't be included in the search will be excluded from it within three weeks. If the site has many pages, this may take longer.

You can check if the changes have taken effect by going to Searchable pages in Yandex Webmaster.

Ungrouping

To improve search quality, the ungrouping mechanism may be applied. This happens if:

  • Pages that best solve the user's problem are located on the same domain, while other resources are less relevant, for example, when the query mentions a specific site. In this case, the search results may show several links to different pages of this site.
  • The results include different subdomains of a large web portal, i.e., a site that combines several informational resources and services. Usually, such subdomains host sites that belong to different businesses and owners, and their content concerns diverse services. The search ranks each of the web portal's subdomains in isolation from the others. For example, such ungrouping is applied to large ecosystem platforms and blog platforms.

The owner of a site that often appears at the top of the search results can use Yandex Webmaster to request that their domain be reclassified as a web portal. To do this, you have to provide a description of the services on the subdomains and of their owners. After that, the subdomains can be ranked as independent sites that accumulate user signals separately. Changes in user signals can affect the positions of subdomains in search results both positively and negatively.

Contact support

If you followed the above recommendations but the changes didn't affect the search results after three weeks, fill out the form below. In the form, provide sample pages.

Pages with different content can be considered duplicates if they responded to the robot with an error message (for example, if the site showed a stub page). Check how the pages respond now. If they return different content, send them for re-indexing so that they can get back into the search results faster.

To prevent pages from being excluded from the search if the site is temporarily unavailable, configure the 503 HTTP response code.
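For example, during maintenance a 503 can be returned like this (a sketch for nginx, which is an assumption about your server; other servers have equivalent options):

# inside the server { } block: answer requests with 503 while the site is unavailable
# (you may want to keep robots.txt and a static error page accessible)
location / {
    add_header Retry-After 3600 always;  # hint for robots to retry in about an hour
    return 503;
}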



