The Clean-param directive
Use the Clean-param directive if the site's page URLs contain GET parameters (such as session IDs or user IDs) or tags (such as UTM) that don't affect the page content.
Specify the Clean-param directive as fully as possible and keep it up to date. A new parameter that doesn't affect the page content may produce duplicate pages that shouldn't be included in the search. When there are many such pages, the robot crawls the site more slowly, so important changes take longer to show up in the search results.
The Yandex robot uses this directive to avoid reloading duplicate information. This improves the robot's efficiency and reduces the server load.
For example, your site contains the following pages:
www.example.com/some_dir/get_book.pl?ref=site_1&book_id=123
www.example.com/some_dir/get_book.pl?ref=site_2&book_id=123
www.example.com/some_dir/get_book.pl?ref=site_3&book_id=123
The ref parameter is only used to track which resource the request was sent from. It doesn't change the page content: all three URLs display the same page with the book_id=123 book. If you specify the directive as follows:
User-agent: Yandex
Disallow:
Clean-param: ref /some_dir/get_book.pl
the Yandex robot will reduce all three page addresses to one:
www.example.com/some_dir/get_book.pl?book_id=123
If such a page is available on the site, it is included in the search results.
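The effect described above can be approximated with a short Python sketch. It is only an illustration of the idea, not the robot's actual algorithm: the strip_params helper is hypothetical, the https:// scheme is added so that the standard URL parser can separate the host from the path, and the path is matched as a plain prefix without wildcards.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url, params, path_prefix=""):
    """Drop the listed query parameters if the URL path starts with path_prefix.

    Hypothetical helper for illustration; plain prefix match, no '*' support.
    """
    parts = urlsplit(url)
    if not parts.path.startswith(path_prefix):
        return url  # the rule does not apply to this page
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in params
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# The scheme is an assumption added for parsing; the paths come from the example.
urls = [
    "https://www.example.com/some_dir/get_book.pl?ref=site_1&book_id=123",
    "https://www.example.com/some_dir/get_book.pl?ref=site_2&book_id=123",
    "https://www.example.com/some_dir/get_book.pl?ref=site_3&book_id=123",
]

# Equivalent of "Clean-param: ref /some_dir/get_book.pl"
cleaned = {strip_params(u, {"ref"}, "/some_dir/get_book.pl") for u in urls}
print(cleaned)
```

With the ref parameter removed, the three addresses collapse into a single entry in the set.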
Directive syntax
Clean-param: p0[&p1&p2&..&pn] [path]
In the first field, list the parameters that should be disregarded by the robot, separated by the & character. In the second field, indicate the path prefix for the pages the rule should apply to.
The prefix can contain a regular expression in a format similar to the one used in the robots.txt file, but with some restrictions: you can only use the characters A-Za-z0-9.-/*_. The * character is treated the same way as in the robots.txt file: a * is always implicitly appended to the end of the prefix. For example:
Clean-param: s /forum/showthread.php
means that the s parameter is disregarded for all URLs that begin with /forum/showthread.php. The second field is optional; if it is omitted, the rule applies to all pages on the site.
The rule is case sensitive. The maximum length of a rule is 500 characters. Examples:
Clean-param: abc /forum/showthread.php
Clean-param: sid&sort /forum/*.php
Clean-param: someTrash&otherTrash
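The syntax rules above can be checked with a small Python sketch. It is an approximation under stated assumptions, not the robot's parser: parse_clean_param and path_matches are hypothetical names, the 500-character limit is applied to the whole line including the directive name, and the directive name itself is compared exactly.

```python
import re

# Characters the documentation allows in the path prefix.
ALLOWED_PATH = re.compile(r"[A-Za-z0-9.\-/*_]*")
MAX_RULE_LENGTH = 500  # assumption: measured over the whole line


def parse_clean_param(line):
    """Split a Clean-param line into (parameter list, path prefix)."""
    line = line.strip()
    if len(line) > MAX_RULE_LENGTH:
        raise ValueError("rule is longer than 500 characters")
    name, sep, value = line.partition(":")
    if not sep or name.strip() != "Clean-param":
        raise ValueError("not a Clean-param rule")
    fields = value.split()
    if not 1 <= len(fields) <= 2:
        raise ValueError("expected: p0[&p1&..&pn] [path]")
    params = fields[0].split("&")
    if not all(params):
        raise ValueError("empty parameter name")
    path = fields[1] if len(fields) == 2 else ""  # omitted path: whole site
    if not ALLOWED_PATH.fullmatch(path):
        raise ValueError("path contains a character outside A-Za-z0-9.-/*_")
    return params, path


def path_matches(prefix, path):
    """Case-sensitive prefix match; '*' matches any sequence of characters,
    and a trailing '*' is always implied."""
    pattern = ".*".join(re.escape(piece) for piece in prefix.split("*")) + ".*"
    return re.fullmatch(pattern, path) is not None


params, prefix = parse_clean_param("Clean-param: sid&sort /forum/*.php")
print(params, prefix)
print(path_matches(prefix, "/forum/showthread.php"))
print(path_matches(prefix, "/blog/index.php"))
```

A rule without a path, such as the someTrash&otherTrash example, is parsed with an empty prefix, which path_matches treats as matching every page.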
Additional examples
#for URLs like:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/showthread.php?s=1e71c4427317a117a&t=8243
#robots.txt will contain:
User-agent: Yandex
Disallow:
Clean-param: s /forum/showthread.php
#for URLs like:
www.example2.com/index.php?page=1&sid=2564126ebdec301c607e5df
www.example2.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae
#robots.txt will contain:
User-agent: Yandex
Disallow:
Clean-param: sid /index.php
#if there are multiple parameters like this:
www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311
www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896
#robots.txt will contain:
User-agent: Yandex
Disallow:
Clean-param: s&ref /forum*/showthread.php
#if the parameter is used in multiple scripts:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/index.php?s=1e71c4427317a117a&t=8243
#robots.txt will contain:
User-agent: Yandex
Disallow:
Clean-param: s /forum/index.php
Clean-param: s /forum/showthread.php