The Clean-param directive
Use the Clean-param directive if the site page URLs contain GET parameters (such as session IDs, user IDs) or tags (such as UTM) that don't affect their contents.
The Yandex robot uses this directive to avoid reloading duplicate information. This improves the robot's efficiently and reduces the server load.
For example, your site contains the following pages:
www.example.com/some_dir/get_book.pl?ref=site_1&book_id=123 www.example.com/some_dir/get_book.pl?ref=site_2&book_id=123 www.example.com/some_dir/get_book.pl?ref=site_3&book_id=123
The ref parameter is only used to track which resource the request was sent from. It doesn't change the page content. All three URLs will display the same page with the book_id=123 book. Then, if you indicate the directive in the following way:
User-agent: Yandex Disallow: Clean-param: ref /some_dir/get_book.pl
the Yandex robot will converge all the page addresses into one:
If such page is available on the site, it is included in the search results.
Clean-param: p0[&p1&p2&..&pn] [path]
In the first field, list the parameters that should be disregarded by the robot, separated by the & character. In the second field, indicate the path prefix for the pages the rule should apply to.
The prefix can contain a regular expression in the format similar to the one used in the robots.txt file, but with some restrictions: you can only use the characters A-Za-z0-9.-/*_. However, the * character treated the same way as in the robots.txt file: the * character is always implicitly appended to the end of the prefix. For example:
Clean-param: s /forum/showthread.php
means that the s parameter is disregarded for all URLs that begin with /forum/showthread.php. The second field is optional, and in this case the rule will apply to all pages on the site.
It is case sensitive. The maximum length of the rule is 500 characters. For example:
Clean-param: abc /forum/showthread.php Clean-param: sid&sort /forum/*.php Clean-param: someTrash&otherTrash
#for addresses like:www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243 www.example1.com/forum/showthread.php?s=1e71c4427317a117a&t=8243 #robots.txt will contain the following: User-agent: Yandex Disallow: Clean-param: s /forum/showthread.php
#for addresses like:www.example2.com/index.php?page=1&sid=2564126ebdec301c607e5df www.example2.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae #robots.txt will contain the following: User-agent: Yandex Disallow: Clean-param: sid /index.php
#if there are several of these parameters:www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311 www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896 #robots.txt will contain the following: User-agent: Yandex Disallow: Clean-param: s&ref /forum*/showthread.php
#if the parameter is used in multiple scripts:www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243 www.example1.com/forum/index.php?s=1e71c4427317a117a&t=8243 #robots.txt will contain the following: User-agent: Yandex Disallow: Clean-param: s /forum/index.php Clean-param: s /forum/showthread.php