The Clean-param directive

Use the Clean-param directive if the site's page URLs contain GET parameters (such as session or user IDs) or tags (such as UTM tags) that don't affect their content.

The Yandex robot uses this directive to avoid reloading duplicate information. This improves the robot's efficiency and reduces the server load.

For example, your site contains the following pages:

www.example.com/some_dir/get_book.pl?ref=site_1&book_id=123
www.example.com/some_dir/get_book.pl?ref=site_2&book_id=123
www.example.com/some_dir/get_book.pl?ref=site_3&book_id=123

The ref parameter is only used to track which resource the request was sent from. It doesn't change the page content: all three URLs display the same page, for the book with book_id=123. If you specify the directive as follows:

User-agent: Yandex
Disallow:
Clean-param: ref /some_dir/get_book.pl

the Yandex robot will converge all the page addresses into one:

www.example.com/some_dir/get_book.pl?book_id=123

If such a page is available on the site, it is included in the search results.
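The convergence described above can be sketched in Python with the standard library. This is only an illustration of the idea, not Yandex's actual implementation; an http:// scheme is added so the URLs parse cleanly:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_params(url, params):
    """Remove the given GET parameters from a URL, keeping the rest."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in params]
    return urlunsplit(parts._replace(query=urlencode(kept)))

urls = [
    "http://www.example.com/some_dir/get_book.pl?ref=site_1&book_id=123",
    "http://www.example.com/some_dir/get_book.pl?ref=site_2&book_id=123",
    "http://www.example.com/some_dir/get_book.pl?ref=site_3&book_id=123",
]

# With `ref` disregarded, all three addresses converge into one.
cleaned = {strip_params(u, {"ref"}) for u in urls}
```

After stripping `ref`, the set `cleaned` contains a single address, `http://www.example.com/some_dir/get_book.pl?book_id=123`.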

Directive syntax

Clean-param: p0[&p1&p2&..&pn] [path]

In the first field, list the parameters the robot should disregard, separated by the & character. In the second field, indicate the path prefix of the pages the rule should apply to.

Note. The Clean-param directive is intersectional, so it can be indicated anywhere in the robots.txt file. If several directives are specified, the robot takes all of them into account.

The prefix can contain a regular expression in a format similar to the one used in the robots.txt file, with some restrictions: only the characters A-Za-z0-9.-/*_ can be used. The * character is interpreted the same way as in the robots.txt file, and a * is always implicitly appended to the end of the prefix. For example:

Clean-param: s /forum/showthread.php

means that the s parameter is disregarded for all URLs that begin with /forum/showthread.php. The second field is optional; if it is omitted, the rule applies to all pages on the site.
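The prefix matching just described, with * wildcards and an implicit trailing *, can be sketched as a regular-expression check. This is an assumption about the matching behavior for illustration, not the robot's actual code:

```python
import re

def prefix_matches(prefix, path):
    """Check a Clean-param path prefix against a URL path.

    '*' in the prefix matches any character sequence, and a '*' is
    implicitly appended to the end, so the match is anchored only
    at the start of the path.
    """
    pattern = ".*".join(re.escape(part) for part in prefix.split("*"))
    return re.match(pattern, path) is not None

prefix_matches("/forum/showthread.php", "/forum/showthread.php")      # True
prefix_matches("/forum*/showthread.php", "/forum_old/showthread.php")  # True
prefix_matches("/forum/showthread.php", "/blog/showthread.php")       # False
```

Because of the implicit trailing *, a prefix like /forum/showthread.php also matches longer paths that merely begin with it.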

The directive is case-sensitive. The maximum length of the rule is 500 characters. For example:

Clean-param: abc /forum/showthread.php
Clean-param: sid&sort /forum/*.php
Clean-param: someTrash&otherTrash

Additional examples

#for URLs like:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/showthread.php?s=1e71c4427317a117a&t=8243

#robots.txt will contain:
User-agent: Yandex
Disallow:
Clean-param: s /forum/showthread.php

#URLs like:
www.example2.com/index.php?page=1&sid=2564126ebdec301c607e5df
www.example2.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae

#robots.txt will contain:
User-agent: Yandex
Disallow:
Clean-param: sid /index.php

#if there are several such parameters:
www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311
www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896

#robots.txt will contain:
User-agent: Yandex
Disallow:
Clean-param: s&ref /forum*/showthread.php

#if the parameter is used in multiple scripts:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/index.php?s=1e71c4427317a117a&t=8243

#robots.txt will contain:
User-agent: Yandex
Disallow:
Clean-param: s /forum/index.php
Clean-param: s /forum/showthread.php
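Putting the pieces together, the combined effect of several Clean-param rules can be sketched as below. This is a simplified illustration that uses plain startswith prefix matching (no * wildcards); the rule list mirrors the last example above:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# (parameters to drop, path prefix) pairs, as in the last example above
RULES = [
    ({"s"}, "/forum/index.php"),
    ({"s"}, "/forum/showthread.php"),
]

def apply_rules(url, rules):
    """Drop every parameter whose rule prefix matches the URL path."""
    parts = urlsplit(url)
    drop = set()
    for params, prefix in rules:
        if parts.path.startswith(prefix):
            drop |= params
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in drop]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For a URL such as http://www.example1.com/forum/index.php?s=1e71c4&t=8243, the first rule matches, so s is dropped while t is kept.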