Indexing office documents and Flash files

Yandex indexes HTML documents and files of the following types: PDF, DOC/DOCX, XLS/XLSX, PPT/PPTX (MS Office); ODS, ODP, ODT, and ODG (OpenOffice); RTF; TXT; and SWF (if the file is referenced directly or embedded in the HTML code with the object or embed tag). If an SWF file contains useful content, the HTML document that embeds it can be found in search by the content indexed from the SWF file.

When new software versions are released, it may take some time before the new file formats are supported.

Restrictions on the indexed data:

  • Documents larger than 10 MB aren't indexed.
  • If a PDF document contains only images, the first three pages are indexed. A PDF document that also contains text is indexed in full.

  • In Flash documents, the text from the following blocks is indexed:

    • DefineText.

    • DefineText2.

    • DefineEditText.

    • Metadata.

  • In Flash documents, links are indexed if they are in the following blocks:

    • DoAction.

    • DefineButton.

    • DefineButton2.
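
The file type and size restrictions above can be pre-checked on your own side before you expect a file to appear in the search. The following is a minimal sketch, not Yandex's actual logic: the function name is hypothetical, the extension list is copied from the text above, and "10 MB" is assumed to mean 10 * 1024 * 1024 bytes.

```python
from pathlib import PurePosixPath

# Extensions listed in the text above (HTML pages are handled separately).
INDEXABLE_EXTENSIONS = {
    "pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx",
    "ods", "odp", "odt", "odg", "rtf", "txt", "swf",
}

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 10 * 1024 * 1024


def is_candidate_for_indexing(path: str, size_bytes: int) -> bool:
    """Return True if the file type is listed and the size is within the limit.

    Hypothetical helper: it checks only the extension and the size,
    not the file contents or how the file is linked from the site.
    """
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    return extension in INDEXABLE_EXTENSIONS and size_bytes <= MAX_SIZE_BYTES
```

For example, is_candidate_for_indexing("/docs/report.pdf", 2_000_000) passes both checks, while a PNG image or a 12 MB presentation does not.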

Pages with different content can be considered duplicates if they returned an error message to the robot (for example, if the site served a stub page). Check how the pages respond now. If they return different content, send them for re-indexing so that they get back into the search results faster.
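
One way to check how the pages respond is to fetch each URL yourself and compare the response bodies. The sketch below assumes you have already collected the bodies into a dictionary; the function name is hypothetical, and a byte-for-byte comparison is only a rough stand-in, since the source does not describe how the robot detects duplicates.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_by_content(pages: Dict[str, bytes]) -> List[List[str]]:
    """Group URLs whose response bodies are byte-for-byte identical.

    `pages` maps a URL to the body it returned. Only groups with more
    than one URL are reported, because those pages look the same.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, body in pages.items():
        # Identical bodies produce the same SHA-256 digest.
        groups[hashlib.sha256(body).hexdigest()].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

If several URLs with supposedly different content land in the same group, they are probably all returning the same stub or error page.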

To prevent pages from being excluded from the search while the site is temporarily unavailable, configure the server to return the 503 HTTP status code.
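
How the 503 response is configured depends on your server. As an illustration only, here is a minimal WSGI application in Python that answers every request with 503. The function name, the message text, and the Retry-After value of one hour are assumptions made for this sketch and are not required by the text above.

```python
def maintenance_app(environ, start_response):
    """WSGI application that answers every request with HTTP 503."""
    body = b"The site is temporarily unavailable. Please try again later.\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Assumption: hint that clients may retry in one hour (3600 seconds).
        ("Retry-After", "3600"),
    ]
    start_response("503 Service Unavailable", headers)
    return [body]
```

To try it locally, the application can be served with the standard library, for example with wsgiref.simple_server.make_server("", 8000, maintenance_app).serve_forever(). In production, the same status code would normally be set in the web server or application framework you already use.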

The exclusion of pages from the search results is not an error on the part of the site or the indexing robot: the robot excludes pages that users won't be able to find using search queries. Therefore, their exclusion shouldn't affect the visibility of the site's indexed pages. To learn more, see Low-value or low-demand pages.

Contact support if:

  • Pages were ranked high in the search results before they were excluded.
  • The site's position dropped dramatically after the pages were excluded.
  • The number of click-throughs from the search engine decreased significantly after the pages were excluded.