Indexing office documents and Flash files
Yandex indexes HTML documents and files of the following types: PDF, DOC/DOCX, XLS/XLSX, PPT/PPTX (MS Office); ODS, ODP, ODT, and ODG (OpenOffice); RTF, TXT, and SWF. SWF files are indexed if they are referenced directly or embedded in HTML code using the object or embed element. If an SWF file contains useful content, the HTML document that embeds it can be found by the content indexed in the SWF file.
When new versions of this software are released, it may take some time before their new file formats are supported.
Restrictions on the indexed data:
- Documents larger than 10 MB aren't indexed.
- If a PDF document contains only images, only the first three pages are indexed. A PDF document that also contains text is indexed in full.
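The type, size, and PDF rules above can be checked before publishing a file. The following is a minimal sketch, not Yandex code: the function names `may_be_indexed` and `pdf_pages_indexed` are hypothetical, HTML documents are left out of the extension list, and "10 MB" is assumed to mean 10 * 1024 * 1024 bytes, which the text does not specify.

```python
from pathlib import Path

# Extensions taken from the list of supported file types above
# (HTML documents are deliberately left out of this sketch).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",  # PDF, MS Office
    ".ods", ".odp", ".odt", ".odg",                             # OpenOffice
    ".rtf", ".txt", ".swf",
}

# Assumption: 1 MB = 1024 * 1024 bytes; the source only says "10 MB".
MAX_SIZE_BYTES = 10 * 1024 * 1024

# From the text: only the first three pages of an image-only PDF are indexed.
IMAGE_ONLY_PDF_PAGE_LIMIT = 3


def may_be_indexed(filename: str, size_bytes: int) -> bool:
    """Return True if the file passes the type and size checks described above."""
    extension = Path(filename).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS and size_bytes <= MAX_SIZE_BYTES


def pdf_pages_indexed(total_pages: int, contains_text: bool) -> int:
    """Return how many pages of a PDF would be indexed under the rule above."""
    if contains_text:
        return total_pages
    return min(total_pages, IMAGE_ONLY_PDF_PAGE_LIMIT)
```

For example, `may_be_indexed("price-list.xlsx", 2_500_000)` passes both checks, while a scanned 40-page PDF with no text layer yields `pdf_pages_indexed(40, contains_text=False) == 3`. Passing these checks does not guarantee that a file is indexed; it only rules out the restrictions listed here.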
In Flash documents, the text from the following blocks is indexed:
Links are indexed if they are in these blocks: