One Standard Fits All: Robots Exclusion Protocol for Yahoo!, Google and Microsoft
Date: 2008-06-03
Category: Yahoo
Yahoo! Search Blog
Over the last couple of years, we've been collaborating with Google and Microsoft to make webmasters' efforts more effective across the major search engines. By introducing standards such as Sitemaps, along with improvements like auto-discovery and cross-host submission, we've helped webmasters simplify their account management across the different search engines.
The Robots Exclusion Protocol (REP) lets content publishers specify which parts of their site they want public and which parts they want to keep private from robots, whether by controlling the visibility of content across the whole site (via robots.txt) or at the level of individual pages (via META tags). REP was introduced in the early 1990s and is the de facto standard. Its strengths lie in its flexibility to evolve in parallel with the web, its universal implementation across major search engines and all major robots, and the way it works for any publisher, no matter how large or small. We've heard that there is some confusion around the specific implementation of REP supported by each engine. Since we've never detailed the specifics of implementing the protocol, today we're releasing detailed documentation on how REP directives will be handled by the three major search providers.
Common REP Directives
The following are all the major REP features currently implemented by Google, Microsoft and Yahoo!. Each of these directives can be applied to all crawlers or to specific crawlers by targeting specific user-agents, which is how any crawler identifies itself. Each of us also supports reverse DNS based authentication of our crawler, and you can use this to validate the identity of any crawler claiming a particular user-agent.
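As an illustration of the reverse DNS check mentioned above, here is a minimal Python sketch. It assumes the usual two-step pattern: look up the hostname for the requesting IP, confirm the hostname falls under a domain the search engine publishes for its crawler, then resolve that hostname forward and confirm it maps back to the same IP. The function name `verify_crawler`, its parameters, and the `crawl.example.net` suffix are hypothetical; the real suffixes should be taken from each engine's own documentation.

```python
import socket


def verify_crawler(ip, allowed_suffixes, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes a reverse-then-forward DNS check.

    `allowed_suffixes` is a list of domain suffixes the caller trusts
    (hypothetical values here; consult each engine's documentation).
    The lookup functions are injectable so the logic can be exercised
    without network access.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        hostname = reverse_lookup(ip)
    except OSError:
        return False  # no reverse record: cannot be verified

    hostname = hostname.rstrip(".").lower()
    in_trusted_domain = any(
        hostname == suffix or hostname.endswith("." + suffix)
        for suffix in allowed_suffixes
    )
    if not in_trusted_domain:
        return False

    try:
        # The forward lookup must map back to the original IP;
        # otherwise the reverse record could have been spoofed.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

A call such as `verify_crawler(request_ip, ["crawl.example.net"])` would use live DNS. The user-agent string alone is not checked here, since any client can claim any user-agent.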
2. HTML META Directives
These directives can either be placed in the HTML of a page or, for non-HTML content like PDF, video, etc., in the HTTP header using an X-Robots-Tag. The X-Robots-Tag mechanism makes these directives available for all types of documents, HTML or otherwise. If both forms of the tag (HTML META and X-Robots-Tag in the header) are present, the most restrictive one applies.
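To make the two placements concrete, the following Python sketch reads directives from a `<meta name="robots" content="...">` tag and from an `X-Robots-Tag` header value, then combines them. It is only one interpretation of "the most restrictive one applies": since each directive here is a restriction, the sketch takes the union of both sets. The names `RobotsMetaParser`, `split_directives` and `effective_directives` are made up for this example, and the sketch handles only plain comma-separated directive lists, not crawler-specific variants.

```python
from html.parser import HTMLParser


def split_directives(value):
    """Split 'NOINDEX, nofollow' into {'noindex', 'nofollow'}."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            self.directives |= split_directives(attr.get("content", ""))


def effective_directives(html, x_robots_tag=None):
    """Union of the page's META directives and the header's directives.

    Assumption: every directive restricts the crawler, so the union is
    the most restrictive combination of the two sources.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    combined = set(parser.directives)
    if x_robots_tag:
        combined |= split_directives(x_robots_tag)
    return combined
```

For a page containing `<meta name="robots" content="noindex">` served with the header `X-Robots-Tag: noarchive`, this sketch would report both `noindex` and `noarchive` as in effect.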
| DIRECTIVE | IMPACT | USE CASE(S) |
|---|---|---|
| NOINDEX META Tag | Tells a crawler not to index a given page. | Don't index the page. This allows pages that are crawled to be kept out of the index. |
| NOFOLLOW META Tag | Tells a crawler not to follow a link to other content on a given page. | Prevent publicly writeable areas from being abused by spammers looking for link credit. By using NOFOLLOW, you let the robot know that you are discounting all outgoing links from this page. |
| NOSNIPPET META Tag | Tells a crawler not to display snippets in the search results for a given page. | Present no abstract for the page on search results. |
| NOARCHIVE META Tag | Tells a search engine not to show a "cached" link for a given page. | Do not make a copy of the page available to users from the search engine cache. |
| NOODP META Tag | Tells a crawler not to use a title and snippet from the Open Directory Project for a given page. | Do not use the ODP (Open Directory Project) title and abstract for this page in Search. |
Other REP Directives
Yahoo!-specific REP directives that are not supported by Microsoft and Google include:
Crawl-Delay: Allows a site to reduce the frequency with which a crawler checks for new content.
NOYDIR META Tag: This is similar to the NOODP META Tag above but applies to the Yahoo! Directory instead of the Open Directory Project.
Robots-nocontent Tag: Lets you mark the non-content parts of your page so that our crawler can identify the main content and target the right pages on your site for specific search queries. We won't use the sections tagged as such for indexing the page or for the abstract in the search results.
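As a rough illustration of Crawl-Delay targeted at a single user-agent, the sketch below uses Python's standard `urllib.robotparser` module to read the value back from a sample robots.txt. The sample file, the delay value of 5, and the helper name `crawl_delay_for` are assumptions for this example; how a crawler interprets the number is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a Crawl-delay that applies only to Slurp,
# plus a general rule for every other crawler.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 5

User-agent: *
Disallow: /private/
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay that applies to `user_agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


slurp_delay = crawl_delay_for(ROBOTS_TXT, "Slurp")     # value set for Slurp
other_delay = crawl_delay_for(ROBOTS_TXT, "OtherBot")  # no delay declared
```

Because the directive sits in the `User-agent: Slurp` group, only that crawler sees a delay; crawlers that fall through to the `*` group get `None` from this parser.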
Apart from these tools in the REP, Yahoo! Site Explorer also provides further ways to tell Yahoo! to Delete URLs, or Rewrite Dynamic URLs to remove spurious parameters. You can learn more about our crawler at the Slurp Help page.
We plan to continue coordinating with the leading search engines to ensure simplicity for webmasters, so stay tuned for more developments in the future. You can also find information on the Official Google Webmaster Central Blog or the Microsoft Live Search Webmaster Center Blog. And, feel free to share your thoughts with the webmaster community on the Site Explorer Suggestion Board.
Priyank Garg
Director, Product Management
Yahoo! Search