What is robots.txt?
robots.txt is a plain text file served at the root of your website (e.g. https://example.com/robots.txt) that tells web crawlers which parts of the site they should and shouldn't access. Compliance is voluntary: well-behaved crawlers honor it, but it is not an access control mechanism.
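A minimal example (the paths and sitemap URL are illustrative, not prescriptive):

```
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
```

Here `User-agent: *` applies the rules to all crawlers, `Disallow` blocks a path prefix, and `Allow` explicitly permits everything else.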
Why it matters
- Control crawl budget — prevent search engines from wasting time on non-indexable pages
- Protect sensitive areas — keep admin panels and staging environments out of search results
- Declare sitemaps — the `Sitemap:` directive tells crawlers where to find your XML sitemap
- Manage AI crawlers — increasingly used to control LLM training data access
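You can check how a given robots.txt would be interpreted without deploying it, using Python's standard-library parser. This sketch feeds the rules in directly rather than fetching them over the network; the paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Parse an in-memory robots.txt (one directive per line).
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "Allow: /",
])

# can_fetch(user_agent, url) applies the rules for that agent.
print(rp.can_fetch("*", "https://example.com/admin/login"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
```

In production you would call `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()` to fetch the live file instead.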