What is robots.txt

robots.txt is a plain text file served from the root of your website (e.g. https://example.com/robots.txt) that tells web crawlers which parts of the site they may and may not request. Compliance is voluntary: well-behaved crawlers honor it, but it is advisory, not an access control mechanism.

Why it matters

  • Control crawl budget — prevent search engines from wasting time on non-indexable pages
  • Protect sensitive areas — discourage crawling of admin panels and staging environments (note: a Disallow rule alone doesn't guarantee exclusion from search results, and the file itself is public; use authentication or a noindex directive for real protection)
  • Declare sitemaps — the Sitemap: directive tells crawlers where to find your XML sitemap
  • Manage AI crawlers — increasingly used to control LLM training data access
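For example, a site opting out of AI training crawls could add per-agent groups. GPTBot and CCBot are the published user-agent tokens for OpenAI's crawler and Common Crawl respectively; check each vendor's documentation for the current token list, as these change over time:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```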

Basic format

User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
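You can check a rule set programmatically; Python's standard library ships a parser for this format in urllib.robotparser. A minimal sketch (the rules mirror the example above, with Disallow listed first because Python's parser applies the first matching rule rather than the longest match):

```python
from urllib import robotparser

# The rule set from the example above, as it would appear in robots.txt.
rules = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

# Parse the rules directly instead of fetching them over the network.
rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Ask whether a given user agent may fetch a given URL.
print(rp.can_fetch("*", "https://example.com/about"))           # allowed
print(rp.can_fetch("*", "https://example.com/admin/settings"))  # disallowed

# Sitemap declarations are exposed too (Python 3.8+).
print(rp.site_maps())
```

In production you would call rp.set_url("https://example.com/robots.txt") followed by rp.read() to fetch the live file instead of parsing a string.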