Robots.txt: Control How Search Engines Crawl Your Site

Learn how to use robots.txt to manage search engine crawling behavior.

What is Robots.txt?

Robots.txt is a text file at your website's root that tells search engine crawlers which pages they can or cannot access.

How It Works

Crawlers check /robots.txt before crawling a site:

  1. The bot requests yoursite.com/robots.txt.
  2. It reads the allow/disallow rules in the file.
  3. It follows (or ignores) those rules, depending on how the bot is configured.
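The steps above can be sketched with Python's standard-library robots.txt parser, `urllib.robotparser`. The rules, bot name, and URLs below are illustrative; in practice `set_url()` plus `read()` would fetch the live file.

```python
from urllib import robotparser

# Parse example rules inline so the sketch runs offline;
# a real crawler would call rp.set_url(...) and rp.read() instead.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

rp.can_fetch("MyBot", "https://example.com/admin/settings")  # False: /admin/ is disallowed
rp.can_fetch("MyBot", "https://example.com/blog/post")       # True: everything else is allowed
```

`can_fetch()` answers the same question a well-behaved crawler asks before requesting a URL.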

Basic Syntax

Allow All

```
User-agent: *
Allow: /
```

Block All

```
User-agent: *
Disallow: /
```

Block Specific Directories

```
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /temp/
```

Specific Bot Rules

```
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /
```
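You can sanity-check per-bot rules like these with `urllib.robotparser` from the Python standard library; the bot name `SomeOtherBot` below is made up for illustration.

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Allow: /",
    "",
    "User-agent: GPTBot",
    "Disallow: /",
])

rp.can_fetch("Googlebot", "https://example.com/page")     # True
rp.can_fetch("GPTBot", "https://example.com/page")        # False
# A bot with no matching group, and no "*" group to fall back to,
# defaults to allowed:
rp.can_fetch("SomeOtherBot", "https://example.com/page")  # True
```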

Common Use Cases

Block Admin Areas

```
User-agent: *
Disallow: /wp-admin/
Disallow: /admin/
```

Block Search Results

```
User-agent: *
Disallow: /search
Disallow: /*?s=
```

Block AI Crawlers

```
User-agent: GPTBot
User-agent: CCBot
User-agent: anthropic-ai
Disallow: /
```
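Consecutive User-agent lines like these form a single group that shares the Disallow rule, which you can confirm with Python's `urllib.robotparser`:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: GPTBot",
    "User-agent: CCBot",
    "User-agent: anthropic-ai",
    "Disallow: /",
])

# All three agents belong to the same group, so each is blocked everywhere:
all(not rp.can_fetch(bot, "https://example.com/")
    for bot in ("GPTBot", "CCBot", "anthropic-ai"))  # True
```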

Important Notes

Robots.txt Is Public

Anyone can view your robots.txt file, so never use it to hide sensitive URLs; listing a path there actually advertises its existence.

Not a Security Measure

Robots.txt is a suggestion, not enforcement. Well-behaved crawlers honor it, but nothing compels them to. Use authentication for real protection.

Include Sitemap Reference

```
Sitemap: https://example.com/sitemap.xml
```
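Since Python 3.8, `urllib.robotparser` also exposes Sitemap lines via `site_maps()`; a minimal sketch:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "Sitemap: https://example.com/sitemap.xml",
    "User-agent: *",
    "Allow: /",
])

rp.site_maps()  # ['https://example.com/sitemap.xml']
```

If the file declares no sitemaps, `site_maps()` returns `None` rather than an empty list.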

Check your robots.txt with our free validation tool.