What is robots.txt?
robots.txt is a plain text file in the root directory of a website that controls search engine crawling behavior. It follows the Robots Exclusion Protocol and contains directives such as User-agent (which bot the rules apply to), Disallow (do not crawl), Allow (crawling permitted), and Sitemap (a reference to the XML sitemap). Important: robots.txt only prevents crawling, not indexing! Blocked pages can still appear in search results if other pages link to them. To actually block indexing, you need a noindex meta tag or an X-Robots-Tag HTTP header. Errors in robots.txt can have severe SEO consequences.
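A minimal robots.txt combining these directives might look like the following sketch (the domain and paths are placeholders, not taken from a real site):

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Allow: /admin/login

# Reference to the XML sitemap (absolute URL)
Sitemap: https://example.com/sitemap.xml
```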
Key Points
- Always located at domain.com/robots.txt
- Disallow prevents crawling, not indexing
- Wildcards (*) and the end-of-URL anchor ($) allow pattern matching
- Crawl-delay is only respected by some bots (Google ignores it)
- Sitemap reference recommended
- Test with Google Search Console
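The wildcard syntax from the list above can be illustrated with a short hypothetical rule set. Note that * and $ are extensions supported by major crawlers such as Googlebot and Bingbot, not part of the original protocol, so less common bots may ignore them:

```
User-agent: *
# Block every URL whose path ends in .pdf ($ anchors the end of the URL)
Disallow: /*.pdf$
# Block any URL containing a session parameter anywhere in the path/query
Disallow: /*?sessionid=
```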
Practical Example
“We blocked the admin area in robots.txt: Disallow: /admin/”
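A rule like the one quoted above can be checked programmatically before deployment. This is a minimal sketch using Python's standard-library urllib.robotparser; the bot name, domain, and URLs are hypothetical:

```python
# Check URLs against robots.txt rules with Python's stdlib parser.
from urllib.robotparser import RobotFileParser

# The same rule as in the example: block the admin area for all bots.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
]

parser = RobotFileParser()
parser.parse(rules)

# The admin area is blocked for crawling...
print(parser.can_fetch("MyBot", "https://example.com/admin/settings"))  # False
# ...while ordinary content remains crawlable.
print(parser.can_fetch("MyBot", "https://example.com/blog/post"))  # True
```

Remember that can_fetch() only answers the crawling question; a URL it reports as blocked can still be indexed if other pages link to it.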