A robots.txt file is a text file that tells search engine crawlers which pages or sections of your website they can or cannot access. Located in your website’s root directory, robots.txt acts as a set of instructions for web crawlers like Googlebot, helping you control how search engines interact with your site. Most websites benefit from having a robots.txt file as part of their technical SEO foundation.
The robots.txt file uses a simple syntax to allow or disallow crawling of specific URLs, directories, or file types. While it can keep well-behaved crawlers away from certain content, robots.txt is a set of instructions, not a security measure: nothing forces a crawler to obey it. Pages blocked by robots.txt may still be indexed if other sites link to them.
Key Takeaways: Robots.txt
- Definition: A text file that instructs search engine crawlers on which URLs they can access on your site
- Location: Must be placed in your website’s root directory (example.com/robots.txt)
- Purpose: Controls crawl behavior, saves crawl budget, and keeps crawlers away from non-public or low-value pages
- Not security: Robots.txt is a guideline, not a security feature. Blocked pages can still be indexed.
- Key directives: User-agent, Disallow, Allow, Sitemap, and Crawl-delay
5 Essential Robots.txt Directives
- User-agent – Specifies which crawler the rules apply to (e.g., Googlebot, Bingbot, or * for all)
- Disallow – Tells crawlers not to access specific URLs or directories
- Allow – Permits access to specific URLs within a disallowed directory
- Sitemap – Points crawlers to your XML sitemap location
- Crawl-delay – Requests a delay between crawler requests (not supported by Google)
What Is a Robots.txt File?
A robots.txt file is a plain text file that follows the Robots Exclusion Protocol (REP). It provides instructions to web crawlers about which parts of your website they should or shouldn’t crawl. Search engines check for this file before crawling your site and generally respect its directives. The file must be named exactly “robots.txt” and placed in your root domain directory.
Egochi, America’s #1 digital marketing agency headquartered in New York City, ensures every client website has a properly configured robots.txt file. From our offices in NYC, Milwaukee, Madison, and Miami, we’ve audited thousands of websites and consistently find robots.txt errors that waste crawl budget or accidentally block important content from search engines.
What is robots.txt used for?
Robots.txt is used to control search engine crawler access to your website. It helps you prevent crawlers from accessing certain pages (like admin areas, staging content, or duplicate pages), conserve crawl budget by blocking unimportant URLs, and point crawlers to your XML sitemap. A properly configured robots.txt file improves how efficiently search engines crawl your site.
Where should robots.txt be located?
Robots.txt must be located in your website’s root directory and accessible at your-domain.com/robots.txt. For subdomains, each subdomain needs its own robots.txt file (blog.example.com/robots.txt is separate from www.example.com/robots.txt). The file name must be lowercase “robots.txt” exactly. If the file isn’t in the root directory or is named differently, crawlers won’t find it.
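Because crawlers look for the file at the root of the exact host a page is served from, the robots.txt that governs any URL can be worked out from the URL itself. The minimal Python sketch below does this with the standard library’s urllib.parse (robots_txt_url is a hypothetical helper name). Scheme and port count as part of the origin too, so the http:// and https:// versions of a site, or a non-default port, are each governed by their own file.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    The file lives at the root of the same scheme, host, and port, so the
    page's path, query string, and fragment are simply discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for url in (
        "https://www.example.com/blog/post?id=7",
        "https://blog.example.com/2024/01/hello",  # subdomain: its own file
        "http://example.com:8080/shop/",           # non-default port: its own file
    ):
        print(url, "->", robots_txt_url(url))
```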
Does robots.txt affect SEO?
Yes, robots.txt affects SEO by controlling which pages search engines can crawl. A properly configured robots.txt conserves crawl budget for important pages, keeps crawlers away from low-value URLs, and ensures crawlers find your sitemap. A misconfigured robots.txt can accidentally block important pages from being crawled, severely hurting your rankings.
Robots.txt Syntax and Directives
Understanding robots.txt syntax is essential for proper configuration. Here are the main directives:
User-agent:
Specifies which crawler the following rules apply to. Use * for all crawlers or specific names like Googlebot, Bingbot, or Yandex.
Disallow:
Tells the specified crawler not to access the URL path. An empty value (Disallow:) means everything is allowed.
Allow:
Permits crawling of a specific path within a disallowed directory. Useful for exceptions to broad disallow rules.
Sitemap:
Specifies the location of your XML sitemap. Can include multiple sitemap entries. Use the full URL.
Crawl-delay:
Requests a minimum number of seconds between successive requests. Google ignores it; Bing and some other crawlers honor it. Don’t rely on it for rate limiting.
# Comments
Lines starting with # are comments and ignored by crawlers. Use comments to document your rules.
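To see how a parser reads these directives, the sketch below feeds a small made-up file to Python’s standard-library urllib.robotparser, which understands User-agent, Allow, Disallow, Crawl-delay, Sitemap, and # comments (site_maps() needs Python 3.8 or later). It is a simpler parser than Google’s: it has traditionally matched rule paths as plain prefixes, without * or $ wildcards, so use it to illustrate the directives rather than to predict Googlebot’s exact behavior.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: comments, two groups, a Crawl-delay, and a Sitemap.
ROBOTS_TXT = """\
# Keep every crawler out of the admin area
User-agent: *
Disallow: /admin/

# Ask Bingbot to wait 10 seconds between requests (Google ignores Crawl-delay)
User-agent: Bingbot
Crawl-delay: 10
Disallow: /tmp/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() takes an iterable of lines

print(parser.can_fetch("Googlebot", "https://www.example.com/admin/login"))  # False: * group blocks /admin/
print(parser.can_fetch("Googlebot", "https://www.example.com/blog/"))        # True: no rule matches
print(parser.crawl_delay("Bingbot"))  # 10
print(parser.site_maps())             # ['https://www.example.com/sitemap.xml']
```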
Pattern Matching
Robots.txt supports wildcards and pattern matching:
| Pattern | Meaning | Example |
|---|---|---|
| * | Matches any sequence of characters | Disallow: /*.pdf blocks all PDFs |
| $ | Matches the end of the URL | Disallow: /*.php$ blocks URLs ending in .php |
| / | Matches the root and everything below it | Disallow: / blocks the entire site |
| /folder/ | Matches a specific directory and its contents | Disallow: /admin/ blocks the admin folder |
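One way to reason about these patterns is to translate each rule into a regular expression: * becomes “any run of characters,” a trailing $ pins the match to the end of the URL, and everything else is matched literally from the start of the path (including any query string). The Python sketch below assumes that model; rule_matches is a hypothetical name, and real crawlers also normalize percent-encoding, which is omitted here.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path (plus query).

    '*' matches any run of characters, a trailing '$' anchors the pattern to
    the end of the URL, and all other characters are literal. Without '$',
    the pattern only needs to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # Escape the literal pieces, then rejoin them with '.*' where each '*' was.
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    # re.match anchors at the start (prefix match); re.fullmatch also anchors the end.
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path) is not None


if __name__ == "__main__":
    checks = [
        ("/*.pdf", "/docs/guide.pdf"),      # contains .pdf
        ("/*.php$", "/index.php"),          # ends in .php
        ("/*.php$", "/index.php?page=2"),   # does not end in .php
        ("/", "/any/page"),                 # every path starts with /
        ("/admin/", "/admin/users"),        # inside the directory
        ("/admin/", "/administrator"),      # a different path
    ]
    for pattern, path in checks:
        print(f"{pattern!r:12} {path!r:22} {rule_matches(pattern, path)}")
```

Because /*.pdf has no $, it also matches URLs such as /guide.pdf.html; add $ when you mean “ends in .pdf.”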
Robots.txt Examples
Here are common robots.txt configurations for different scenarios:
Allow All Crawling (Most Common)
Allows all crawlers to access all content. Include your sitemap location.
# Allow all crawlers to access all content
User-agent: *
Disallow:
Sitemap: https://www.example.com/sitemap.xml
Block Specific Directories (Common)
Blocks admin, private, temporary, cart, and checkout directories while allowing everything else.
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Disallow: /cart/
Disallow: /checkout/
Sitemap: https://www.example.com/sitemap.xml
Block URL Parameters (E-commerce)
Blocks URLs with sorting, filtering, and session-ID parameters so crawlers don’t spend crawl budget on duplicate versions of the same pages.
User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?sessionid=
Disallow: /*&sort=
Sitemap: https://www.example.com/sitemap.xml
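Rules such as /*?sort= match the literal text ?sort= anywhere after the first slash, so a parameter’s position in the query string matters: ?sort= only matches when sort is the first parameter, and &sort= only when it comes later. The sketch below repeats a hypothetical rule_matches translation (so the block runs on its own) and checks a few made-up product URLs against the four rules in the example. It shows that this configuration blocks sort in either position but filter and sessionid only when they come first; adding Disallow: /*&filter= and Disallow: /*&sessionid= would close that gap.

```python
import re

# Rules from the e-commerce example.
RULES = ["/*?sort=", "/*?filter=", "/*?sessionid=", "/*&sort="]


def rule_matches(pattern: str, path: str) -> bool:
    """robots.txt wildcard matching via regex: '*' = any run of characters."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    return (re.fullmatch if anchored else re.match)(regex, path) is not None


def blocking_rules(path: str) -> list[str]:
    """Return every rule in RULES that matches the given path and query string."""
    return [rule for rule in RULES if rule_matches(rule, path)]


if __name__ == "__main__":
    for url in (
        "/shoes?sort=price",             # blocked by /*?sort=
        "/shoes?color=red&sort=price",   # blocked by /*&sort=
        "/shoes?color=red&filter=wide",  # not blocked: filter is not the first parameter
        "/shoes",                        # not blocked: no parameters
    ):
        print(url, blocking_rules(url) or "allowed")
```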
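Different Rules for Different Crawlers (Advanced)
Gives Googlebot its own rules and applies a separate set to all other crawlers. A crawler follows only the most specific group that names it, so in this example Googlebot ignores the * group and is not blocked from /staging/. Repeat any shared rules inside the Googlebot group if they should apply to Google as well.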
# Rules for Google
User-agent: Googlebot
Disallow: /private/
Allow: /private/public-page.html
# Rules for all other crawlers
User-agent: *
Disallow: /private/
Disallow: /staging/
Sitemap: https://www.example.com/sitemap.xml
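Two rules decide the outcome here: a crawler obeys only the group whose User-agent line names it (falling back to * when none does), and within that group the most specific, that is longest, matching rule wins, with Allow winning a tie. The Python sketch below models both for this example. It is deliberately simplified (exact, case-insensitive user-agent tokens and prefix-only paths, no wildcards), and parse_groups, rules_for, and is_allowed are hypothetical names.

```python
ROBOTS_TXT = """\
# Rules for Google
User-agent: Googlebot
Disallow: /private/
Allow: /private/public-page.html
# Rules for all other crawlers
User-agent: *
Disallow: /private/
Disallow: /staging/
Sitemap: https://www.example.com/sitemap.xml
"""


def parse_groups(text):
    """Split robots.txt into groups of (user_agents, [(is_allow, path), ...])."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field == "allow", value))
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(groups, crawler):
    """Rules from every group naming the crawler; otherwise from the * group(s)."""
    crawler = crawler.lower()
    chosen = [rules for agents, rules in groups if crawler in agents]
    if not chosen:
        chosen = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in chosen for rule in rules]


def is_allowed(groups, crawler, path):
    """Longest matching rule wins; Allow wins ties; no matching rule means allowed."""
    best_length, allowed = -1, True
    for is_allow, pattern in rules_for(groups, crawler):
        if not pattern or not path.startswith(pattern):  # empty Disallow matches nothing
            continue
        if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
            best_length, allowed = len(pattern), is_allow
    return allowed


groups = parse_groups(ROBOTS_TXT)
for crawler, path in [
    ("Googlebot", "/private/public-page.html"),
    ("Googlebot", "/private/notes.html"),
    ("Googlebot", "/staging/index.html"),
    ("Bingbot", "/staging/index.html"),
]:
    print(crawler, path, "allowed" if is_allowed(groups, crawler, path) else "blocked")
```

The third result is the one to notice: Googlebot is allowed into /staging/ because it never reads the * group. Adding Disallow: /staging/ to the Googlebot group would change that.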
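Typical WordPress Setup (WordPress)
A common robots.txt for WordPress sites, blocking the admin and includes directories while allowing admin-ajax.php. WordPress’s own built-in robots.txt blocks only /wp-admin/ (with the same admin-ajax.php exception); because /wp-includes/ also serves core CSS and JavaScript files, consider leaving it crawlable so you don’t block resources Google needs for rendering.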
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /readme.html
Disallow: /license.txt
Sitemap: https://www.example.com/sitemap_index.xml
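Google resolves the conflict between Disallow: /wp-admin/ and Allow: /wp-admin/admin-ajax.php by picking the longer, more specific rule, so line order does not matter to Googlebot. Simpler parsers may instead apply the first rule that matches; Python’s standard-library urllib.robotparser has historically worked that way. Listing the Allow exception above the broader Disallow gives the same answer under either interpretation, which the sketch below checks with the stdlib parser (the reordering is an illustration, not a required format).

```python
from urllib.robotparser import RobotFileParser

# The WordPress example, with the Allow exception placed before the broader
# Disallow so that first-match and longest-match parsers agree.
ROBOTS_TXT = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /readme.html
Disallow: /license.txt
Sitemap: https://www.example.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "https://www.example.com"
for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/readme.html", "/blog/hello-world/"):
    verdict = "allowed" if parser.can_fetch("Googlebot", base + path) else "blocked"
    print(f"{path:28} {verdict}")
```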
How to Create a Robots.txt File
Follow these steps to create and implement a robots.txt file for your website:
Create a Plain Text File
Open a text editor (Notepad, TextEdit, VS Code) and create a new file. The file must be plain text with no formatting, saved with UTF-8 encoding. Name it exactly “robots.txt” in all lowercase. The file extension must be .txt, not .txt.txt or anything else.
Add Your Directives
Start with a User-agent line to specify which crawlers your rules apply to. Add Disallow lines for any paths you want to block. Add your Sitemap URL at the end. Use comments (#) to document your rules for future reference.
Upload to Root Directory
Upload the robots.txt file to your website’s root directory using FTP, your hosting file manager, or your CMS. The file must be accessible at yourdomain.com/robots.txt. For WordPress, you can use plugins like Yoast SEO to manage robots.txt.
Test Your File
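Open the robots.txt report in Google Search Console (it replaced the older robots.txt Tester) to confirm Google can fetch your file and to see any parsing errors. Use the URL Inspection tool on important URLs to make sure they aren’t accidentally blocked, and check that your sitemap URL is accessible.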
Monitor and Update
Check your robots.txt periodically, especially after site changes. Monitor Google Search Console for crawl errors that might indicate robots.txt issues. Update rules as your site structure changes.
Always test your robots.txt changes before deploying to production. A single typo can accidentally block your entire site from search engines. After deploying, check the robots.txt report and the URL Inspection tool in Google Search Console to verify your rules work as intended.
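One way to catch a typo before it ships is a small pre-deployment check that parses the candidate file and fails if any must-crawl URL is blocked. The sketch below uses Python’s urllib.robotparser; the function name blocked_urls and the sample URLs are hypothetical. Because that parser is simpler than Google’s (it has traditionally ignored * and $ wildcards and applied the first matching rule), treat it as a smoke test for mistakes like a leftover Disallow: /, not a replacement for checking the live file.

```python
from urllib.robotparser import RobotFileParser

# URLs that must stay crawlable (hypothetical examples; use your own key pages).
MUST_CRAWL = [
    "https://www.example.com/",
    "https://www.example.com/products/",
    "https://www.example.com/blog/",
]


def blocked_urls(robots_text: str, urls, user_agent: str = "Googlebot") -> list:
    """Return the URLs from `urls` that the candidate robots.txt would block."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    staging_leftover = "User-agent: *\nDisallow: /\n"  # blocks everything
    production = "User-agent: *\nDisallow: /admin/\nDisallow: /cart/\n"

    for name, text in (("staging leftover", staging_leftover), ("production", production)):
        problems = blocked_urls(text, MUST_CRAWL)
        print(f"{name}: {'FAIL' if problems else 'OK'} {problems}")
```

In a deployment pipeline, the same check would typically exit with a non-zero status when the list of problems is not empty.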
Robots.txt vs Noindex vs Nofollow
Understanding when to use robots.txt versus other methods is important for proper SEO:
| Method | What It Does | When to Use |
|---|---|---|
| Robots.txt Disallow | Prevents crawling of URLs | Save crawl budget, block entire directories, prevent crawling of large file types |
| Meta Noindex | Prevents indexing (page must be crawled) | Remove pages from search results while keeping them accessible |
| Meta Nofollow | Prevents following links on a page | User-generated content, untrusted links, login pages |
| X-Robots-Tag | HTTP header version of meta robots | Non-HTML files (PDFs, images), server-level control |
| Canonical Tag | Specifies preferred URL version | Duplicate content, URL parameters, similar pages |
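For non-HTML files, the X-Robots-Tag header is set by the server rather than inside the file. As a minimal sketch, assuming a Python WSGI application, the hypothetical middleware below adds X-Robots-Tag: noindex to every response whose path ends in .pdf (on Apache or Nginx the same header is usually added in the server configuration instead). The PDFs must not also be disallowed in robots.txt, because Google has to crawl a file to see its header.

```python
def add_noindex_for_pdfs(app):
    """WSGI middleware: add 'X-Robots-Tag: noindex' to responses for .pdf URLs."""

    def wrapped(environ, start_response):
        is_pdf = environ.get("PATH_INFO", "").lower().endswith(".pdf")

        def start_with_header(status, headers, exc_info=None):
            if is_pdf:
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application (hypothetical) that returns a placeholder body for any path."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"placeholder"]


def call(app, path):
    """Invoke a WSGI app directly and return its response headers as a dict."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response)
    return captured


app = add_noindex_for_pdfs(demo_app)
print(call(app, "/files/price-list.pdf"))  # includes X-Robots-Tag: noindex
print(call(app, "/about"))                 # no X-Robots-Tag header
```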
Don’t Use Robots.txt to Hide Pages
Robots.txt blocks crawling but not indexing. If other sites link to a blocked page, Google may still index it with a “No information is available for this page” message. To truly prevent indexing, use meta noindex tags instead. Never put sensitive information on pages relying only on robots.txt for protection.
Common Robots.txt Mistakes to Avoid
Blocking CSS and JavaScript: Don’t block CSS/JS files. Google needs these to render and understand your pages properly. Blocking them can hurt your rankings.
Blocking your entire site: A single “Disallow: /” blocks everything. This is sometimes left from staging sites. Always check after launching.
Using robots.txt for security: Robots.txt is publicly visible and not a security measure. Anyone can view your robots.txt and see what you’re trying to hide.
Blocking pages you want indexed: Accidentally blocking important pages is common. Test thoroughly before deploying changes.
Wrong file location: Robots.txt must be in the root directory. Putting it in a subdirectory like /pages/robots.txt won’t work.
Case sensitivity errors: The file must be “robots.txt” (lowercase). “Robots.txt” or “ROBOTS.TXT” may not be recognized.
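A couple of these mistakes can be caught automatically. The sketch below is a deliberately small Python linter (lint_robots_txt is a hypothetical name, and the checks are not exhaustive) that flags a site-wide Disallow: / and Disallow rules that end in .css or .js. It cannot detect file-location or file-name problems; for those, load yourdomain.com/robots.txt in a browser.

```python
def lint_robots_txt(text: str) -> list[str]:
    """Return warnings for a few common robots.txt mistakes (not exhaustive)."""
    warnings = []
    agents, seen_rule = [], False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value)
        elif field == "allow":
            seen_rule = True
        elif field == "disallow":
            seen_rule = True
            who = ", ".join(agents) or "(no user-agent)"
            if value == "/":
                warnings.append(f"line {number}: 'Disallow: /' blocks the whole site for {who}")
            elif value.lower().rstrip("$").endswith((".css", ".js")):
                warnings.append(f"line {number}: '{value}' blocks CSS/JS files needed for rendering")
    return warnings


if __name__ == "__main__":
    sample = "User-agent: Googlebot\nDisallow: /*.js$\n\nUser-agent: *\nDisallow: /\n"
    for warning in lint_robots_txt(sample):
        print(warning)
```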
Tools for Testing Robots.txt
These tools help you create, test, and validate your robots.txt file:
- Google Search Console – Official robots.txt report and URL Inspection tool
- Bing Webmaster Tools – Robots.txt analyzer
- Screaming Frog – Crawl simulation
- Semrush Site Audit – Robots.txt issues detection
- Ahrefs Site Audit – Crawlability analysis
- Robots.txt Checker – Online validators
- Merkle Robots Generator – Robots.txt builder
- Yoast SEO – WordPress robots.txt editor
For more tool recommendations, see our technical SEO tools guide.
People Also Ask About Robots.txt
What happens if I don’t have a robots.txt file?
Without a robots.txt file, search engines will crawl all accessible pages on your site. This is fine for many sites, but you lose the ability to guide crawler behavior. Google treats a missing robots.txt the same as an empty one (everything allowed). However, having a robots.txt with your sitemap location helps search engines discover your content faster.
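Whether a site “has” a robots.txt file is decided by the HTTP status code of the request for it. The Python sketch below summarizes the handling described in RFC 9309, the standardized Robots Exclusion Protocol: a 4xx response such as 404 means no restrictions, while a 5xx response means the crawler should assume everything is disallowed for now. It also groups 429 (Too Many Requests) with server errors, as Google documents for its crawlers. The function name and return labels are illustrative.

```python
def robots_policy(status: int) -> str:
    """Map the HTTP status of a robots.txt fetch to a crawl decision.

    Loosely follows RFC 9309; 429 is grouped with server errors, as Google
    documents for its crawlers.
    """
    if 200 <= status <= 299:
        return "use rules"        # parse and obey the file's contents
    if 300 <= status <= 399:
        return "follow redirect"  # RFC 9309: follow at least five redirects
    if status == 429 or 500 <= status <= 599:
        return "disallow all"     # unreachable: assume nothing may be crawled for now
    if 400 <= status <= 499:
        return "allow all"        # unavailable (e.g. 404, no file): no restrictions
    return "unknown"


if __name__ == "__main__":
    for code in (200, 301, 404, 429, 503):
        print(code, robots_policy(code))
```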
Can robots.txt block Google from indexing my page?
No, robots.txt blocks crawling, not indexing. If other websites link to a blocked page, Google may still index the URL with limited information. To prevent indexing, use a meta noindex tag or X-Robots-Tag HTTP header instead. The page must be crawlable for Google to see these tags.
How do I check if my robots.txt is working?
Open the robots.txt report in Google Search Console to confirm Google has fetched your file and to see any errors, then use the URL Inspection tool to check whether specific URLs are blocked. You can also visit your robots.txt directly at yourdomain.com/robots.txt to verify it exists and contains your intended rules.
Should I block /wp-admin/ in robots.txt?
Yes, blocking /wp-admin/ is recommended for WordPress sites. Crawlers don’t need to access your admin area, and blocking it saves crawl budget. However, allow /wp-admin/admin-ajax.php as many themes and plugins use it for frontend functionality.
How often does Google check robots.txt?
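Google generally caches your robots.txt for up to 24 hours, and may keep the cached copy longer if it can’t fetch a fresh one (for example, because of server errors). For urgent changes, you can request a recrawl of the file from the robots.txt report in Google Search Console. Other major crawlers also cache the file, so expect a delay before new rules take effect.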
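Crawlers avoid downloading robots.txt before every request by caching it, which is why new rules take a while to apply. The sketch below shows the idea with a hypothetical RobotsCache class in Python; the 24-hour maximum age mirrors Google’s documented “up to 24 hours” cache, and the fetch and clock functions are injected (an assumption for illustration) so the behavior can be tested without network access.

```python
import time
from typing import Callable, Dict, Tuple

ONE_DAY = 24 * 60 * 60  # seconds


class RobotsCache:
    """Keep each host's robots.txt for up to max_age seconds before re-fetching."""

    def __init__(self, fetch: Callable[[str], str], max_age: float = ONE_DAY,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch      # downloads robots.txt for a host
        self._max_age = max_age
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # host -> (fetched_at, body)

    def get(self, host: str) -> str:
        now = self._clock()
        cached = self._cache.get(host)
        if cached is None or now - cached[0] >= self._max_age:
            cached = (now, self._fetch(host))  # missing or stale: fetch a fresh copy
            self._cache[host] = cached
        return cached[1]


if __name__ == "__main__":
    versions = iter(["User-agent: *\nDisallow: /old/\n", "User-agent: *\nDisallow: /new/\n"])
    fake_now = [0.0]
    cache = RobotsCache(fetch=lambda host: next(versions), clock=lambda: fake_now[0])

    print(cache.get("www.example.com"))  # first fetch: /old/ rules
    fake_now[0] = 3600                   # one hour later: still the cached /old/ rules
    print(cache.get("www.example.com"))
    fake_now[0] = ONE_DAY + 1            # more than a day later: re-fetched, /new/ rules
    print(cache.get("www.example.com"))
```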
Robots.txt Configuration from Egochi
Egochi, America’s #1 digital marketing agency headquartered in New York City, provides expert technical SEO services including robots.txt optimization.
Full Technical Audits: Our SEO audits include robots.txt review to identify blocking errors, missing sitemaps, and optimization opportunities. We ensure your crawl directives support your SEO goals.
Custom Configuration: We create robots.txt files tailored to your site structure, CMS, and business needs. From WordPress to custom e-commerce platforms, we configure crawl rules that save budget and improve indexation.
Ongoing Monitoring: Robots.txt errors can happen during site updates. Our technical SEO services include monitoring for crawl issues and proactive fixes before they impact rankings.
Proven Results: From our offices in NYC, Milwaukee, Madison, and Miami, we’ve helped hundreds of clients optimize their technical SEO foundations. Proper robots.txt configuration is part of our approach to delivering 300%+ organic traffic growth.
Need Help with Your Robots.txt?
Get a free technical SEO audit from Egochi. We’ll review your robots.txt and identify any issues affecting your crawlability.
Get a Free SEO Audit or call (888) 644-7795





