Robots.txt: What It Is and How to Use It for SEO

Q: Can robots.txt block Google from indexing my page?

No. Robots.txt blocks crawling, not indexing. If other websites link to a blocked page, Google may still index the URL. To prevent indexing, use a meta noindex tag instead, and leave the page crawlable so Google can actually see the tag.

Q: How do I check if my robots.txt is working?

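Use the robots.txt report in Google Search Console, which replaced the older robots.txt tester, to confirm that Google can fetch and parse your file. Then check specific URLs with the URL Inspection tool to see if they're blocked or allowed. You can also visit yourdomain.com/robots.txt directly.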

A robots.txt file is a text file that tells search engine crawlers which pages or sections of your website they can or cannot access. Located in your website’s root directory, robots.txt acts as a set of instructions for web crawlers like Googlebot, helping you control how search engines interact with your site. Every website should have a robots.txt file as part of its technical SEO foundation.

The robots.txt file uses a simple syntax to allow or disallow crawling of specific URLs, directories, or file types. While it can prevent crawlers from accessing certain content, it’s important to understand that robots.txt is a directive, not a security measure. Pages blocked by robots.txt may still be indexed if linked from other sites.

Key Takeaways: Robots.txt

Definition: A text file that instructs search engine crawlers on which URLs they can access on your site
Location: Must be placed in your website’s root directory (example.com/robots.txt)
Purpose: Controls crawl behavior, saves crawl budget, and keeps crawlers out of non-public pages
Not security: Robots.txt is a guideline, not a security feature. Blocked pages can still be indexed.
Key directives: User-agent, Disallow, Allow, Sitemap, and Crawl-delay

5 Essential Robots.txt Directives

User-agent – Specifies which crawler the rules apply to (e.g., Googlebot, Bingbot, or * for all)
Disallow – Tells crawlers not to access specific URLs or directories
Allow – Permits access to specific URLs within a disallowed directory
Sitemap – Points crawlers to your XML sitemap location
Crawl-delay – Requests a delay between crawler requests (not supported by Google)

What Is a Robots.txt File?

A robots.txt file is a plain text file that follows the Robots Exclusion Protocol (REP). It provides instructions to web crawlers about which parts of your website they should or shouldn’t crawl. Search engines check for this file before crawling your site and generally respect its directives. The file must be named exactly “robots.txt” and placed in your root domain directory.

Sites that should have robots.txt: 100%
Required location: Root directory
Max file size for Google: 500KB
Year the protocol was introduced: 1994

Egochi, America’s #1 digital marketing agency headquartered in New York City, ensures every client website has a properly configured robots.txt file. From our offices in NYC, Milwaukee, Madison, and Miami, we’ve audited thousands of websites and consistently find robots.txt errors that waste crawl budget or accidentally block important content from search engines.

What is robots.txt used for?

Robots.txt is used to control search engine crawler access to your website. It helps you prevent crawlers from accessing certain pages (like admin areas, staging content, or duplicate pages), conserve crawl budget by blocking unimportant URLs, and point crawlers to your XML sitemap. Properly configured robots.txt improves how search engines crawl and index your site.

Where should robots.txt be located?

Robots.txt must be located in your website’s root directory and accessible at your-domain.com/robots.txt. For subdomains, each subdomain needs its own robots.txt file (blog.example.com/robots.txt is separate from www.example.com/robots.txt). The file name must be lowercase “robots.txt” exactly. If the file isn’t in the root directory or is named differently, crawlers won’t find it.
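
As a rough illustration of the location rule, the sketch below derives the robots.txt address that governs any given page URL. The function name robots_txt_url and the example.com URLs are placeholders chosen for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the path
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/shop/item?id=7"))
# https://www.example.com/robots.txt
print(robots_txt_url("https://blog.example.com/2024/post.html"))
# https://blog.example.com/robots.txt
```

The two calls return different files, which is why each subdomain needs its own robots.txt.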

Does robots.txt affect SEO?

Yes, robots.txt affects SEO by controlling which pages search engines can crawl. A properly configured robots.txt conserves crawl budget for important pages, keeps crawlers away from low-value content, and ensures crawlers find your sitemap. Misconfigured robots.txt can accidentally block important pages from being crawled and indexed, severely hurting your rankings.

Robots.txt Syntax and Directives
Robots.txt Examples
How to Create a Robots.txt File
Robots.txt vs Noindex vs Nofollow
Tools for Testing Robots.txt
Robots.txt Configuration from Egochi
Frequently Asked Questions

Robots.txt Syntax and Directives

Understanding robots.txt syntax is essential for proper configuration. Here are the main directives:

User-agent:

Specifies which crawler the following rules apply to. Use * for all crawlers or specific names like Googlebot, Bingbot, or Yandex.

Disallow:

Tells the specified crawler not to access the URL path. An empty value (Disallow:) means everything is allowed.

Allow:

Permits crawling of a specific path within a disallowed directory. Useful for exceptions to broad disallow rules.

Sitemap:

Specifies the location of your XML sitemap. Can include multiple sitemap entries. Use the full URL.

Crawl-delay:

Asks crawlers to wait a given number of seconds between requests. Google ignores it, but Bing and some other crawlers honor it. Don’t rely on this for rate limiting.

# Comments

Lines starting with # are comments and ignored by crawlers. Use comments to document your rules.
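
To see how these directives are read in practice, here is a minimal sketch using Python's standard-library urllib.robotparser. The sample rules and the example.com URLs are made up for illustration. This parser follows the original protocol and does not implement Google-style wildcards, so treat it as an approximation of how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; not taken from a real site.
SAMPLE = """\
# Keep crawlers out of the admin area
User-agent: *
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())  # the comment line is ignored

print(parser.can_fetch("*", "https://www.example.com/admin/login"))  # False
print(parser.can_fetch("*", "https://www.example.com/blog/"))        # True
print(parser.crawl_delay("*"))                                       # 10
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```

site_maps() requires Python 3.8 or newer.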

Pattern Matching

Robots.txt supports wildcards and pattern matching:

Pattern	Meaning	Example
`*`	Matches any sequence of characters	`Disallow: /*.pdf` blocks any URL containing .pdf
`$`	Matches the end of URL	`Disallow: /*.php$` blocks URLs ending in .php
`/`	Matches the root and everything below	`Disallow: /` blocks entire site
`/folder/`	Matches specific directory and contents	`Disallow: /admin/` blocks admin folder
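
The patterns in this table can be approximated with a small regular-expression translator. This is a sketch of the matching idea, not any search engine's actual implementation, and the helper names pattern_to_regex and matches are invented for the example.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then let each * match any characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start, so a pattern without $ acts as a prefix.
    # Callers should skip empty patterns: an empty Disallow blocks nothing.
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.pdf", "/files/report.pdf"))     # True
print(matches("/*.php$", "/index.php"))           # True
print(matches("/*.php$", "/index.php?id=1"))      # False
print(matches("/admin/", "/admin/users"))         # True
print(matches("/admin/", "/administrator"))       # False
```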

Robots.txt Examples

Here are common robots.txt configurations for different scenarios:

Allow All Crawling (Most Common)

Allows all crawlers to access all content. Include your sitemap location.

robots.txt

# Allow all crawlers to access all content
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml

Block Specific Directories (Common)

Blocks admin, private, and temporary directories while allowing everything else.

robots.txt

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Disallow: /cart/
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap.xml

Block URL Parameters (E-commerce)

Blocks URLs with sorting, filtering, and session parameters to prevent duplicate content.

robots.txt

User-agent: *
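# Each parameter is listed twice: once as the first parameter (?) and once as a later one (&)
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?filter=
Disallow: /*&filter=
Disallow: /*?sessionid=
Disallow: /*&sessionid=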

Sitemap: https://www.example.com/sitemap.xml

Different Rules for Different Crawlers (Advanced)

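Sets specific rules for Googlebot while blocking other crawlers from certain areas. A crawler follows only the most specific group that matches its name, so Googlebot here obeys its own group and ignores the * group, which leaves /staging/ open to it.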

robots.txt

# Rules for Google
User-agent: Googlebot
Disallow: /private/
Allow: /private/public-page.html

# Rules for all other crawlers
User-agent: *
Disallow: /private/
Disallow: /staging/

Sitemap: https://www.example.com/sitemap.xml
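
The Googlebot group above relies on Allow overriding a broader Disallow. Under the Robots Exclusion Protocol standard (RFC 9309), the longest matching rule wins, and Allow wins a tie. The sketch below illustrates that precedence for plain path prefixes only; it does not handle wildcards, and the function name is_allowed is a placeholder.

```python
def is_allowed(rules, path):
    """rules: list of (directive, path_prefix) tuples for one user-agent group."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value matches nothing
        is_allow = directive.lower() == "allow"
        # Longest prefix wins; on a tie, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


googlebot_rules = [
    ("Disallow", "/private/"),
    ("Allow", "/private/public-page.html"),
]

print(is_allowed(googlebot_rules, "/private/public-page.html"))  # True
print(is_allowed(googlebot_rules, "/private/secret.html"))       # False
print(is_allowed(googlebot_rules, "/blog/"))                     # True
```

Note that some simpler parsers, including Python's urllib.robotparser, apply the first matching rule in file order instead, so listing the Allow line before the broader Disallow is the safer ordering.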

WordPress Default

Standard robots.txt for WordPress sites, blocking the admin area while keeping admin-ajax.php reachable. Leave /wp-includes/ crawlable, because it holds scripts and styles that Google needs to render your pages.

robots.txt

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /readme.html
Disallow: /license.txt

Sitemap: https://www.example.com/sitemap_index.xml

How to Create a Robots.txt File

Follow these steps to create and implement a robots.txt file for your website:

Create a Plain Text File

Open a text editor (Notepad, TextEdit, VS Code) and create a new file. The file must be plain text with no formatting. Name it exactly “robots.txt” in all lowercase. The file extension must be .txt, not .txt.txt or anything else.

Add Your Directives

Start with a User-agent line to specify which crawlers your rules apply to. Add Disallow lines for any paths you want to block. Add your Sitemap URL at the end. Use comments (#) to document your rules for future reference.

Upload to Root Directory

Upload the robots.txt file to your website’s root directory using FTP, your hosting file manager, or your CMS. The file must be accessible at yourdomain.com/robots.txt. For WordPress, you can use plugins like Yoast SEO to manage robots.txt.

Test Your File

Use the robots.txt report in Google Search Console to verify that Google can fetch and parse your file. Test specific URLs with the URL Inspection tool to ensure important pages aren’t accidentally blocked. Check that your sitemap URL is accessible.

Monitor and Update

Check your robots.txt periodically, especially after site changes. Monitor Google Search Console for crawl errors that might indicate robots.txt issues. Update rules as your site structure changes.
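
If you would rather generate the file than type it by hand, the steps above can be sketched in a few lines of Python. The function name render_robots_txt and the sample paths are assumptions for this example, not part of any tool.

```python
def render_robots_txt(groups, sitemaps=()):
    """groups: list of (user_agent, [(directive, path), ...]) pairs."""
    lines = []
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        lines.extend(f"{directive}: {path}" for directive, path in rules)
        lines.append("")  # a blank line separates groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines).rstrip("\n") + "\n"


content = render_robots_txt(
    [("*", [("Disallow", "/admin/"), ("Disallow", "/tmp/")])],
    sitemaps=["https://www.example.com/sitemap.xml"],
)
print(content)
```

Saving the result with pathlib.Path("robots.txt").write_text(content, encoding="utf-8") produces a plain text file with the exact lowercase name, ready to upload to the root directory.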

Pro Tip

Always test your robots.txt changes before deploying to production. A single typo can accidentally block your entire site from search engines. After deploying, use the robots.txt report and the URL Inspection tool in Google Search Console to verify your rules work as intended.
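
One way to automate that pre-deploy check is to run a draft file against a list of URLs that must stay crawlable. This is a sketch using Python's standard-library parser, which ignores wildcards, so it is a first line of defense and not a substitute for testing in the search engines' own tools. The helper name blocked_urls and the URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs from `urls` that the rules would block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# A draft accidentally carried over from a staging site.
draft = """\
User-agent: *
Disallow: /
"""

must_stay_crawlable = [
    "https://www.example.com/",
    "https://www.example.com/products/",
]

problems = blocked_urls(draft, must_stay_crawlable)
print(problems)  # both URLs are listed, so this draft should not ship
```

In a deployment script, a non-empty result would be a reason to stop the release. To check the live file instead of a draft, RobotFileParser also offers set_url() and read().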

Robots.txt vs Noindex vs Nofollow

Understanding when to use robots.txt versus other methods is important for proper SEO:

Method	What It Does	When to Use
Robots.txt Disallow	Prevents crawling of URLs	Save crawl budget, block entire directories, prevent crawling of large file types
Meta Noindex	Prevents indexing (page must be crawled)	Remove pages from search results while keeping them accessible
Meta Nofollow	Prevents following links on a page	User-generated content, untrusted links, login pages
X-Robots-Tag	HTTP header version of meta robots	Non-HTML files (PDFs, images), server-level control
Canonical Tag	Specifies preferred URL version	Duplicate content, URL parameters, similar pages

Don’t Use Robots.txt to Hide Pages

Robots.txt blocks crawling but not indexing. If other sites link to a blocked page, Google may still index it with a “No information is available for this page” message. To truly prevent indexing, use meta noindex tags instead, and make sure the page is not blocked in robots.txt, because Google has to crawl the page to see the tag. Never put sensitive information on pages relying only on robots.txt for protection.
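
To confirm that a page really carries a noindex directive, you can inspect its HTML for a robots meta tag. The sketch below uses Python's standard-library HTML parser; the class and function names are invented for this example, and it checks only the meta tag, not the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))  # True
```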

Common Robots.txt Mistakes to Avoid

Blocking CSS and JavaScript: Don’t block CSS/JS files. Google needs these to render and understand your pages properly. Blocking them can hurt your rankings.

Blocking your entire site: A single “Disallow: /” blocks everything. This is sometimes left from staging sites. Always check after launching.

Using robots.txt for security: Robots.txt is publicly visible and not a security measure. Anyone can view your robots.txt and see what you’re trying to hide.

Blocking pages you want indexed: Accidentally blocking important pages is common. Test thoroughly before deploying changes.

Wrong file location: Robots.txt must be in the root directory. Putting it in a subdirectory like /pages/robots.txt won’t work.

Case sensitivity errors: The file must be “robots.txt” (lowercase). “Robots.txt” or “ROBOTS.TXT” may not be recognized.
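
The most damaging of these mistakes, a leftover “Disallow: /”, is easy to catch with a small check. The following is a simplified sketch, with a made-up function name, that flags every user-agent group containing that rule. It only flags candidates for review: a group may pair the rule with Allow lines on purpose.

```python
def find_sitewide_blocks(robots_txt: str) -> list[str]:
    """Return the user-agents whose group contains a bare 'Disallow: /'."""
    flagged, agents, in_rules = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a new group starts once rule lines have been seen
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                flagged.extend(a for a in agents if a not in flagged)
    return flagged


print(find_sitewide_blocks("User-agent: *\nDisallow: /\n"))        # ['*']
print(find_sitewide_blocks("User-agent: *\nDisallow: /admin/\n"))  # []
```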

Tools for Testing Robots.txt

These tools help you create, test, and validate your robots.txt file:

Google Search Console: robots.txt report and URL Inspection tool
Bing Webmaster Tools: Robots.txt analyzer
Screaming Frog: Crawl simulation
Semrush Site Audit: Robots.txt issues detection
Ahrefs Site Audit: Crawlability analysis
Robots.txt Checker: Online validators
Merkle Robots Generator: Robots.txt builder
Yoast SEO: WordPress robots.txt editor

For more tool recommendations, see our technical SEO tools guide.

Robots.txt Configuration from Egochi

Egochi, America’s #1 digital marketing agency headquartered in New York City, provides expert technical SEO services including robots.txt optimization.

Full Technical Audits: Our SEO audits include robots.txt review to identify blocking errors, missing sitemaps, and optimization opportunities. We ensure your crawl directives support your SEO goals.

Custom Configuration: We create robots.txt files tailored to your site structure, CMS, and business needs. From WordPress to custom e-commerce platforms, we configure crawl rules that save budget and improve indexation.

Ongoing Monitoring: Robots.txt errors can happen during site updates. Our technical SEO services include monitoring for crawl issues and proactive fixes before they impact rankings.

Proven Results: From our offices in NYC, Milwaukee, Madison, and Miami, we’ve helped hundreds of clients optimize their technical SEO foundations. Proper robots.txt configuration is part of our approach to delivering 300%+ organic traffic growth.

Need Help with Your Robots.txt?

Get a free technical SEO audit from Egochi. We’ll review your robots.txt and identify any issues affecting your crawlability.

Get a Free SEO Audit

Or call (888) 644-7795

Frequently Asked Questions

What is robots.txt in simple terms?

Robots.txt is a text file that tells search engines which pages of your website they can or cannot visit. It’s like a set of instructions for web crawlers. You place it in your website’s main folder, and search engines check it before crawling your site.

Does every website need a robots.txt file?

While not strictly required, every website should have a robots.txt file. Even if you want everything crawled, including your sitemap location helps search engines find your content. For larger sites, robots.txt is essential for managing crawl budget and blocking non-essential pages.

How do I find my robots.txt file?

View your robots.txt by typing your domain followed by /robots.txt in your browser (e.g., yourdomain.com/robots.txt). If you see a 404 error, you don’t have a robots.txt file. To edit it, access your website’s root directory via FTP, hosting file manager, or your CMS settings.

What does “Disallow: /” mean?

“Disallow: /” blocks the entire website from being crawled. The forward slash represents the root directory and everything below it. This is sometimes used during development but should never be on a live site. Always check for this after launching a new website.

What is User-agent in robots.txt?

User-agent identifies which crawler the following rules apply to. “User-agent: *” means the rules apply to all crawlers. You can specify individual crawlers like “User-agent: Googlebot” for Google or “User-agent: Bingbot” for Bing to create different rules for each.

Can I block specific bots in robots.txt?

Yes, you can create rules for specific bots using their User-agent names. For example, to block a specific crawler, add “User-agent: BotName” followed by “Disallow: /”. However, malicious bots often ignore robots.txt. For true blocking, use server-side methods like .htaccess rules.
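
As a quick illustration, the sketch below parses a rule set that blocks one named crawler and checks how well-behaved bots would read it. “BotName” is the placeholder from the answer above, and the example.com URLs are made up. The parser is Python's standard-library urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: BotName
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

print(parser.can_fetch("BotName", "https://www.example.com/blog/"))     # False
print(parser.can_fetch("Googlebot", "https://www.example.com/blog/"))   # True
print(parser.can_fetch("Googlebot", "https://www.example.com/admin/"))  # False
```

This only models crawlers that choose to obey the file; it does nothing to stop a bot that ignores robots.txt.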

Should I block images in robots.txt?

Generally, no. Blocking images prevents them from appearing in Google Images and can affect how Google understands your pages. Only block images if you have specific reasons, like preventing crawling of very large image directories. For most sites, images should remain crawlable.

What is the Sitemap directive in robots.txt?

The Sitemap directive tells crawlers where to find your XML sitemap. Add “Sitemap: https://yourdomain.com/sitemap.xml” to your robots.txt. You can include multiple sitemap entries. This helps search engines discover all your pages efficiently, even if they’re not well-linked.

How long until robots.txt changes take effect?

Search engines cache robots.txt, so changes may take hours to days to take effect. Google typically re-checks within 24 hours. For urgent changes, use Google Search Console to request a robots.txt refresh. Even then, crawlers need time to process the new rules across your site.

Can robots.txt hurt my SEO?

Yes, misconfigured robots.txt can severely hurt SEO by accidentally blocking important pages, CSS/JS files, or even your entire site. Always test changes before deploying. Check Google Search Console regularly for crawl errors. A properly configured robots.txt helps SEO; a broken one can destroy it.