Robots.txt — Introduction, Specifications & How to Create a robots.txt File

What is a robots.txt file?

The robots.txt file, also called the “robots exclusion protocol”, is a set of instructions that tells search engine crawlers what content they are allowed and disallowed to request from your website. Robots.txt is a public file, so anyone can open it to see which sections of your website you do not want web crawlers to visit and which bots you have blocked from crawling your website.

It is recommended not to use the robots.txt file to hide sections of your website; instead, use the noindex attribute or enable password access for those specific pages of your website.
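
For example, the noindex approach is a robots meta tag placed in the HTML head of the page you want kept out of search results (a minimal illustration):

<meta name="robots" content="noindex">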

Robots.txt example:

User-agent: *
Disallow: /
  • User-agent: *: The * means the rule applies to every website crawler.

  • Disallow: /: The / covers the entire site, so crawlers are told not to request any of its pages.

Why is Robots.txt important?

The robots.txt file tells search engine crawlers which URLs or files on your website they may or may not request, giving you control over what the search engines fetch from your site.

  • It reduces unnecessary crawler traffic, saving server bandwidth and helping keep your site fast.

  • It helps keep pages you do not want shown out of search results.

  • It helps keep search engine bots away from sensitive areas, such as your WordPress admin page (see the example below).

  • It allows you to block abusive users, crawlers, and other security threats.

The above reasons are why Google encourages you to have one.
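
For instance, a common way to apply the WordPress example above is a rule that keeps every crawler out of the standard /wp-admin/ directory:

User-agent: *
Disallow: /wp-admin/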

Where should you add your robots.txt file?

The robots.txt file needs to be placed in the top-level directory of your web server, i.e. your website’s root, because crawlers only look for it there.

Your robots.txt file URL should look like this – “http://www.example.com/robots.txt”.

The most common error when creating a robots.txt file is using capital letters in the file name; the name should always be lower case, such as /robots.txt and not /Robots.TXT.

How to create a robots.txt file?

The robots.txt file is a simple text file, so you can create it in any plain-text editor such as Notepad.

Let’s understand the two main statements of a robots.txt file.

  1. User-agent: is where you specify the crawl bot’s name, or add an * to cover all crawlers.
  2. Disallow: adding a / to this statement blocks content, while leaving it blank allows everything.

Let’s look at multiple scenarios of creating a robots.txt file.

Creating a robots.txt file to block all crawlers

To exclude all website crawlers from your entire server, add a / to the Disallow statement as shown below.

User-agent: *
Disallow: /

Creating a robots.txt file to allow all crawlers access to your web server

If you want to give all website crawlers complete access to your web server, leave the Disallow statement blank as shown below. You can achieve the same result by simply not creating a robots.txt file at all, which also allows every crawl bot to crawl your website.

User-agent: *
Disallow:

Creating a robots.txt file to block all web crawlers from certain parts of your web server

To exclude all website crawlers from part of your web server, add the path of each directory or file you want blocked to its own Disallow statement in the robots.txt file. Below, three server directories are being blocked from access by website crawlers.

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

Creating a robots.txt file to block a single website crawler

Adding the name of the crawl bot to the User-agent statement and a / to the Disallow statement ensures that this particular crawl bot is not allowed to crawl your web server.

User-agent: BadBot
Disallow: /

Creating a robots.txt file to allow only one web crawler to crawl your website

Add the name of the crawl bot you wish to allow (for example, Googlebot for Google Search) to the User-agent statement and leave the Disallow statement blank; this gives that crawl bot full access to the website. You then also have to block all other website crawlers, which is done by adding a second record with an * in the User-agent statement and a / in the Disallow statement.

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Now that you have created your desired robots.txt file, it is important to save it with the name robots.txt. Ensure you use lower case and do not add anything to the file name apart from robots.txt.

How to test & submit your Robots.txt file with Google?

Google’s robots.txt Tester tool lets you quickly test your robots.txt file and find any errors and warnings in it. It also helps you check whether you have intentionally or accidentally blocked Google’s crawlers.

Follow these steps to check your robots.txt file:

  1. Visit the Google Webmaster robots.txt Tester tool
  2. Choose a verified property or add a new property
  3. Edit your robots.txt and check for errors
  4. Click the “Test” button to check access
  5. Check whether the result says “Allowed” or “Blocked”
  6. Based on the results, make changes to your website’s robots.txt file
  7. Edit accordingly and click “Submit”

Make sure you update your website with the corrected robots.txt file manually, as the Google robots.txt Tester tool only tests it and will not make changes to your website’s robots.txt file.
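
If you want to sanity-check your rules locally before uploading the file, here is a minimal sketch using Python’s built-in urllib.robotparser module, which reports whether a given URL is allowed for a given user agent (example.com and the paths are placeholders):

from urllib.robotparser import RobotFileParser

# Point the parser at the live robots.txt file (example.com is a placeholder)
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# Check whether specific URLs may be fetched by specific crawlers
print(rp.can_fetch("Googlebot", "https://www.example.com/wp-admin/"))
print(rp.can_fetch("*", "https://www.example.com/"))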
