The robots.txt file, also called the “robots exclusion protocol”, is a set of instructions that tells search engine crawlers what content they are allowed and disallowed to request from your website. Robots.txt is a public file, so anyone can open it to see which sections of your website you do not want web crawlers to visit and which bots you have blocked from crawling your site.
It is recommended not to use the robots.txt file to hide sections of your website; instead, use the noindex directive or password-protect those specific pages.
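For example, a page can be kept out of search results by adding a noindex robots meta tag to its HTML head (or by sending the equivalent X-Robots-Tag: noindex HTTP response header):
<meta name="robots" content="noindex">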
Robots.txt example:
User-agent: *
Disallow: /
The robots.txt file tells search engine crawlers which URLs or files on your website they may or may not request; in other words, it declares which parts of your site you do not want crawled. For these reasons, Google encourages you to have one.
The robots.txt file needs to be placed in the top-level directory of your web server so that it is available at your website’s root.
Your robots.txt file URL should look like this – “http://www.example.com/robots.txt”.
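Note that crawlers only request robots.txt from the root of the host, so a file placed in a subdirectory, for example at a hypothetical http://www.example.com/pages/robots.txt, would simply be ignored.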
The most common error while creating a robots.txt file is the use of capital letters in the file name; the name should always be in lower case, such as /robots.txt and not /Robots.TXT.
The robots.txt file is a simple text file, so you can create it in any plain-text editor such as Notepad.
Let’s understand the two main statements of a robots.txt file: User-agent, which names the crawler a rule applies to, and Disallow, which lists the paths that crawler should not request.
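As a quick illustration (the /private/ path below is just a placeholder), the two statements work together like this:
# The User-agent line names the crawler the rules apply to; * matches all crawlers
User-agent: *
# Each Disallow line lists a path that crawler should not request
Disallow: /private/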
Let’s look at a few common scenarios for creating a robots.txt file.
To exclude all website crawlers from your entire server, add a / to the Disallow statement, as shown below.
User-agent: *
Disallow: /
If you want to give all website crawlers complete access to your web server, leave the Disallow statement blank, as shown below. Not creating a robots.txt file at all has the same effect and also allows every crawler to crawl your website.
User-agent: *
Disallow:
To exclude all website crawlers from part of your website, add the path of each directory or file you want to block to its own Disallow statement. Below, we are blocking crawlers from accessing three directories on the web server.
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
Adding the name of a specific crawler to the User-agent statement and a / to the Disallow statement ensures that that particular crawler is not allowed to crawl your web server.
User-agent: BadBot
Disallow: /
Add the name of the crawler you wish to allow to the User-agent statement and leave its Disallow statement blank; this gives that crawler full access to the website. To block all other crawlers, add a second rule with a * in the User-agent statement and a / in the Disallow statement.
User-agent: Google
Disallow:

User-agent: *
Disallow: /
Now that you have created your desired robots.txt file, save it with the name robots.txt. Make sure you use lower case and do not add anything else to the file name.
Google’s robots.txt Tester tool lets you quickly test your robots.txt file and find any errors and warnings in it. It also helps you check whether you have intentionally or accidentally blocked Google’s crawlers.
Use the tester to check your robots.txt file whenever you make changes. Keep in mind that the Google robots.txt Tester only tests the file; it does not make changes to your website, so you need to edit and upload the corrected robots.txt manually.