What is the Robots.txt File?

robots.txt is a text file that website owners create to tell web robots (typically search engine crawlers) how to crawl the pages on their website.

The robots.txt file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web and how they access and index content.

List of User Agents

Google – Googlebot
Bing – Bingbot
Yahoo – Slurp
MSN – Msnbot

Code Sample

User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
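To make the two-line syntax above concrete, here is a minimal Python sketch that reads robots.txt text and collects the Disallow values listed for each user agent. The function name parse_groups is hypothetical, and the sketch is deliberately simplified: it only understands User-agent and Disallow lines and ignores everything else, so it is an illustration of the format rather than a complete parser.

```python
def parse_groups(robots_txt: str) -> dict[str, list[str]]:
    """Map each user-agent name to the Disallow values listed for it.

    Simplified sketch: only User-agent and Disallow lines are handled.
    """
    groups: dict[str, list[str]] = {}
    current_agents: list[str] = []
    seen_rule = False  # True once the current group has at least one rule

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # field names are matched case-insensitively

        if field == "user-agent":
            if seen_rule:
                # A User-agent line after a rule starts a new group.
                current_agents = []
                seen_rule = False
            current_agents.append(value)
            groups.setdefault(value, [])
        elif field == "disallow":
            seen_rule = True
            for agent in current_agents:
                groups[agent].append(value)

    return groups


sample = "User-agent: Googlebot\nDisallow: /team/\n\nUser-agent: *\nDisallow:"
print(parse_groups(sample))
```

An empty Disallow value, as in the second group of the sample, means nothing is blocked for that user agent.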

Example robots.txt

Here are a few examples of robots.txt in action for a www.example.com site:

a. Blocking all web crawlers from all content

User-agent: *
Disallow: /

b. Allowing all web crawlers access to all content

User-agent: *
Disallow:

c. Blocking a specific web crawler from a specific folder

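User-agent: Googlebot
Disallow: /team/

Rules match by path prefix, so the trailing slash limits this rule to the folder. Disallow: /team would also match paths such as /teammates.html.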

d. Blocking a specific web crawler from a specific web page

User-agent: Bingbot
Disallow: /team/founder.html
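One way to check how rules like these are interpreted is to feed them to a parser. The sketch below uses urllib.robotparser from Python's standard library; the helper name allowed and the page URLs are made up for illustration, and the folder rule is written with a trailing slash. Real crawlers may interpret edge cases differently from this parser, so treat the results as an approximation.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The four examples above, written with the directives on consecutive lines.
BLOCK_ALL = "User-agent: *\nDisallow: /"
ALLOW_ALL = "User-agent: *\nDisallow:"
BLOCK_FOLDER = "User-agent: Googlebot\nDisallow: /team/"
BLOCK_PAGE = "User-agent: Bingbot\nDisallow: /team/founder.html"

SITE = "https://www.example.com"

print(allowed(BLOCK_ALL, "Googlebot", SITE + "/any/page.html"))       # False
print(allowed(ALLOW_ALL, "Googlebot", SITE + "/any/page.html"))       # True
print(allowed(BLOCK_FOLDER, "Googlebot", SITE + "/team/about.html"))  # False
print(allowed(BLOCK_FOLDER, "Bingbot", SITE + "/team/about.html"))    # True
print(allowed(BLOCK_PAGE, "Bingbot", SITE + "/team/founder.html"))    # False
print(allowed(BLOCK_PAGE, "Bingbot", SITE + "/team/about.html"))      # True
```

Rules addressed to one crawler do not affect others: in examples c and d, any crawler not named in the User-agent line is still allowed.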

Other quick robots.txt must-knows

In order to be found, a robots.txt file must be placed in a website’s top-level directory.
The file name is case sensitive: the file must be named “robots.txt” (not Robots.txt, robots.TXT, or otherwise).
The /robots.txt file is publicly available: just add /robots.txt to the end of any root domain to see that website’s directives.
Each subdomain on a root domain uses its own robots.txt file. This means that blog.example.com and example.com should each have their own robots.txt file.
It’s generally a best practice to indicate the location of any sitemaps associated with the domain at the bottom of the robots.txt file.
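The location rules above can be sketched in Python. The helper names robots_url and listed_sitemaps are hypothetical, the sample file contents and sitemap URL are invented for illustration, and site_maps() requires Python 3.8 or later. The first helper derives where a crawler would look for robots.txt given any page URL; the second lists the Sitemap lines found in a file.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the host that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def listed_sitemaps(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when empty


# Each subdomain resolves to its own file at the top level of that host.
print(robots_url("https://blog.example.com/2020/post.html"))
print(robots_url("https://example.com/team/founder.html"))

# Hypothetical file with the sitemap declared at the bottom.
SAMPLE = (
    "User-agent: *\n"
    "Disallow:\n"
    "\n"
    "Sitemap: https://www.example.com/sitemap.xml\n"
)
print(listed_sitemaps(SAMPLE))
```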

Monkey Owl
