How to use robots.txt?
About /robots.txt

Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:


User-agent: *
Disallow: /


The “User-agent: *” means this section applies to all robots. The “Disallow: /” tells the robot that it should not visit any pages on the site.
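
As a quick illustration of this check, here is a minimal sketch using Python’s standard urllib.robotparser module (the crawler name “MyCrawler” is just a placeholder): it parses the two lines above and reports whether the page may be fetched.

from urllib import robotparser

# Parse the example rules directly (no network fetch needed for this sketch)
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# With "Disallow: /" applying to every robot, no page may be fetched
print(rp.can_fetch("MyCrawler", "http://www.example.com/welcome.html"))  # False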

There are two important considerations when using /robots.txt:

  • Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email-address harvesters used by spammers, will pay no attention to it.
  • The /robots.txt file is publicly available. Anyone can see which sections of your server you don’t want robots to use.

So it is not recommended to use /robots.txt to hide information.


How to edit the /robots.txt file from the WebConsole:

Log in to the Console using your domain name and password. From the Console screen, click Administration -> Edit Robots.txt.

[Screenshot: Edit Robots.txt screen]


What to put in it?

The “/robots.txt” file is a text file with one or more records. It usually contains a single record looking like this:


User-agent: *
Disallow: /uploadedFiles/
Disallow: /xml/


In this example, two directories are excluded. Note that you need a separate “Disallow” line for every URL prefix you want to exclude; you cannot say “Disallow: /uploadedFiles/ /xml/” on a single line. Also, you may not have blank lines within a record, as blank lines are used to delimit multiple records.

Note also that globbing and regular expressions are not supported in either the User-agent or Disallow lines. The ‘*’ in the User-agent field is a special value meaning “any robot”. Specifically, you cannot have lines like “User-agent: *bot”, “Disallow: /AddToCart/*” or “Disallow: *.jpg”.
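
To see the prefix matching in action, here is a small sketch along the same lines (Python’s urllib.robotparser again; the crawler name and file paths are only illustrative). Each Disallow value is treated as a plain URL-path prefix, nothing more:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /uploadedFiles/",
    "Disallow: /xml/",
])

# Anything under a disallowed prefix is blocked...
print(rp.can_fetch("MyCrawler", "/uploadedFiles/report.pdf"))  # False
print(rp.can_fetch("MyCrawler", "/xml/sitemap.xml"))           # False
# ...while everything else stays allowed.
print(rp.can_fetch("MyCrawler", "/index.html"))                # True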

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:

To exclude all robots from the entire website

User-agent: *
Disallow: /


To allow all robots complete access

User-agent: *
Disallow:

(or just create an empty “/robots.txt” file, or don’t use one at all)


To exclude all robots from part of the website

User-agent: *
Disallow: /xml/
Disallow: /tmp/
Disallow: /flash/

To exclude a single robot

User-agent: BadBot
Disallow: /

To allow a single robot

User-agent: Google
Disallow:
User-agent: *
Disallow: /
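
To see how a robot decides which record applies to it, the following sketch (urllib.robotparser again; “OtherBot” is a made-up name) shows that a robot follows the record whose User-agent matches its own name, while everyone else falls back to the “*” record:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Google",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /",
])

print(rp.can_fetch("Google", "/welcome.html"))    # True: empty Disallow means full access
print(rp.can_fetch("OtherBot", "/welcome.html"))  # False: falls back to the "*" record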


To include all files except one particular folder

This is currently a bit awkward, as there is no “Allow” field. The easy way is to put all the files you want to keep robots out of into a separate directory, say “stuff”, and disallow that directory:

User-agent: *
Disallow: /uploadedFiles/stuff/


Alternatively, you can explicitly disallow each page you want to exclude:

User-agent: *
Disallow: /uploadedFiles/junk.html
Disallow: /uploadedFiles/foo.html
Disallow: /uploadedFiles/bar.html
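
Either variant can be double-checked with the same kind of sketch (the file names here are only illustrative): only the excluded folder, or the explicitly listed pages, are off limits.

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /uploadedFiles/stuff/",
])

print(rp.can_fetch("MyCrawler", "/uploadedFiles/stuff/old.html"))  # False: inside the excluded folder
print(rp.can_fetch("MyCrawler", "/uploadedFiles/keep.html"))       # True: everything else is allowed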