You can add this directive anywhere in the robots.txt file because it is independent of the user-agent line. All you have to do is specify the location of your Sitemap in the sitemap-location.xml part of the URL. If you have multiple Sitemaps, you can instead specify the location of your Sitemap index file. Learn more about sitemaps in our blog on XML Sitemaps.
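For example, a Sitemap line looks like the following (the example.com URL is only a placeholder; point it at your own Sitemap or Sitemap index file):
Sitemap: https://www.example.com/sitemap-location.xml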

Examples of Robots.txt Files:

There are two major elements in a robots.txt file: User-agent and Disallow.
User-agent: The user-agent is most often set to the wildcard, an asterisk (*), which signifies that the instructions that follow apply to all bots. If you want certain bots to be blocked from or allowed on certain pages, you can name that bot in the user-agent line instead.
Disallow: When Disallow is left empty, bots can crawl all the pages on the site. To block a certain page, use exactly one URL prefix per Disallow line; you cannot list multiple folders or URL prefixes in a single Disallow line, but you can repeat the line for each prefix you want to block.
The following are some common uses of robots.txt files.
To allow all bots to access the whole site (the default robots.txt), the following is used:
User-agent: *
Disallow:
To block the entire server from the bots, this robots.txt is used:
User-agent: *
Disallow: /
To allow a single robot and disallow other robots:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
To block the site from a single robot:
User-agent: XYZbot
Disallow: /
To block some parts of the site:
User-agent: *
Disallow: /tmp/
Disallow: /junk/
Use this robots.txt to block all content of a specific file type. In this example, we are excluding all PowerPoint (.ppt) files. (NOTE: The dollar ($) sign indicates the end of the URL):
User-agent: *
Disallow: *.ppt$
To block bots from a specific file:
User-agent: *
Disallow: /directory/file.html
To let bots crawl certain HTML documents within a directory that is otherwise blocked, you can use an Allow directive, which some major crawlers support in robots.txt. An example is shown below:
User-agent: *
Disallow: /folder1/
Allow: /folder1/myfile.html
To block URLs containing specific query strings that may result in duplicate content, the robots.txt below is used. In this case, any URL containing a question mark (?) is blocked:
User-agent: *
Disallow: /*?
Sometimes a page will get indexed even if you block it in the robots.txt file, for reasons such as being linked to externally. To completely block that page from being shown in search results, you can add a robots noindex meta tag to those pages individually. You can also add a nofollow value to instruct the bots not to follow the page's outbound links, using the following code:
For the page not to be indexed:
     <meta name="robots" content="noindex">
For the page not to be indexed and links not to be followed:
     <meta name="robots" content="noindex,nofollow">
NOTE: If you block these pages in robots.txt and also add the above meta tag to the page, the page will not be crawled, but it may still appear in the URL-only listings of search results, because the bots are specifically blocked from reading the meta tags within the page.
Another important thing to note is that you must not include any URL that is blocked by your robots.txt file in your XML sitemap. This can happen easily when you use separate tools to generate the robots.txt file and the XML sitemap, so you may have to manually check whether any blocked URLs are included in the sitemap. You can test this in your Google Webmaster Tools account if you have submitted and verified your site on the tool and have submitted your sitemap.
Go to Webmaster Tools > Optimization > Sitemaps, and if the tool shows any crawl errors on the sitemap(s) submitted, you can double-check whether the affected pages are included in robots.txt.

Google Webmaster Tools Showing Sitemaps with Crawl Errors
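If you would rather check this outside of Webmaster Tools, a small script can do the same comparison. Below is a minimal Python sketch, assuming your robots.txt and sitemap live at the placeholder example.com URLs shown; it reads the robots.txt rules with the standard urllib.robotparser module (which only understands plain User-agent/Disallow prefixes, not wildcard patterns such as *.ppt$) and flags any sitemap URL those rules would block.

from urllib.request import urlopen
from urllib.robotparser import RobotFileParser
import xml.etree.ElementTree as ET

# Placeholder locations -- substitute your own domain and file names.
ROBOTS_URL = "https://www.example.com/robots.txt"
SITEMAP_URL = "https://www.example.com/sitemap.xml"

# Load and parse the live robots.txt rules.
rules = RobotFileParser()
rules.set_url(ROBOTS_URL)
rules.read()

# Pull every <loc> entry out of the XML sitemap.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
tree = ET.parse(urlopen(SITEMAP_URL))
urls = [loc.text.strip() for loc in tree.findall(".//sm:loc", ns)]

# Report any sitemap URL that robots.txt would block for a generic bot (*).
blocked = [url for url in urls if not rules.can_fetch("*", url)]
for url in blocked:
    print("Listed in sitemap but blocked by robots.txt:", url)
print(f"{len(blocked)} of {len(urls)} sitemap URLs are blocked.")

Any URL the script reports should either be removed from the sitemap or unblocked in robots.txt.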

Google recently introduced three new features that make it easier to manage website verifications and verified site owners.

Feature 1

New Verification Details Link

This feature lets you see the methods used to verify each owner of your site. Under "Manage site owners", a verification details link is displayed for each user, showing how that owner was verified and linking to where the verification can be found.


Feature 2

Verification Needs to be Removed Before Unverifying an Owner

If you want to unverify an owner, you first need to remove the verification method that was originally used to verify the site.

Feature 3

Shorter CNAME Verification String to Support Large Number of DNS Providers

If you are verifying your site using CNAME verification, this method is now easier to implement: a shorter CNAME verification string can be used with DNS providers that limit the number of characters allowed in DNS records.
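As a rough sketch, the record you add in a BIND-style zone file would look something like the line below; both the host label and the target here are placeholders, so use the exact values Webmaster Tools gives you on the verification page:
unique-label-from-webmaster-tools.example.com.    IN    CNAME    target-from-webmaster-tools.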