You can add this directive anywhere in the robots.txt file because it is independent of the user-agent line. All you have to do is specify the location of your Sitemap in the sitemap-location.xml part of the URL. If you have multiple Sitemaps, you can instead specify the location of your Sitemap index file. Learn more about sitemaps in our blog on XML Sitemaps.
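For example, a Sitemap line looks like the following (the example.com URL is only a placeholder; point it at your own Sitemap or Sitemap index file):
Sitemap: https://www.example.com/sitemap-location.xml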

Examples of Robots.txt Files:

There are two major elements in a robots.txt file: User-agent and Disallow.
User-agent: The user-agent is most often set to the wildcard, an asterisk (*), which signifies that the instructions that follow apply to all bots. If you want certain bots to be blocked from or allowed on certain pages, you can name that bot in the user-agent line instead.
Disallow: When Disallow is left empty, bots can crawl all the pages on the site. To block a certain page, use exactly one URL prefix per Disallow line; you cannot list multiple folders or URL prefixes in a single Disallow line, but you can repeat the line for each prefix you want to block.
The following are some common uses of robots.txt files.
To allow all bots to access the whole site (the default robots.txt), the following is used:
User-agent: *
Disallow:
To block the entire server from the bots, this robots.txt is used:
User-agent: *
Disallow: /
To allow a single robot and disallow other robots:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
To block the site from a single robot:
User-agent: XYZbot
Disallow: /
To block some parts of the site:
User-agent: *
Disallow: /tmp/
Disallow: /junk/
Use this robots.txt to block all content of a specific file type. In this example, we are excluding all PowerPoint (.ppt) files. (NOTE: The dollar ($) sign indicates the end of the URL):
User-agent: *
Disallow: *.ppt$
To block bots from a specific file:
User-agent: *
Disallow: /directory/file.html
To let bots crawl certain HTML documents within a directory that is otherwise blocked, you can use an Allow directive, which some major crawlers support in robots.txt. An example is shown below:
User-agent: *
Disallow: /folder1/
Allow: /folder1/myfile.html
To block URLs containing specific query strings that may result in duplicate content, the robots.txt below is used. In this case, any URL containing a question mark (?) is blocked:
User-agent: *
Disallow: /*?
Sometimes a page will get indexed even if you block it in the robots.txt file, for reasons such as being linked to externally. To completely block that page from being shown in search results, you can add a robots noindex meta tag to those pages individually. You can also add a nofollow value to instruct the bots not to follow the page's outbound links, using the following code:
For the page not to be indexed:
     <meta name="robots" content="noindex">
For the page not to be indexed and links not to be followed:
     <meta name="robots" content="noindex,nofollow">
NOTE: If you block these pages in robots.txt and also add the above meta tag to the page, the page will not be crawled, but it may still appear in the URL-only listings of search results, because the bots are specifically blocked from reading the meta tags within the page.
Another important thing to note is that you must not include any URL that is blocked by your robots.txt file in your XML sitemap. This can happen easily when you use separate tools to generate the robots.txt file and the XML sitemap, so you may have to manually check whether any blocked URLs are included in the sitemap. You can test this in your Google Webmaster Tools account if you have submitted and verified your site on the tool and have submitted your sitemap.
Go to Webmaster Tools > Optimization > Sitemaps, and if the tool shows any crawl errors on the sitemap(s) submitted, you can double-check whether the affected pages are included in robots.txt.

Google Webmaster Tools Showing Sitemaps with Crawl Errors
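If you would rather check this outside of Webmaster Tools, a small script can do the same comparison. Below is a minimal Python sketch, assuming your robots.txt and sitemap live at the placeholder example.com URLs shown; it reads the robots.txt rules with the standard urllib.robotparser module (which only understands plain User-agent/Disallow prefixes, not wildcard patterns such as *.ppt$) and flags any sitemap URL those rules would block.

from urllib.request import urlopen
from urllib.robotparser import RobotFileParser
import xml.etree.ElementTree as ET

# Placeholder locations -- substitute your own domain and file names.
ROBOTS_URL = "https://www.example.com/robots.txt"
SITEMAP_URL = "https://www.example.com/sitemap.xml"

# Load and parse the live robots.txt rules.
rules = RobotFileParser()
rules.set_url(ROBOTS_URL)
rules.read()

# Pull every <loc> entry out of the XML sitemap.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
tree = ET.parse(urlopen(SITEMAP_URL))
urls = [loc.text.strip() for loc in tree.findall(".//sm:loc", ns)]

# Report any sitemap URL that robots.txt would block for a generic bot (*).
blocked = [url for url in urls if not rules.can_fetch("*", url)]
for url in blocked:
    print("Listed in sitemap but blocked by robots.txt:", url)
print(f"{len(blocked)} of {len(urls)} sitemap URLs are blocked.")

Any URL the script reports should either be removed from the sitemap or unblocked in robots.txt.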

Google recently introduced three new features that make it easier to manage website verifications and verified site owners.

Feature 1

New Verification Details Link

This feature lets you see the methods used to verify each owner of your site. Under "Manage site owners", a verification details link is displayed for each user, showing how that owner was verified and linking to where the verification can be found.


Feature 2

Verification Needs to be Removed Before Unverifying an Owner

If you want to unverify an owner, you first need to remove the verification method that was originally used to verify the site.

Feature 3

Shorter CNAME Verification String to Support Large Number of DNS Providers

If you are verifying your site using CNAME verification, this method is now easier to implement: a shorter CNAME verification string can be used with DNS providers that limit the number of characters allowed in DNS records.
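As a rough sketch, the record you add in a BIND-style zone file would look something like the line below; both the host label and the target here are placeholders, so use the exact values Webmaster Tools gives you on the verification page:
unique-label-from-webmaster-tools.example.com.    IN    CNAME    target-from-webmaster-tools.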