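When your Sitemap gains new content as the site is updated, how do you let search engines know as quickly as possible instead of waiting for their spiders to drop by? The best way, of course, is to tell the search engines that your Sitemap has been updated and that you are ready for them to send their spiders over.
How do you tell the search engines? Use the Ping feature they provide.
Unfortunately, search engines in China show no interest in Sitemaps, let alone Ping, so the effect for Chinese-language sites may be limited.
The addresses below were collected by 老乐. Following this format, replace the Sitemap address with the full URL of your own Sitemap and ping the search engines to tell them that your Sitemap has been updated.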
Google: http://www.google.com/webmasters/sitemaps/ping?sitemap=[full URL of your XML file]
Yahoo: http://api.search.yahoo.com/SiteExplorerService/V1/updateNotification?appid=YahooDemo&url=[full URL of your XML file]
Live: http://webmaster.live.com/ping.aspx?siteMap=[full URL of your XML file]
Ask: http://submissions.ask.com/ping?sitemap=[full URL of your XML file]
Moreover: http://api.moreover.com/ping?u=[full URL of your XML file]
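As a rough sketch of how the placeholder could be filled in, the Python snippet below builds one ping URL per engine from a Sitemap address. The endpoint prefixes are copied from the list above and may no longer respond; the function name `build_ping_urls` and the example.com address are made up for illustration, and percent-encoding the Sitemap URL is my assumption about what is safest in a query string. The snippet only builds the URLs and does not send any request.

```python
from urllib.parse import quote

# Ping endpoints exactly as listed in this post; they may no longer respond.
PING_ENDPOINTS = {
    "Google": "http://www.google.com/webmasters/sitemaps/ping?sitemap=",
    "Yahoo": "http://api.search.yahoo.com/SiteExplorerService/V1/updateNotification?appid=YahooDemo&url=",
    "Live": "http://webmaster.live.com/ping.aspx?siteMap=",
    "Ask": "http://submissions.ask.com/ping?sitemap=",
    "Moreover": "http://api.moreover.com/ping?u=",
}


def build_ping_urls(sitemap_url):
    """Return one ping URL per engine, with the Sitemap address percent-encoded."""
    # Assumption: encode every reserved character so the Sitemap URL
    # survives as a single query-string value.
    encoded = quote(sitemap_url, safe="")
    return {name: prefix + encoded for name, prefix in PING_ENDPOINTS.items()}


if __name__ == "__main__":
    # Hypothetical Sitemap address, for illustration only.
    for name, url in build_ping_urls("http://www.example.com/sitemap.xml").items():
        print(f"{name}: {url}")
```

Opening each resulting URL in a browser, or fetching it from a scheduled script after the Sitemap is regenerated, would be the actual ping.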
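Regrettably, the mainstream Chinese search engines have no appetite for Sitemaps, and the search engines that do support Sitemaps cannot grow their market share.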
Monday, November 17, 2008
Saturday, November 15, 2008
Using the robots meta tag
Recently, Danny Sullivan brought up good questions about how search engines handle meta tags. Here are some answers about how we handle these tags at Google.
Multiple content values
We recommend that you place all content values in one meta tag. This keeps the meta tags easy to read and reduces the chance for conflicts. For instance:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
If the page contains multiple meta tags of the same type, we will aggregate the content values. For instance, we will interpret
<meta name="ROBOTS" content="NOINDEX">
<meta name="ROBOTS" content="NOFOLLOW">
the same way as:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
If content values conflict, we will use the most restrictive. So, if the page has these meta tags:
<meta name="ROBOTS" content="NOINDEX">
<meta name="ROBOTS" content="INDEX">
We will obey the NOINDEX value.
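To make the aggregation and conflict rules concrete, here is a minimal Python sketch that collects the content values from every robots meta tag on a page and then lets the restrictive value win. It is only an illustration of the behavior described above, not Google's implementation; the names `RobotsMetaCollector` and `effective_directives` are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect content values from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.values = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values keep their original case, so compare case-insensitively.
        if (attr.get("name") or "").lower() != "robots":
            return
        for token in (attr.get("content") or "").split(","):
            token = token.strip().upper()
            if token:
                self.values.add(token)


def effective_directives(html):
    """Aggregate all robots meta values, then resolve conflicts restrictively."""
    collector = RobotsMetaCollector()
    collector.feed(html)
    values = set(collector.values)
    # When values conflict, the most restrictive one is kept.
    if "NOINDEX" in values:
        values.discard("INDEX")
    if "NOFOLLOW" in values:
        values.discard("FOLLOW")
    return values


if __name__ == "__main__":
    page = (
        '<meta name="ROBOTS" content="NOINDEX">\n'
        '<meta name="ROBOTS" content="INDEX">'
    )
    print(effective_directives(page))
```

For the conflicting pair above, the only value left is NOINDEX, which matches the rule that the restrictive value is obeyed.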
Unnecessary content values
By default, Googlebot will index a page and follow the links on it. So there's no need to tag pages with content values of INDEX or FOLLOW.
Directing a robots meta tag specifically at Googlebot
To provide instruction for all search engines, set the meta name to "ROBOTS". To provide instruction for only Googlebot, set the meta name to "GOOGLEBOT". If you want to provide different instructions for different search engines (for instance, if you want one search engine to index a page, but not another), it's best to use a specific meta tag for each search engine rather than use a generic robots meta tag combined with a specific one. You can find a list of bots at robotstxt.org.
Casing and spacing
Googlebot understands any combination of lowercase and uppercase. So each of these meta tags is interpreted in exactly the same way:
<meta name="ROBOTS" content="NOODP">
<meta name="robots" content="noodp">
<meta name="Robots" content="NoOdp">
If you have multiple content values, you must place a comma between them, but it doesn't matter if you also include spaces. So the following meta tags are interpreted the same way:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">
If you use both a robots.txt file and robots meta tags
If the robots.txt and meta tag instructions for a page conflict, Googlebot follows the most restrictive. More specifically:
- If you block a page with robots.txt, Googlebot will never crawl the page and will never read any meta tags on the page.
- If you allow a page with robots.txt but block it from being indexed using a meta tag, Googlebot will access the page, read the meta tag, and subsequently not index it.
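The two cases above can be sketched in Python with the standard library's `urllib.robotparser`. This is a simplified model under stated assumptions, not a description of Googlebot's real code: the function name `crawl_outcome`, the returned strings, and the example.com URLs are all hypothetical, and the meta values are passed in by the caller rather than fetched from the page.

```python
from urllib.robotparser import RobotFileParser


def crawl_outcome(robots_txt, url, meta_values, user_agent="Googlebot"):
    """Model how robots.txt and a robots meta tag combine for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Case 1: robots.txt blocks the page, so it is never fetched and any
    # meta tags on it are never seen.
    if not parser.can_fetch(user_agent, url):
        return "not crawled; meta tags never read"

    # Case 2: the page is fetched, and a NOINDEX meta value keeps it out
    # of the index.
    if "NOINDEX" in {value.strip().upper() for value in meta_values}:
        return "crawled; not indexed"

    return "crawled; eligible for indexing"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(crawl_outcome(robots, "http://www.example.com/private/a.html", ["NOINDEX"]))
    print(crawl_outcome(robots, "http://www.example.com/public/a.html", ["NOINDEX"]))
```

One practical consequence of the first case: a NOINDEX tag on a page that robots.txt already blocks has no effect, because the tag is never read.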
Googlebot interprets the following robots meta tag values:
- NOINDEX - prevents the page from being included in the index.
- NOFOLLOW - prevents Googlebot from following any links on the page. (Note that this is different from the link-level NOFOLLOW attribute, which prevents Googlebot from following an individual link.)
- NOARCHIVE - prevents a cached copy of this page from being available in the search results.
- NOSNIPPET - prevents a description from appearing below the page in the search results, and also prevents caching of the page.
- NOODP - blocks the Open Directory Project description of the page from being used in the description that appears below the page in the search results.
- NONE - equivalent to "NOINDEX, NOFOLLOW".
As defined by robotstxt.org, the following direction means NOINDEX, NOFOLLOW.
<meta name="ROBOTS" content="NONE">
However, some webmasters use this tag to indicate no robots restrictions and inadvertently block all search engines from their content.
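Since NONE is easy to misread, a small Python sketch of the expansion may help. It assumes nothing beyond the equivalence stated above (NONE means NOINDEX, NOFOLLOW); the helper name `expand_none` is hypothetical.

```python
def expand_none(values):
    """Replace NONE with the two directives it stands for."""
    # Normalize case and surrounding spaces, and drop empty entries.
    normalized = {value.strip().upper() for value in values if value.strip()}
    if "NONE" in normalized:
        normalized.discard("NONE")
        normalized.update({"NOINDEX", "NOFOLLOW"})
    return normalized


if __name__ == "__main__":
    print(expand_none(["none"]))
```

A check like this could be run over a site's templates to spot pages where NONE was used with the intent of placing no restrictions.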