Technet - Use Sitemap Standards

From PeformIQ Upgrade

Use sitemap standards to help search engines

Tony Patton, TechRepublic


A key goal for most Web sites is to increase visibility and user traffic. One way to do this is through search engine optimisation. Another is to use sitemaps, which let you tell a search engine which pages it should process or index. The sitemap concept was originally developed by Google, and Yahoo! and MSN recently agreed to support the standard. This week, I examine the sitemap (http://www.sitemaps.org/) standard.

The need for a standard

Search engines use spiders to crawl the Internet, locate pages, and index them in their databases. The process is resource intensive, and sometimes the pages you want indexed are overlooked while non-essential pages are indexed. Google's Googlebot is a good example: it traverses the Web looking for new and changed pages, which Google then indexes and ranks accordingly.

Sitemaps give Web sites a way to tell search engines which pages should be indexed and what new content has been added. In effect, a sitemap is a communication channel between the site and the search engine. In theory, sitemaps can ease the load on search engine spiders by reducing what they must process, but for now they supplement the crawling process rather than replace it.

What is a sitemap?

A sitemap is an XML file that lists a site's URLs, along with related attributes, to describe what should be indexed within that site. It must be UTF-8 encoded. The following XML elements are required in the sitemap file:

  • <urlset>--The file begins and ends with this tag, and the opening tag must include the namespace (xmlns) attribute.
  • <url>--Each page included in the file is enclosed in this element.
  • <loc>--The actual address of the page specified in the file. It is a child of the <url> element.

The following optional elements are available as well:


  • <lastmod>--A child of the <url> element. It specifies when the page was last modified, in W3C Datetime format (for example, 2006-11-20).
  • <changefreq>--A child of the <url> element. It specifies how often the page changes (always, hourly, daily, weekly, monthly, yearly, and never).
  • <priority>--A child of the <url> element. It specifies the importance of the page relative to other pages within the site. Valid values range from 0.0 to 1.0, and the default is 0.5.
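The following minimal Python sketch shows the constraints on these elements in code. It models one <url> entry and rejects values the protocol does not allow. The class name SitemapEntry and its methods are hypothetical and do not belong to any library. The checks cover only the rules listed above: the fixed set of <changefreq> values, the 0.0 to 1.0 range for <priority>, and 0.5 as the assumed priority when none is given.

```python
from dataclasses import dataclass
from typing import Optional

# The values the sitemap protocol allows for <changefreq>.
CHANGEFREQ_VALUES = frozenset(
    {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
)


@dataclass
class SitemapEntry:
    """One <url> element (hypothetical model): loc is required, the rest optional."""

    loc: str
    lastmod: Optional[str] = None     # e.g. "2006-11-20"
    changefreq: Optional[str] = None  # must be one of CHANGEFREQ_VALUES
    priority: Optional[float] = None  # 0.0 to 1.0 when given

    def __post_init__(self) -> None:
        # Reject values the protocol does not allow.
        if not self.loc:
            raise ValueError("<loc> is required")
        if self.changefreq is not None and self.changefreq not in CHANGEFREQ_VALUES:
            raise ValueError(f"invalid changefreq: {self.changefreq!r}")
        if self.priority is not None and not 0.0 <= self.priority <= 1.0:
            raise ValueError(f"priority out of range: {self.priority}")

    def effective_priority(self) -> float:
        """Priority as a search engine would read it: 0.5 when omitted."""
        return 0.5 if self.priority is None else self.priority


home = SitemapEntry("http://www.test.com/", "2006-11-20", "daily", 0.3)
about = SitemapEntry("http://www.test.com/about.html")
print(home.effective_priority(), about.effective_priority())  # 0.3 0.5
```

Validating in the constructor means an invalid entry cannot be created in the first place, so anything written to the file later already satisfies these rules.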


The following sample sitemap shows how these elements fit together. It lists the home page of a fictitious site, along with how often the page changes, when it was last changed, and its priority within the site.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.test.com/</loc>
    <lastmod>2006-11-20</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>
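The elements in a sitemap belong to the namespace declared on <urlset>, so a program that reads the file must use that namespace when it looks up child elements. The following Python sketch uses only the standard library's xml.etree.ElementTree to read the sample sitemap above. The function name read_sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace declared on <urlset>; lookups must be qualified with it.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.test.com/</loc>
    <lastmod>2006-11-20</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>"""


def read_sitemap(xml_bytes: bytes) -> list:
    """Return one dict per <url>; missing optional elements become None."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "changefreq": url.findtext("sm:changefreq", namespaces=NS),
            "priority": url.findtext("sm:priority", namespaces=NS),
        })
    return entries


for entry in read_sitemap(SAMPLE.encode("utf-8")):
    print(entry)
```

If the lookups were written without the "sm:" prefix, findall would match nothing, because every element in the file inherits the sitemap namespace.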

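The location of the sitemap file is up to you, but that location determines which URLs the file may include. For example, if the previous sample sitemap file is located at http://www.test.com/sitemap.xml, then it may include any URL starting with http://www.test.com/. A sitemap at http://www.test.com/catalog/sitemap.xml, by contrast, could include only URLs starting with http://www.test.com/catalog/. For this reason, it is recommended that sitemap files be placed in your site's root directory. A sitemap file may list no more than 50,000 URLs and must not exceed 10 MB uncompressed. You may compress it using gzip to save bandwidth, but compression does not lift the size limit. If your sitemap is too large, split it into several files and list them in a sitemap index file.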
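The location rule and the per-file limits can be sketched in a few lines of Python. The helper names are hypothetical. The sketch treats a sitemap's scope as everything up to and including the last slash in its URL, and it uses the protocol's limits of 50,000 URLs and 10 MB per file, with size measured before compression.

```python
import gzip

MAX_URLS = 50_000              # URLs allowed per sitemap file
MAX_BYTES = 10 * 1024 * 1024   # 10 MB, measured on the uncompressed file


def sitemap_scope(sitemap_url: str) -> str:
    """Everything up to and including the last '/' in the sitemap's URL."""
    return sitemap_url[: sitemap_url.rfind("/") + 1]


def in_scope(sitemap_url: str, page_url: str) -> bool:
    """True if page_url may be listed in the sitemap at sitemap_url."""
    return page_url.startswith(sitemap_scope(sitemap_url))


def check_and_compress(xml_bytes: bytes, url_count: int) -> bytes:
    """Check the limits on the uncompressed sitemap, then gzip it."""
    if url_count > MAX_URLS:
        raise ValueError("too many URLs: split into several sitemap files")
    if len(xml_bytes) > MAX_BYTES:
        raise ValueError("file too large: split into several sitemap files")
    return gzip.compress(xml_bytes)


print(in_scope("http://www.test.com/sitemap.xml", "http://www.test.com/news/a.html"))
print(in_scope("http://www.test.com/catalog/sitemap.xml", "http://www.test.com/images/x.png"))
```

The limits are checked before compressing because gzip only reduces transfer size. It does not make an oversized sitemap acceptable.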

Creating a sitemap

Because sitemaps are plain XML, you can create and edit them in any text editor, but special tools are also available. The following list provides a sample of currently available tools:

  • A Python script that may be used to generate sitemap files.
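Generator tools of this kind generally walk a site's files and turn each one into a <url> entry. The following is a minimal Python sketch of that approach, not the script mentioned above. The name build_sitemap is hypothetical. The sketch assumes that every .html file under a local document root is served at the same relative path under base_url, and it uses each file's modification date as <lastmod>.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import quote

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(doc_root: str, base_url: str) -> bytes:
    """Return UTF-8 sitemap XML listing every .html file under doc_root.

    Assumes (hypothetically) that doc_root/<path> is served at base_url/<path>.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in sorted(filenames):
            if not name.endswith(".html"):
                continue  # only pages are listed; other files are skipped
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, doc_root).replace(os.sep, "/")
            url = ET.SubElement(urlset, "url")
            # Percent-encode the path (spaces etc.); ElementTree escapes '&' itself.
            ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + quote(rel)
            # The file's modification date stands in for <lastmod> (YYYY-MM-DD).
            modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
            ET.SubElement(url, "lastmod").text = modified.strftime("%Y-%m-%d")
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

For example, build_sitemap("/var/www/html", "http://www.test.com/") would return the bytes to save as sitemap.xml in the site root. The directory path is only an illustration. A fuller tool would also skip pages you do not want indexed and split the output once it nears the per-file limits.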


Notifying a search engine

Once you have a sitemap file, you must submit it to the search engines, and each search engine has its own submission interface. Google includes a sitemap submission page as part of its Webmaster toolset; you must sign up (https://www.google.com/webmasters/tools/docs/en/about.html) for an account before you can use it. Yahoo! offers a free submission page (http://submit.search.yahoo.com/free/request) for sitemaps, which also requires an account. Other search engines are likely to provide similar functionality as they follow the lead of Google, Yahoo!, and MSN.
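Besides these submission pages, the sitemaps.org protocol describes an HTTP "ping": a GET request to <searchengine_URL>/ping?sitemap=sitemap_url, where the sitemap's URL is URL-encoded. The Python sketch below builds that request URL and can send it. The search engine address is a placeholder, and the function names are hypothetical. Support for pings varies by engine, and some engines have since dropped the endpoint, so check an engine's own documentation before relying on this.

```python
from urllib.parse import quote
from urllib.request import urlopen


def ping_url(searchengine_url: str, sitemap_url: str) -> str:
    """Build <searchengine_URL>/ping?sitemap=<URL-encoded sitemap_url>."""
    return searchengine_url.rstrip("/") + "/ping?sitemap=" + quote(sitemap_url, safe="")


def send_ping(searchengine_url: str, sitemap_url: str, timeout: float = 10.0) -> int:
    """Send the GET request and return the HTTP status (urlopen raises on errors)."""
    with urlopen(ping_url(searchengine_url, sitemap_url), timeout=timeout) as resp:
        return resp.status


# Placeholder engine address; this only prints the URL and sends nothing.
print(ping_url("http://searchengine.example", "http://www.test.com/sitemap.xml"))
```

Encoding the sitemap URL with safe="" escapes its ":" and "/" characters, so the engine can parse it unambiguously as a single query parameter.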

Another tool

The crawling process by which search engines index the Web is slow and resource intensive. Sitemaps give Web sites a way to tell search engines which of their pages should be indexed. Sitemaps are plain text files formatted as XML, and plenty of tools are available to help you create them. For now, they only supplement the existing crawling process.