
All You Need to Know About XML Sitemaps

This post is part of a series called SEO Fundamentals for Web Designers.

When we talk about sitemaps, most people think about a web page with links on it. There is, however, a more useful type of sitemap: the XML Sitemap. What are the benefits of having one? Are there other types? How is an XML Sitemap created? Read on to find out.


Why Would I Need a Sitemap?

A question a lot of you will be asking is: “Why would I need to add a sitemap to my site?”

Simply put, sitemaps are really handy for correctly indexing your website; they help search engines during the crawling process.

You could compare a sitemap to a road map for crawlers. Crawlers usually discover new pages via links (href or src). A sitemap is used to double-check their link database, allowing them to discover pages they might not otherwise have seen. As a bonus, you can provide crawlers with additional information about the URL by adding metadata.

This is especially useful for new websites or websites with a significant number of new or updated pages. Thanks to a sitemap, search engines can find those pages much faster, reducing the time it takes to index them.

One thing that you should keep in mind is that a sitemap doesn’t guarantee that a listed page will be added to the index. If the page is of low quality or contains duplicate content, it may be excluded. A sitemap simply helps search engines understand your site structure.


Types of Sitemaps

Sitemaps can be divided into two categories: HTML Sitemaps and XML Sitemaps.

HTML Sitemaps

These are the classic sitemaps which visitors may use to navigate a website. They can usually be found on a separate page. HTML Sitemaps are easy to create because they are basically web pages where you show the structure of the website by means of links.

As they're built with HTML, we can add some CSS to spice things up a bit, keeping the sitemap in line with your visual brand experience. If you lack inspiration, you should check out SlickMap by Astuteo or the CSSsitemap System by David Leggett.

Besides the obvious benefit for the users, HTML sitemaps can be useful for SEO too. If crawlers can easily find a link to an HTML sitemap, it can help them understand your site structure. Don’t forget to update your sitemap if you add or remove pages (you will probably have to do this manually).

XML Sitemaps

XML Sitemaps are only used by search engines. All of the biggest search engines (Google, Bing, Yahoo) utilize XML sitemaps for the crawling process.

There are tons of online tools to help you automatically generate sitemaps (here's a useful list). If you’re using a content management system such as WordPress or Joomla, there are plenty of plugins available.

Nevertheless, it’s a good idea to have some background knowledge of how a sitemap works. That’s why we’re going to create a basic sitemap step by step in the following section.

Feeds

HTML and XML are the most frequently used Sitemap formats. However, Google also accepts RSS 2.0 and Atom 1.0 feeds. You can use the URL of these feeds as a sitemap. The problem with this technique is that older pages may not be included.


Creating an XML Sitemap

The big advantage of an XML Sitemap is the inclusion of metadata, allowing you to provide additional information about the content of each page. An XML Sitemap can be created as follows:

Step 1: Create a text file, name it ‘sitemap’ and save it with a .xml extension.

Step 2: Next we need to tell search engines how the sitemap is encoded, by adding the following snippet:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
</urlset>

Step 3: In our next step we add all the relevant URLs. We do this right before the closing urlset tag. Below you’ll find an example of a URL entry (don’t worry, we’ll discuss the various elements later):

<url>
    <loc>http://www.website.com/</loc>
    <lastmod>2012-12-12</lastmod>
    <changefreq>daily</changefreq>
    <priority>1</priority>
</url>
  • The loc-tag is used to link to the page. Simply enter the URL between the tags.
  • lastmod gives the date the page was last modified, in W3C Datetime format (for example YYYY-MM-DD).
  • changefreq is, as you might have guessed, the average change frequency of the page. Valid values are always, hourly, daily, weekly, monthly, yearly and never. Use ‘never’ for archived URLs.
  • You can also prioritize certain pages via the priority-tag. Priority values range from 0.0 to 1.0 (1.0 being the most important). The default priority of a page is 0.5. Assigning maximum priority to all your pages will not help since the priority is relative (it is only used to differentiate between pages in your sitemap).

The loc tag is required; the lastmod, changefreq and priority tags are optional.
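If you would rather generate the file than type it by hand, the structure above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a full generator: the function name build_sitemap and the shape of the pages list are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialise the sitemap namespace as the default (prefix-free) namespace.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(pages):
    """Return sitemap XML as bytes for a list of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and
    "priority" are optional and only written when present.
    (Hypothetical helper, named for this example only.)
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(page[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    pages = [
        {"loc": "http://www.website.com/", "lastmod": "2012-12-12",
         "changefreq": "daily", "priority": 1.0},
        {"loc": "http://www.website.com/about.html"},  # only the required tag
    ]
    with open("sitemap.xml", "wb") as handle:
        handle.write(build_sitemap(pages))
```

Running the script writes a sitemap.xml file to the current directory, ready to be uploaded in the next step.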

Step 4: Now that we've created our sitemap it’s time to upload it to our site. It should be added to the root directory.

When creating a sitemap, there are some things you should keep in mind:

  • All URLs in a sitemap must come from the same host.
  • The maximum length for a URL is 2,048 characters (which should be more than enough).
  • A sitemap can contain a maximum of 50,000 URLs.
  • The maximum file size for a sitemap is 50 MB (uncompressed).
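These limits are easy to check automatically before you upload anything. The sketch below is one possible way to do it; the function name check_sitemap_limits is made up for this example, and the thresholds are simply the numbers from the list above.

```python
from urllib.parse import urlsplit

# Limits taken from the list above.
MAX_URLS = 50_000
MAX_URL_LENGTH = 2_048
MAX_BYTES = 50 * 1024 * 1024  # 50 MB


def check_sitemap_limits(urls, xml_bytes):
    """Return a list of problems found; an empty list means all checks passed.

    `urls` is the list of page URLs in the sitemap and `xml_bytes`
    is the generated file content. (Hypothetical helper.)
    """
    problems = []

    # All URLs must come from the same host.
    hosts = {urlsplit(url).netloc.lower() for url in urls}
    if len(hosts) > 1:
        problems.append(f"URLs span several hosts: {sorted(hosts)}")

    too_long = [url for url in urls if len(url) > MAX_URL_LENGTH]
    if too_long:
        problems.append(f"{len(too_long)} URL(s) exceed {MAX_URL_LENGTH} characters")

    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs listed, the maximum is {MAX_URLS}")

    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file is {len(xml_bytes)} bytes, the maximum is {MAX_BYTES}")

    return problems
```

An empty result means the sitemap stays within all four limits.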

If your sitemap is too big, you can split it into multiple sitemaps, in which case you'll need to add a Sitemap Index file. This looks essentially the same as a normal sitemap, but some tags are named differently. Take a look at this example:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.website.com/sitemap1.xml</loc>
   </sitemap>
   <sitemap>
      <loc>http://www.website.com/sitemap2.xml</loc>
   </sitemap>
</sitemapindex>

The Sitemap Index file links to two different sitemaps via the <loc> tag. Theoretically, a Sitemap Index file can link to a maximum of 50,000 Sitemaps.
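Splitting a long URL list and writing the matching index file could be scripted roughly as follows. Treat it as a sketch under assumptions: chunk_urls and build_sitemap_index are hypothetical helper names, and the sitemap1.xml, sitemap2.xml naming simply follows the example above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)
MAX_URLS = 50_000  # maximum number of URLs per sitemap


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return Sitemap Index XML (bytes) that links to the given sitemap files."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    all_urls = [f"http://www.website.com/page{n}.html" for n in range(120_000)]
    chunks = chunk_urls(all_urls)
    # One sitemap file per chunk, numbered from 1 as in the example above.
    names = [f"http://www.website.com/sitemap{n}.xml"
             for n in range(1, len(chunks) + 1)]
    print(build_sitemap_index(names).decode("utf-8"))
```

Each chunk would then be written to its own sitemap file, and only the index needs to be announced to search engines.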


Video Sitemaps

It’s also possible to create a sitemap for videos. You can either create a separate file or add the video information to an existing sitemap. Adding this information increases the chance that your video will show up as a rich snippet.

Keep in mind that Google can only crawl the following video formats: wmv, mp4, mpeg, mpg, m4v, asf, flv, swf, avi, ra and ram.

Let’s take a look at a video sitemap example and discuss the various elements.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
   <url>
      <loc>http://www.website.com/video-page.html</loc>
      <video:video>
         <video:thumbnail_loc>http://www.website.com/video-thumbnail.jpg</video:thumbnail_loc>
         <video:title>Most Awesome Video Ever</video:title>
         <video:description>As the title says: this is the most awesome video ever.</video:description>
         <video:content_loc>http://www.website.com/video.mp4</video:content_loc>
         <video:duration>120</video:duration>
      </video:video>
   </url>
</urlset>
  • The loc-tag indicates the page where the video can be found. If the video is used to create a rich snippet, this is the page users will be sent to when they click on the thumbnail.
  • The video:thumbnail_loc tag is used to create the rich snippet preview image.
  • The video:title and video:description tags should be self-explanatory.
  • The video:content_loc tag links to the location of the video file on the domain.
  • And finally, the video:duration should be given in seconds.

There are plenty of other tags you can add, such as a rating, view count, restrictions, etc. All the available tags can be found within Google's Webmaster resources.
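The same entry can also be produced programmatically. The sketch below builds a single-video sitemap with the five tags discussed above; build_video_sitemap and the layout of the video dict are assumptions made for illustration, and any additional tags would need to be added by hand.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"
# Keep the sitemap namespace prefix-free and use "video:" for video tags.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("video", VIDEO_NS)

VIDEO_TAGS = ("thumbnail_loc", "title", "description", "content_loc", "duration")


def build_video_sitemap(page_url, video):
    """Return video sitemap XML (bytes) for one page holding one video.

    `video` is a dict with a value for each name in VIDEO_TAGS.
    (Hypothetical helper, named for this example only.)
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    video_el = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
    for tag in VIDEO_TAGS:
        ET.SubElement(video_el, f"{{{VIDEO_NS}}}{tag}").text = str(video[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_video_sitemap(
        "http://www.website.com/video-page.html",
        {
            "thumbnail_loc": "http://www.website.com/video-thumbnail.jpg",
            "title": "Most Awesome Video Ever",
            "description": "As the title says: this is the most awesome video ever.",
            "content_loc": "http://www.website.com/video.mp4",
            "duration": 120,  # seconds
        },
    )
    print(xml_bytes.decode("utf-8"))
```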

PS: If you’re having trouble with the creation of a video sitemap, you can use this video sitemap generator from Distilled. It’s a Google Doc file that can automatically generate the correct code. All you have to do is copy and paste it into your sitemap.


Image Sitemaps

The Image Sitemap is very useful if you want your images to show up in Google Image Search results, which can bring in a few extra visitors. As with the Video Sitemap, you can add the images to an existing Sitemap or create a separate file.

A basic image Sitemap looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
   <url>
      <loc>http://website.com/page.html</loc>
      <image:image>
         <image:loc>http://website.com/image1.jpg</image:loc>
      </image:image>
      <image:image>
         <image:loc>http://website.com/image2.jpg</image:loc>
      </image:image>
   </url>
</urlset>
  • First of all you need to specify the page where the images can be found. You can do this via the loc-tag.
  • Use the image:image-tag to list all the images from the page (up to 1,000).

Optionally you can add other information such as a caption, location information and a title.
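If your image sitemap is generated by a tool, it can be worth checking how many images ended up under each page, since the limit is 1,000 per page. The following sketch reads an image sitemap and counts them; the function name images_per_page is an assumption for this example.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map used in the path expressions below.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}
MAX_IMAGES_PER_PAGE = 1_000


def images_per_page(xml_bytes):
    """Map each page URL in an image sitemap to its number of image entries."""
    root = ET.fromstring(xml_bytes)
    counts = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        counts[loc] = len(url.findall("image:image", NS))
    return counts


if __name__ == "__main__":
    with open("sitemap.xml", "rb") as handle:
        for page, count in images_per_page(handle.read()).items():
            note = " (over the limit)" if count > MAX_IMAGES_PER_PAGE else ""
            print(f"{page}: {count} image(s){note}")
```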


Validating Your Sitemap

Everybody knows how easy it is for errors to sneak into your code, so it's a good idea to validate your sitemap to ensure that it is error-free. There are plenty of online tools which can check the validity of your file, such as this example on www.xml-sitemaps.com.

Alternatively, you could use Google Webmaster Tools to test your XML Sitemaps. When you click the add/test Sitemap button under Optimization > Sitemaps, you can test a sitemap prior to submitting it.
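For a quick local check before reaching for those tools, a few lines of Python can catch the most common mistakes: XML that is not well-formed, a wrong root element, or a url entry without a loc. This is only a rough sanity check under those assumptions, not a replacement for validation against the official schema; validate_sitemap is a name invented for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def validate_sitemap(xml_bytes):
    """Return a list of error messages; an empty list means no problem was found.

    Only checks well-formedness, the root element and the required
    <loc> tag. (Hypothetical helper, not a full schema validator.)
    """
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    errors = []
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        errors.append("root element is not <urlset> in the sitemap namespace")

    for number, url in enumerate(root.findall(f"{{{SITEMAP_NS}}}url"), start=1):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc")
        if not loc or not loc.strip():
            errors.append(f"<url> entry {number} has no <loc>")
    return errors


if __name__ == "__main__":
    with open("sitemap.xml", "rb") as handle:
        for message in validate_sitemap(handle.read()) or ["no problems found"]:
            print(message)
```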


Informing Search Engines

Now that we have created and validated our sitemap(s), it’s time to inform search engines about it.

You can inform Google and Bing about the location of your sitemap via their Webmaster Tools. For Google, log in to your account and go to Optimization > Sitemaps. On the right-hand side you’ll see the ‘add sitemap’ button. Simply add the URL of your sitemap and you’re done. In Bing Webmaster Tools, look for the Sitemap Widget and click ‘submit a sitemap’. Here you can enter the location of your sitemap.

Alternatively you could add the URL to your robots.txt file. All you have to do is add an extra line to your file, for example:

Sitemap: http://website.com/sitemap.xml

If you have a Sitemap Index file, you don’t need to add the separate sitemaps individually.
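If you maintain robots.txt with a script, adding the line can be automated. The sketch below works on the file's text content; both function names are hypothetical, and it assumes the simple "Sitemap: URL" line format shown above.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs of all Sitemap lines found in robots.txt content."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def add_sitemap_line(robots_txt, sitemap_url):
    """Append a Sitemap line unless that URL is already listed."""
    if sitemap_url in sitemap_urls_from_robots(robots_txt):
        return robots_txt
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + f"Sitemap: {sitemap_url}\n"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(add_sitemap_line(robots, "http://website.com/sitemap.xml"))
```

Calling add_sitemap_line a second time with the same URL leaves the content unchanged, so the script is safe to run repeatedly.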


Conclusion

If you want to make sure that crawlers don’t miss any important pages or files within your website, it’s best to add an XML Sitemap. You can even add metadata such as the change frequency and priority. Additionally, you could create a sitemap for videos and images.

Once your sitemap is ready, don’t forget to validate it and notify search engines via the robots.txt file or their respective Webmaster Tools.
