Arguably one of the more straightforward technical elements of SEO, XML sitemaps are often misunderstood. To get a better understanding of XML sitemaps and how to use them efficiently, it helps to know what they are and what they are not.
What are XML sitemaps?
In its simplest form, a sitemap serves as a road map for search engines to discover your website’s most important content and get further context on your website’s overall structure. In addition to providing search engines with a list of URLs, sitemaps can help search engines find newer content, or content located deep within the website’s architecture, which helps websites with a poor internal linking structure.
Common myths about XML sitemaps:
- A sitemap is a list of every page on your site. It isn’t, and there’s no need to include every page in the sitemap. Most websites have sensitive content, like investor information, or content that does not provide a great user experience through search, like login or account pages, content located behind paywalls, or pages returning non-200 response codes. These are examples of pages that should not be made available to search engines and can be left out of a sitemap.
- Sitemaps aren’t needed if my site is well laid out. While a good infrastructure is always important, an XML sitemap is meant to serve as an indicator of the most important content that you want to be crawled and considered for indexation. If you have an enterprise-level site, relying on your infrastructure alone probably isn’t the safest bet to ensure crawling and indexation. Setting up your sitemap to feature your most important pages will help search engines understand what you consider your most important content to be. Since search engines operate on crawl budgets, this can be an advantageous approach for larger sites. If your website has more than 50,000 URLs of important content, creating a sitemap index that contains multiple sitemaps may be the way to go.
- Sitemaps tell Google what to index. An XML sitemap does not guarantee that a page will be indexed, just that it will be considered for indexation.
Now that we know the myths and what sitemaps aren’t, how can we use them to improve our site organically?
Using XML sitemaps to your advantage
Ignore ‘priority’ and ‘change frequency’ tags:
Two popular pieces of markup found in XML sitemaps are the ‘priority’ and ‘change frequency’ tags. Many webmasters use this markup in an attempt to improve crawl efficiency and highlight a website’s priority content. John Mueller of Google has stated that Google ignores these two signals. However, he has indicated that Google does use the lastmod markup when analyzing a sitemap. Focusing on this tag and making sure that you are including the right URLs will go a long way toward ensuring that your sitemap is crawled efficiently and has the greatest impact.
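As a rough illustration, a sitemap that carries only the location and lastmod for each page could be generated along these lines. This is a minimal sketch: the build_sitemap helper, the example.com URLs, and the dates are all made up for the example, and the priority and change frequency tags are simply left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a sitemap string from (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod uses the W3C Datetime format; a plain YYYY-MM-DD date is valid.
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages on example.com
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/products/", "2024-02-03"),
])
print(sitemap_xml)
```

The lastmod value is only useful if it reflects a real change to the page, so it is best populated from your CMS or database rather than set to the date the sitemap was generated.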
Improve your chances of content being indexed
Make your sitemap available to the search engines:
A big first step in making sure that your most important content is discovered is to learn how to create a sitemap and place it in the root directory of your server.
Next, be sure to provide a link to your XML sitemap in your robots.txt file. This file is one of the first places a search engine bot will visit when it hits a website. There it will find directives on what content to crawl and what content to avoid. By including a link to your sitemap, you help ensure that search engines are discovering and crawling your content.
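To make the robots.txt step concrete, here is a small sketch that reads a robots.txt file the way a crawler might and pulls out the sitemap reference. The file contents and the example.com URL are invented for illustration; the parsing uses Python’s standard urllib.robotparser module (site_maps() requires Python 3.8 or later).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for example.com
robots_txt = """\
User-agent: *
Disallow: /account/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```

Note that the Sitemap line takes a full, absolute URL and is independent of any User-agent group.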
A final step is to manually submit your sitemap to Google Search Console and Bing Webmaster Tools. According to Google’s webmaster forum, Google doesn’t check your sitemap every time it is updated, only the first time it notices it. After that, it checks your sitemap only when it is notified that the sitemap has changed. This can be done using Google Search Console’s sitemap tool, or by using the “ping” functionality to ask Google to crawl your sitemap by sending an HTTP GET request.
Only include valid URLs:
It’s imperative that your sitemap references URLs that are indexable and return a 200 OK response code. Webmasters, SEOs or dev teams should routinely audit their website’s sitemap to remove pages returning 404 errors, 3xx redirect codes and 500-level server errors. This can be done manually by crawling the sitemap or by using Google Search Console’s XML Sitemap report to identify invalid URLs. Remember, search engines operate on a crawl budget, so every non-indexable URL increases the chance a valid one won’t get crawled.
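A manual audit of this kind can be scripted. The sketch below is one possible approach, not a definitive tool: it reads the loc entries from a sitemap and reports any URL that does not answer with 200. The function names (sitemap_urls, http_status, find_invalid) and the sample URLs are assumptions made for the example, and the demonstration uses stubbed status codes so that it runs without network access.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # report 3xx responses instead of silently following them

_opener = urllib.request.build_opener(NoRedirect)

def sitemap_urls(sitemap_xml):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]

def http_status(url):
    """Return the HTTP status code for a HEAD request to the URL."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with _opener.open(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx responses arrive here

def find_invalid(sitemap_xml, get_status=http_status):
    """Return (url, status) pairs for every URL that is not 200 OK."""
    results = ((url, get_status(url)) for url in sitemap_urls(sitemap_xml))
    return [(url, status) for url, status in results if status != 200]

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/old-page</loc></url>
</urlset>"""

# Stubbed statuses so the sketch runs offline; omit get_status to make real requests.
fake_statuses = {
    "https://www.example.com/": 200,
    "https://www.example.com/old-page": 404,
}
invalid = find_invalid(sample, get_status=fake_statuses.get)
print(invalid)
```

Some servers respond differently to HEAD than to GET, so treat the results as a starting list to review rather than a final verdict.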
Use consistent, qualified URLs:
Consistency is important to a properly formatted XML sitemap. Make sure to use consistent protocols. If your website is a secure site (uses HTTPS) then make sure that the sitemap and all URLs are using the secure protocol. Otherwise, your sitemap will contain redirects which can affect your crawl efficiency and indexation.
Use consistent subdomains. Since the XML sitemap provides insight into website architecture and organization, each subdomain should have its own sitemap. This will also help keep your sitemaps as condensed as possible.
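A quick consistency check might look like the following sketch. It assumes the sitemap belongs to a hypothetical https://www.example.com site and flags any URL whose protocol or hostname differs; the function name and sample URLs are illustrative only.

```python
from urllib.parse import urlsplit

def inconsistent_urls(urls, scheme="https", host="www.example.com"):
    """Return URLs whose protocol or hostname differs from the expected one."""
    flagged = []
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme != scheme or parts.hostname != host:
            flagged.append(url)
    return flagged

urls = [
    "https://www.example.com/pricing",
    "http://www.example.com/about",      # wrong protocol
    "https://blog.example.com/post-1",   # belongs in the blog subdomain's sitemap
]
flagged = inconsistent_urls(urls)
print(flagged)
```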
Include unique URLs:
Be sure to only include canonical versions of URLs. URLs that include parameters or session IDs can be considered duplicative and should be excluded. Otherwise, crawl efficiency and overall indexation could suffer. When conducting regular sitemap audits, be sure to look for any non-canonical URLs and remove them. Again, Google Search Console’s sitemap report can help you easily identify non-canonical URLs, and checking this report regularly is a good practice. In addition to Google’s tools in Search Console, BrightEdge's ContentIQ site audit tools can help SEOs and webmasters identify non-canonical URLs and pages returning non-200 response codes to help further audit your XML sitemaps.
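As a first pass before turning to those reports, URLs carrying parameters or session IDs can be flagged with a simple heuristic. The sketch below is only that: the session parameter names are assumptions, and a true canonical check would compare each URL against the canonical tag on the page itself.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed session parameter names; adjust to whatever your site actually uses.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def non_canonical_reason(url):
    """Return why a URL looks non-canonical, or None if it looks clean."""
    query = urlsplit(url).query
    names = {name.lower() for name, _ in parse_qsl(query, keep_blank_values=True)}
    if names & SESSION_PARAMS:
        return "session ID"
    if names:
        return "query parameters"
    return None

for url in [
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?color=red",
    "https://www.example.com/shoes?PHPSESSID=abc123",
]:
    print(url, "->", non_canonical_reason(url))
```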
Do not include non-alphanumeric characters:
A sitemap needs to be UTF-8 encoded. URLs must use entity escape codes for characters like ampersands (&), single quotes ('), double quotes ("), less than (<), and greater than (>). Also, URLs should only contain ASCII characters, so any non-ASCII characters need to be percent-encoded.
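The two escaping steps can be sketched as follows: percent-encode anything outside ASCII first, then apply XML entity escaping. The sitemap_safe helper and the sample URL are hypothetical; quote and escape come from Python’s standard library. If you build the sitemap with an XML library, the entity escaping is normally applied for you, so this sketch is mainly relevant when writing the XML as plain text.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Reserved URL punctuation (plus %) is kept so the URL structure and any
# existing percent-encoding are left untouched.
URL_SAFE = ":/?#[]@!$&'()*+,;=%"

def sitemap_safe(url):
    """Percent-encode non-ASCII characters, then entity-escape for XML."""
    ascii_url = quote(url, safe=URL_SAFE)
    # escape() handles &, < and >; quotes are added as extra entities.
    return escape(ascii_url, {"'": "&apos;", '"': "&quot;"})

escaped = sitemap_safe("https://www.example.com/über?item=12&desc=o'neil")
print(escaped)
# https://www.example.com/%C3%BCber?item=12&amp;desc=o&apos;neil
```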
Limit the size of the sitemap:
The size of XML sitemaps can quickly get out of hand, especially for larger websites like e-commerce sites. When a sitemap gets too big, it can negatively impact the number of URLs that are crawled and indexed, and it can contribute to your web server getting bogged down if it needs to serve large files. To combat this, each XML sitemap should contain no more than 50,000 URLs and be no larger than 50 MB uncompressed. This means that larger sites may need to use multiple sitemaps in a sitemap index file.
For larger sitemaps, breaking out sections of content into their own sitemaps can help keep content organized and help avoid sitemap bloat. Creating separate sitemaps for videos, images, and blogs may be a good idea.
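Splitting a large URL list into child sitemaps and tying them together with a sitemap index could be sketched like this. The product URLs, the sitemap-N.xml file names, and the helper functions are all assumptions for the example, and the sketch only enforces the 50,000 URL limit, not the file size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000

def chunk(urls, size=MAX_URLS):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]

def build_index(sitemap_locations):
    """Build a sitemap index that points at each child sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locations:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = loc
    return ET.tostring(index, encoding="unicode")

# 120,000 hypothetical product URLs split across child sitemaps
urls = [f"https://www.example.com/product/{i}" for i in range(120_000)]
chunks = list(chunk(urls))
locations = [
    f"https://www.example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
]
index_xml = build_index(locations)
print([len(c) for c in chunks])  # [50000, 50000, 20000]
```

The same index structure works when sitemaps are split by content type instead of by count, for example one child sitemap each for blog posts, videos, and images.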
Use XML sitemap creation tools:
There are many tools that can assist in XML sitemap creation. Many CMSs have dynamic sitemap creation options that you can use to help manage what content is published in your sitemap file. A CMS like WordPress has several plugins to help manage sitemaps.
Now that you know how to create, format, set up and edit a sitemap, it’s time to prepare the list of your most important content to include and get it submitted to the search engines. Get started today!