XML Sitemap: What It Does and Doesn't Do

What is an XML sitemap? It is a machine-readable file that hands search engines a clean list of the URLs you want crawled and indexed, with optional hints about when each page last changed. Think of it as a directory you slip under the door for the crawler: here are my real pages, here is what recently changed, please go look. It is useful, occasionally important, and quietly oversold. A sitemap helps a page get found, not ranked, and a lot of people treat it like a fix when it is closer to a formality.

What is an XML sitemap, in plain English?

A search engine finds your pages two ways. It follows links (from other sites, and from page to page inside yours), and it reads any sitemap you give it. The XML sitemap is the second path: instead of hoping a crawler stumbles onto every URL by following links, you write down the list yourself in a format built for machines, not humans.

The file itself is plain XML. Each entry has a <loc> (the URL) and can carry optional fields like <lastmod> (when the page last meaningfully changed). Older fields like <changefreq> and <priority> still exist in the spec, but Google has said for years it ignores them, so they are noise you can drop.

Here is the part worth internalizing early: a sitemap is a suggestion, not a command. Listing a URL does not force Google to crawl it, index it, or rank it. It tells the engine "these pages exist and here is what changed," which is genuinely helpful for discovery. What happens after discovery still depends on whether the page is worth indexing. The sitemap gets you to the front door. It does not get you inside.

How an XML sitemap works

A minimal sitemap looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/services/seo</loc>
    <lastmod>2026-06-12</lastmod>
  </url>
</urlset>
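If you would rather generate that file than type it, here is a minimal sketch using only Python's standard library. The function name build_sitemap and the (url, lastmod) tuple shape are our own illustration, not part of the sitemap protocol or any particular CMS.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, lastmod) tuples; lastmod may be None.

    Hypothetical helper for illustration only.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # special characters are escaped for us
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


xml_text = build_sitemap([("https://example.com/services/seo", "2026-06-12")])
print(xml_text)
```

The output is the same document as the hand-written example, just without the indentation, which crawlers do not care about.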

You make search engines aware of the sitemap in two ways, and you should do both. Submit the sitemap URL in the Sitemaps report in Google Search Console (and in Bing Webmaster Tools), and add a Sitemap: line to your robots.txt so any crawler that reads that file can find the sitemap without being told. The robots.txt reference matters more now, because Googlebot is no longer the only crawler reading it.
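To check that the robots.txt reference is actually in place, you can pull the Sitemap: lines out of the file. This is a rough sketch: the function name and the sample robots.txt content are assumptions for illustration, and a real check would fetch the file from your own domain first.

```python
def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Return every URL declared on a 'Sitemap:' line (hypothetical helper)."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        # The directive name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Sample file content, invented for this example.
ROBOTS = """\
User-agent: *
Disallow: /cart/

Sitemap: https://example.com/sitemap.xml
"""

found = sitemap_urls_from_robots(ROBOTS)
print(found)
```

An empty list means crawlers that rely on robots.txt have no pointer to your sitemap.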

A couple of hard limits shape how this scales. One sitemap file maxes out at 50,000 URLs and 50MB uncompressed. Past that, you split your URLs across multiple sitemap files and stitch them together with a sitemap index, a parent file that lists your child sitemaps. Most large sites also segment by content type (one sitemap for products, one for blog posts, one for category pages), which makes the Search Console coverage report far easier to read because you can see which section is failing to index instead of staring at one undifferentiated pile.
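The splitting logic is simple enough to sketch. The example below chunks a list of URLs at the 50,000 limit and builds a sitemap index pointing at the child files. The file naming (sitemap-products-1.xml and so on) and the 120,000 fake product URLs are assumptions for illustration, and the sketch does not check the 50MB size limit, which a production generator should also enforce.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit from the sitemap protocol


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_index(child_sitemap_urls):
    """Build a sitemap index that lists the child sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for child in child_sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = child
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Invented catalog of 120,000 product URLs, segmented as one content type.
product_urls = [f"https://example.com/products/{i}" for i in range(120_000)]
parts = list(chunk(product_urls))
child_urls = [
    f"https://example.com/sitemap-products-{n}.xml"  # hypothetical naming scheme
    for n in range(1, len(parts) + 1)
]
index_xml = build_index(child_urls)
print([len(p) for p in parts])
```

Each slice in parts would be written out as its own sitemap file, and only the index URL needs to be submitted.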

The <lastmod> date is the one optional field worth getting right. Google uses it as a signal of what genuinely changed, which helps it prioritize recrawls. But it has to be honest. If your CMS stamps today's date on every URL on every build, the field becomes meaningless and Google learns to ignore it. Accurate <lastmod> is a small, real edge; faked <lastmod> is worse than none.
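One way to keep the date honest is to tie it to a hash of the page content rather than to the build time. The sketch below is an assumption about how you might wire this up, not how any particular CMS does it: update_lastmod and the in-memory store dictionary are hypothetical, and a real setup would persist the store between builds.

```python
import hashlib
from datetime import date


def update_lastmod(url, content, store, today):
    """Return the lastmod for `url`, changing it only when the content hash changes."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = store.get(url)
    if previous is None or previous["hash"] != digest:
        store[url] = {"hash": digest, "lastmod": today.isoformat()}
    return store[url]["lastmod"]


store = {}  # stand-in for a database table or JSON file
url = "https://example.com/services/seo"

first = update_lastmod(url, "<h1>SEO</h1>", store, date(2026, 6, 12))
rebuild = update_lastmod(url, "<h1>SEO</h1>", store, date(2026, 6, 20))
edited = update_lastmod(url, "<h1>SEO services</h1>", store, date(2026, 7, 1))
print(first, rebuild, edited)
```

The rebuild on June 20 keeps the June 12 date because nothing changed. In practice you would hash the main content only, so that a footer or template tweak does not restamp every page.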

Why an XML sitemap matters (and when it doesn't)

Let me be straight about this, because the marketing world tends to inflate it. For a small, well-linked site (a 40-page service business, a tidy blog), Google will find your pages by following links whether or not you submit a sitemap. The sitemap is insurance, not a lever. Submitting one is good hygiene and costs you nothing, but it is not why your traffic will or won't grow.

Where a sitemap earns real keep:

Large sites. E-commerce catalogs, news publishers, and database-driven sites with tens of thousands of URLs, where link-only discovery genuinely misses pages. This is also where it interacts with crawl budget: a clean sitemap of canonical URLs points the crawler's finite attention at pages that matter instead of letting it wander.
New sites. A brand-new domain with few or no backlinks has almost nothing for a crawler to follow. A sitemap is your fastest route to discovery.
Deep or orphaned pages. Pages buried many clicks from the homepage, or not linked from anywhere, may never be found by crawling alone. A sitemap surfaces them. (The better fix is to repair the internal linking, but the sitemap stops the bleeding in the meantime.)
Diagnostics. Even when a sitemap doesn't change discovery, the Search Console report comparing submitted versus indexed URLs is one of the cleanest ways to spot an indexing problem and see exactly which pages Google is choosing to leave out.
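
The submitted-versus-indexed comparison can also be run offline. Here is a minimal sketch that reads the URLs out of a sitemap and diffs them against a set of indexed URLs. The indexed set is invented for the example; in practice it would come from whatever export or report you have on hand.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return the set of <loc> values in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)}


SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/services/seo</loc></url>
  <url><loc>https://example.com/blog/old-post</loc></url>
</urlset>"""

# Hypothetical list of URLs known to be indexed.
indexed = {"https://example.com/", "https://example.com/services/seo"}

submitted = sitemap_locs(SITEMAP)
not_indexed = sorted(submitted - indexed)
print(not_indexed)
```

The URLs left in not_indexed are the ones to investigate.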

What a sitemap does not do: it does not improve rankings, it does not force indexing, and it does not rescue thin or duplicate pages. It is part of technical SEO plumbing, on the discovery side. Helpful, sometimes essential, never a growth strategy on its own.

Common XML sitemap mistakes

Most sitemap problems come from the file disagreeing with the rest of your site. Search engines notice when your sitemap says "index this" while your page says "don't," and the mixed signal wastes attention you'd rather spend elsewhere.

Mistake	Why it hurts	The fix
Including noindex or redirected URLs	You tell Google to crawl pages you also tell it to ignore	List only 200-status, indexable, canonical URLs
Listing non-canonical duplicates	Splits crawl attention across copies of one page	Include only the URL named by your canonical tag
Faked or build-stamped <lastmod>	Google learns the field is meaningless and ignores it	Stamp the date only when the content genuinely changes
Letting it go stale	Dead URLs and missing new pages erode trust in the file	Auto-generate it so it stays in sync with the live site
One giant unsplit file	Hard to diagnose, and breaks at 50,000 URLs / 50MB	Split by content type and use a sitemap index
Listing pages blocked in robots.txt	Google can't crawl what you've disallowed, so the entry is dead	Keep robots.txt and the sitemap in agreement

The throughline: every URL in the file should be a canonical, indexable page you'd be happy to see ranked. If you wouldn't want it in search results, it doesn't belong in the sitemap. On most modern platforms (WordPress with a decent SEO plugin, Shopify, well-built custom sites) the sitemap is generated automatically and stays clean on its own. The failures we see are usually a misconfigured plugin, a CMS stamping fake dates, or a hand-maintained file nobody has touched in a year.
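
The rules in the table boil down to a single filter. This is a sketch under assumptions: the Page record and its fields are hypothetical, standing in for whatever your crawler or CMS reports about each URL.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical record of what a crawl found for one URL."""
    url: str
    status: int              # HTTP status code
    noindex: bool            # True if a robots meta tag or header says noindex
    canonical: str           # URL named by the page's canonical tag
    blocked_by_robots: bool  # True if robots.txt disallows the URL


def belongs_in_sitemap(page: Page) -> bool:
    """Keep only 200-status, indexable, self-canonical, crawlable pages."""
    return (
        page.status == 200
        and not page.noindex
        and page.canonical == page.url
        and not page.blocked_by_robots
    )


base = "https://example.com"
pages = [
    Page(f"{base}/services/seo", 200, False, f"{base}/services/seo", False),
    Page(f"{base}/old-page", 301, False, f"{base}/old-page", False),           # redirect
    Page(f"{base}/thank-you", 200, True, f"{base}/thank-you", False),          # noindex
    Page(f"{base}/shoes?sort=price", 200, False, f"{base}/shoes", False),      # non-canonical
    Page(f"{base}/cart", 200, False, f"{base}/cart", True),                    # blocked
]

eligible = [p.url for p in pages if belongs_in_sitemap(p)]
print(eligible)
```

Only the first page survives. Everything else is something the site is already telling Google to skip.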

The bottom line

An XML sitemap is table stakes, not a finish line. Create one, keep it limited to canonical indexable URLs, submit it in Search Console, reference it in robots.txt, and then stop thinking about it. For a small site it is insurance; for a large or new site it is genuinely load-bearing for discovery. In neither case is it the reason you do or don't rank.

If a page you care about isn't showing up in search, the sitemap is a good first diagnostic (check submitted versus indexed in Search Console), but the cause is usually downstream: weak internal linking, thin content, a stray noindex, a canonical tag pointing the wrong way, or simply not enough authority for Google to bother. Fix the real constraint. The sitemap just makes sure the crawler knows the page exists.

Want someone to confirm your sitemap is clean and your important pages are getting indexed, not just submitted? That's exactly the kind of foundation work our technical SEO services handle, and an SEO audit is where we separate the real indexing problems from the cosmetic ones. We'll tell you straight what's broken and what's just noise. Email us at admin@moonsauceagency.com and you'll get an honest read on what's keeping your pages out of search, with no sales theater.

Sources: Google Search Central: Sitemaps documentation · sitemaps.org protocol

Frequently asked

What is an XML sitemap used for?

An XML sitemap helps search engines discover the pages you want crawled and indexed. It lists your canonical URLs in a machine-readable file and can flag when each page last changed. It is most useful on large sites, new sites with few backlinks, or sites with weak internal linking, where Google might not find every page on its own. It is a discovery aid, not a guarantee of indexing or ranking.

Do I still need an XML sitemap in 2026?

Yes, but keep your expectations honest. For a small, well-linked site, Google will usually find your pages without one, so the sitemap is insurance rather than a lever. It earns its keep on large catalogs, news sites, sites with deep or orphaned pages, and brand-new domains. It is table-stakes hygiene, not a growth tactic. Submit one, keep it clean, then go spend your energy on content and links.

What should I include in my XML sitemap?

Only canonical, indexable URLs that return a 200 status and that you genuinely want in search. Leave out redirects, 404s, noindex pages, non-canonical duplicates, parameter junk, and pages blocked in robots.txt. A sitemap full of URLs you also tell Google to ignore sends mixed signals and wastes crawl attention. The rule is simple: every URL in the file should be a page you would be happy to see ranked.

What is the difference between an XML sitemap and an HTML sitemap?

An XML sitemap is written for search engine crawlers: a structured list of URLs with optional metadata, not meant for humans to read. An HTML sitemap is a page on your site that links to your important pages for visitors and, as a side effect, gives crawlers internal links to follow. They serve different audiences. Most sites need the XML version; an HTML sitemap is optional and only helps if your internal linking is weak.

How do I submit an XML sitemap to Google?

Add your sitemap URL to the Sitemaps report in Google Search Console, and reference it in your robots.txt with a Sitemap: line so any crawler can find it. Submitting tells Google where to look and lets you monitor how many URLs were discovered versus indexed. Submission does not force indexing. It speeds discovery and gives you a report to diagnose gaps, which is the real value.

Does an XML sitemap help with rankings?

Not directly. A sitemap affects discovery and crawling, not ranking. It can help a page get found and indexed faster, and a page that is never indexed cannot rank, so at scale a good sitemap removes a blocker. But it will not lift a page up the results. Rankings come from content quality, relevance, authority, and on-page signals. Treat the sitemap as plumbing, not a strategy.

How many URLs can an XML sitemap hold?

A single sitemap file is capped at 50,000 URLs and 50MB uncompressed. Larger sites split URLs across multiple sitemaps and tie them together with a sitemap index file, which itself can reference up to 50,000 sitemaps. Most platforms generate and split these automatically. If you are hand-maintaining one giant file, that is a sign your tooling, not your sitemap strategy, needs attention.
