Crawl budget is how many pages a search engine will crawl on your site within a given window. It's set by two things: how fast your server can respond without falling over (crawl rate), and how much the engine wants your URLs (crawl demand). For most sites, it's a non-issue. Google crawls sites under a few thousand URLs efficiently on its own. If you have tens of thousands of pages, it starts to matter.
The honest version: most people obsessing over this are wasting their time
Crawl budget is one of the most over-discussed, under-relevant topics in technical SEO. Half the "crawl budget optimization" advice on the internet is solving a problem the reader doesn't have, dressed up as urgent so you'll keep reading.
Here's Google's actual position, straight from their own docs: if your site has fewer than a few thousand URLs, it gets crawled efficiently and you don't need to think about this. Their dedicated crawl budget guide is explicitly written for large sites. That means roughly a million or more unique pages whose content changes about weekly, or 10,000+ pages whose content changes daily (think e-commerce catalogs, news publishers, and database-driven sites). If you run a 40-page service site, a small blog, or a lean marketing site, crawl budget is not why a page isn't ranking. Something else is, and it's usually content depth, internal linking, or authority.
So when does it matter? Roughly when you cross into tens of thousands of URLs, especially if those pages change often or are generated dynamically: faceted navigation, infinite parameter combinations, paginated archives, auto-generated tag pages. At that scale, Googlebot can't and won't crawl everything every day, and you start making real choices about where its attention goes. Below that scale, the lever you're reaching for isn't even connected to anything.
What is crawl budget made of?
Google breaks crawl budget into two components. Understanding both is the difference between fixing a real problem and chasing a ghost, because each one responds to a completely different fix.
Crawl rate (crawl capacity limit)
This is the supply side: how many simultaneous connections Googlebot will open to your site, and how long it waits between fetches, without degrading your server. Googlebot watches your response times and error rates as it crawls. When your server answers fast and clean, it opens more parallel connections and shortens the gap between requests. When response times climb or you start throwing 5xx errors and timeouts, it backs off automatically to avoid taking your site down. In plain terms: fast hosting and clean response codes literally buy you more crawl capacity, and a slow or flaky server quietly caps it. Slow server responses also drag down Largest Contentful Paint in Core Web Vitals, so you tend to fix both at once.
Crawl demand (crawl scheduling)
This is the demand side: how much Google wants your URLs. Demand is driven by three things. Popularity: URLs with more internal and external links, and more traffic, get crawled more often. Staleness: Google recrawls pages it believes have changed and lets static, never-touched pages drift to the back of the queue. Perceived quality: thin, duplicate, or low-value pages earn fewer visits over time. A page nobody links to, that never changes, and that looks thin gets crawled rarely. That's not a budget you can buy your way out of with a server upgrade. You earn it with authority and genuine freshness, not by editing a timestamp.
Your effective crawl budget is wherever those two lines meet. Capacity caps how much Google can crawl. Demand caps how much it wants to. The lower number wins, every time. That's why throwing money at faster hosting does nothing if the real constraint is that nobody links to your pages, and vice versa.
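The "lower number wins" rule is easy to see with a toy calculation. The sketch below assumes crawl capacity and crawl demand can each be expressed as fetches per day; Google publishes neither number, and every figure here is invented for illustration.

```python
def effective_crawl_budget(capacity_limit: int, demand: int) -> int:
    # Toy model only: Google does not publish these numbers.
    # capacity_limit: fetches/day the server can absorb without slowing down.
    # demand: fetches/day Google "wants" based on popularity and staleness.
    return min(capacity_limit, demand)


# Hypothetical site where demand, not the server, is the bottleneck.
before = effective_crawl_budget(capacity_limit=5_000, demand=3_000)
after_upgrade = effective_crawl_budget(capacity_limit=20_000, demand=3_000)
print(before, after_upgrade)  # 3000 3000: faster hosting changed nothing

# Same site with the original server, after earning more links (demand rises).
after_links = effective_crawl_budget(capacity_limit=5_000, demand=8_000)
print(after_links)  # 5000: now the server is the cap
```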
How to tell if you have a crawl budget problem
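You have a genuine crawl budget problem only if both of these are true: you have a large site (tens of thousands of URLs or more), and important pages sit in the "Discovered, currently not indexed" bucket in Google Search Console for weeks, meaning Google knows the URLs exist but hasn't fetched them. "Crawled, currently not indexed" is a different signal: Google did fetch the page and chose not to index it, which usually points to quality or duplication, not budget.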
The Crawl Stats report is where you confirm it (Search Console, Settings, then Crawl stats). Three numbers tell the story:
- Total crawl requests over time. A flat or declining line on a growing site can mean Google has lost interest or your server is throttling it. A volatile line that spikes and crashes often points to server health, not strategy.
- Breakdown by response code. A healthy site is mostly 200s. If a large share of requests are coming back as redirects, 404s, or 5xx errors, Google is burning your budget on dead ends instead of your real pages.
- Crawl purpose (discovery vs. refresh) and host status. Heavy refresh crawling of low-value URLs, or a host status that flags availability problems, tells you where the waste is.
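If you also have raw server logs, you can cross-check the response-code breakdown yourself. This is a minimal sketch, assuming your server writes the common Apache/Nginx "combined" log format. The sample lines are made up. Matching on the string "Googlebot" is also a simplification, because user agents can be spoofed; Google recommends verifying real Googlebot traffic with a reverse DNS lookup.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def status_breakdown(lines, bot_token="Googlebot"):
    """Return {status_class: share} for requests whose user agent contains bot_token."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or bot_token not in match.group("agent"):
            continue  # unparseable line, or not the bot we care about
        counts[match.group("status")[0] + "xx"] += 1
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())} if total else {}

# Hypothetical log lines for illustration.
UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [10/Mar/2025:13:55:36 +0000] "GET /shoes HTTP/1.1" 200 5120 "-" "{UA}"',
    f'66.249.66.1 - - [10/Mar/2025:13:55:40 +0000] "GET /boots HTTP/1.1" 200 4800 "-" "{UA}"',
    f'66.249.66.1 - - [10/Mar/2025:13:56:02 +0000] "GET /old-shoes HTTP/1.1" 301 0 "-" "{UA}"',
    f'66.249.66.1 - - [10/Mar/2025:13:57:11 +0000] "GET /cart HTTP/1.1" 503 0 "-" "{UA}"',
    '203.0.113.9 - - [10/Mar/2025:13:58:00 +0000] "GET /shoes HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
print(status_breakdown(sample))  # {'2xx': 0.5, '3xx': 0.25, '5xx': 0.25}
```

A large 3xx or 5xx share in this output is the same warning sign as in the Crawl Stats report: requests spent on dead ends instead of real pages.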
Then sanity-check the outcome: how long after publishing does a new page get crawled? Same-day or next-day crawling means you are fine, full stop, no matter how many scary blog posts say otherwise. Weeks of delay on a large site is the real signal. On a small site, even slow crawling almost always traces back to weak internal linking or low authority, not budget, so don't fix the wrong thing.
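Measuring that delay is simple arithmetic once you know when each page went live and when Googlebot first fetched it. The sketch below assumes you can export both: publish times from your CMS and Googlebot fetch times from your server logs. The paths and timestamps are hypothetical, and the 14-day threshold is an arbitrary stand-in for "weeks of delay".

```python
from datetime import datetime, timedelta

def crawl_delays(published, crawl_hits):
    """Map each published path to the delay before its first Googlebot fetch.

    published:  {path: datetime the page went live}
    crawl_hits: iterable of (path, datetime) Googlebot fetches
    Returns {path: timedelta}, or None for pages never fetched since publishing.
    """
    first_hit = {}
    for path, fetched_at in crawl_hits:
        went_live = published.get(path)
        if went_live is None or fetched_at < went_live:
            continue  # unknown page, or a fetch from before it went live
        if path not in first_hit or fetched_at < first_hit[path]:
            first_hit[path] = fetched_at
    return {
        path: (first_hit[path] - went_live) if path in first_hit else None
        for path, went_live in published.items()
    }

def slow_pages(delays, threshold=timedelta(days=14)):
    """Paths never crawled, or crawled later than the (arbitrary) threshold."""
    return sorted(p for p, d in delays.items() if d is None or d > threshold)

# Hypothetical data for illustration.
published = {
    "/guides/new-guide": datetime(2025, 3, 1, 9, 0),
    "/catalog/item-9912": datetime(2025, 2, 1, 9, 0),
}
crawl_hits = [
    ("/guides/new-guide", datetime(2025, 3, 1, 15, 30)),
    ("/guides/new-guide", datetime(2025, 3, 4, 8, 0)),
]
delays = crawl_delays(published, crawl_hits)
print(delays["/guides/new-guide"])  # 6:30:00
print(slow_pages(delays))           # ['/catalog/item-9912']
```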
How to preserve crawl budget when the problem is real
When the problem is genuine, the fix is almost always to reduce the waste, not to beg for more budget. You can't directly increase Google's allocation. What you can do is stop it from being spent on garbage, so a larger share of the same budget lands on pages that earn money.
- Block low-value URLs from being crawled. Use robots.txt to disallow infinite-combination URLs: internal search results, faceted-navigation parameter sprawl, session IDs, sort and filter permutations. On large sites these are usually the single biggest crawl drain, and they multiply silently. Keep in mind that robots.txt stops crawling, not indexing, and Google can't see a canonical or noindex tag on a page it isn't allowed to fetch.
- Kill redirect chains. Every hop in a chain (A to B to C) burns a separate crawl request and slows discovery of the real destination. Point every redirect straight to the final URL.
- Fix duplicate and near-duplicate URLs. Consolidate with a canonical tag so Google spends its budget on one version, not five copies of the same page.
- Return clean status codes. Fix soft 404s (missing or empty pages that still return a 200), fix server errors, and let dead pages return a real 404 or 410 so Google gradually stops circling back to recrawl ghosts.
- Keep your sitemap honest. Submit only canonical, indexable URLs with accurate lastmod dates. The sitemap is a signal of where to look and what genuinely changed, so don't pollute it with redirects, noindex pages, or fake timestamps.
- Improve site speed and server health. A faster, more reliable server raises your crawl rate ceiling directly, which is one of the few inputs you fully control.
- Prune thin and dead content. Fewer low-value pages means more of Google's attention lands on the ones you care about, and it lifts perceived quality across the site.
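Wildcard robots.txt rules are easy to get subtly wrong, so it's worth testing them against real URLs before deploying. This is a minimal sketch of the matching logic described in RFC 9309 and Google's robots.txt documentation: `*` matches any run of characters, a trailing `$` anchors the end, the longest matching rule wins, and Allow wins a tie. The rules and URLs are hypothetical. The sketch assumes a single `User-agent: *` group, and a production parser would also handle percent-encoding and multiple groups.

```python
import re

# Hypothetical robots.txt for a hypothetical store (single User-agent: * group).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /*?*sort=
Disallow: /*?*sessionid=
Disallow: /*?*filter=
Allow: /search-tips$
"""

def parse_rules(text):
    """Collect (kind, pattern) pairs, ignoring user-agent grouping."""
    rules = []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key in ("allow", "disallow") and value:
            rules.append((key, value))
    return rules

def pattern_to_regex(pattern):
    """'*' matches any characters; a trailing '$' anchors the end of the URL."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(path_and_query, rules):
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path_and_query):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = parse_rules(ROBOTS_TXT)
for url in ["/shoes", "/shoes?page=2", "/shoes?color=red&sort=price",
            "/search?q=boots", "/search-tips", "/cart?sessionid=abc123"]:
    print(url, "allowed" if is_allowed(url, rules) else "blocked")
```

Note how `/search-tips` stays crawlable even though `/search` is blocked: the longer Allow rule is more specific and wins.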
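Finding redirect chains is a graph walk: follow each redirect until you reach a URL that doesn't redirect again. A minimal sketch, assuming you've exported your redirect rules as a source-to-target mapping (the paths here are hypothetical). It also refuses to resolve loops, which burn crawl requests without ever reaching a page.

```python
def flatten_redirects(redirects):
    """Resolve every source to its final destination; raise on redirect loops.

    redirects: {source_path: target_path}, e.g. exported from server config.
    """
    final = {}
    for source, target in redirects.items():
        visited = [source]
        while target in redirects:  # target redirects again: keep following
            if target in visited:
                raise ValueError("redirect loop: " + " -> ".join(visited + [target]))
            visited.append(target)
            target = redirects[target]
        final[source] = target
    return final

# Hypothetical redirect map with one two-hop chain.
redirects = {
    "/old-shoes": "/shoes-2023",
    "/shoes-2023": "/shoes",
    "/sale": "/shoes",
}
flat = flatten_redirects(redirects)
chains = sorted(src for src, dest in flat.items() if redirects[src] != dest)
print(flat)    # every source now points at /shoes
print(chains)  # ['/old-shoes'] is the rule to rewrite
```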
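To find duplicate clusters worth consolidating, normalize each crawled URL and group the ones that collapse to the same key. This sketch assumes the listed parameters (utm_*, sessionid, sort, ref) never change page content, and that trailing slashes don't either. Both are assumptions about a hypothetical site. Stripping a parameter that does change content would wrongly merge distinct pages. Each resulting cluster is a candidate for a single canonical URL.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: on this hypothetical site these parameters never change page content.
IGNORED_PARAMS = {"sessionid", "sort", "ref"}
IGNORED_PREFIXES = ("utm_",)

def normalize(url):
    """Collapse URL variants that (by assumption) serve the same content."""
    parts = urlsplit(url)
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
        and not key.lower().startswith(IGNORED_PREFIXES)
    )
    path = parts.path.rstrip("/") or "/"  # treat /shoes/ and /shoes as one page
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path,
                       urlencode(params), ""))

def duplicate_clusters(urls):
    """Group URLs by normalized form; keep only groups with more than one member."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}

crawled = [
    "https://shop.example.com/shoes/?sort=price",
    "https://shop.example.com/shoes?utm_source=newsletter",
    "https://Shop.example.com/shoes",
    "https://shop.example.com/shoes?color=red",
]
for canonical, variants in duplicate_clusters(crawled).items():
    print(canonical, "<-", variants)
```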
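The "keep your sitemap honest" rule translates directly into a filter. A minimal sketch, assuming a hypothetical Page record that already knows each URL's status code, its canonical target, its noindex flag, and the date its content last meaningfully changed (not the date it was last re-saved). It uses only the standard library and the sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date

@dataclass
class Page:  # hypothetical record assembled from a crawl or CMS export
    url: str
    status: int          # HTTP status the URL returns
    canonical: str       # URL in its rel=canonical tag
    noindex: bool
    last_modified: date  # date the content last meaningfully changed

def build_sitemap(pages):
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for page in pages:
        # Only canonical, indexable pages that answer 200 belong in the sitemap.
        if page.status != 200 or page.noindex or page.canonical != page.url:
            continue
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page.url
        ET.SubElement(entry, "lastmod").text = page.last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")

pages = [
    Page("https://shop.example.com/shoes", 200,
         "https://shop.example.com/shoes", False, date(2025, 3, 1)),
    Page("https://shop.example.com/shoes?sort=price", 200,
         "https://shop.example.com/shoes", False, date(2025, 3, 1)),
    Page("https://shop.example.com/old-shoes", 301,
         "https://shop.example.com/old-shoes", False, date(2024, 1, 5)),
    Page("https://shop.example.com/thank-you", 200,
         "https://shop.example.com/thank-you", True, date(2025, 2, 10)),
]
print(build_sitemap(pages))
```

Only the first page survives: the sort variant is non-canonical, the old URL redirects, and the thank-you page is noindex.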
None of this is a separate discipline. It's core technical SEO hygiene, and the same cleanup that preserves crawl budget also makes your whole site easier to index and rank. If you'd rather not run the audit yourself, that's exactly what our technical SEO services are for.
The 2026 wrinkle: AI crawlers want a slice too
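Here's the part most "crawl budget" articles still haven't caught up on. Googlebot is no longer the only crawler that matters. GPTBot (OpenAI), ClaudeBot (Anthropic), and PerplexityBot are all hitting your site now, each with its own crawl behavior, its own demand signals, and its own appetite for hammering your server. Some of them are far less polite about it than Googlebot. Google-Extended belongs on your radar too, but it isn't a separate crawler: it's a robots.txt token that controls whether content Google already crawls can be used for its Gemini models, and it doesn't affect Google Search.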
This cuts two ways. First, the same server-health and clean-URL hygiene that helps Googlebot helps AI crawlers reach the content you want cited in ChatGPT and Google AI Overviews. Second, you now have to manage AI crawler access deliberately: which bots you allow, what you expose, and how you point them at your best material. An llms.txt file is one emerging way to flag the content you most want surfaced in answers, though it's still an informal proposal rather than an established standard. Blocking these bots outright is a legitimate choice, but it's a choice with real consequences for AI visibility, so make it on purpose, not by accident in a robots.txt rule you forgot you wrote.
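To confirm which AI crawlers your robots.txt actually lets in, you can run it through Python's standard urllib.robotparser. The robots.txt content and URL below are hypothetical. The user-agent tokens are the ones named in this article, so check each vendor's documentation for its current tokens.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot entirely, keep ClaudeBot out of checkout.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /checkout/

User-agent: *
Disallow: /search
"""

def access_report(robots_txt, agents, url):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

agents = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]
report = access_report(ROBOTS_TXT, agents, "https://www.example.com/guides/crawl-budget")
print(report)
# {'GPTBot': False, 'ClaudeBot': True, 'PerplexityBot': True, 'Google-Extended': True}
```

One caveat on the design: urllib.robotparser uses plain prefix matching and applies the first matching rule in a group. It won't evaluate Google-style `*` or `$` wildcards, so this check is only reliable for simple path-prefix rules like the ones above.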
Old playbook: optimize crawl budget for Google. New playbook: keep your site clean and fast for every crawler that now decides whether you show up, in blue links and in AI answers both. If getting cited in those answers is the actual goal, that's a strategy in its own right, which is what we cover in our take on how to get cited in Google AI Overviews.
Stop optimizing things that don't change outcomes
If you came here worried about crawl budget, here's the no-fluff takeaway: for most sites, it's a rabbit hole. The real wins live in content depth, technical hygiene, authority, and now showing up in AI answers, not in shaving crawl requests off a 50-page site.
We do SEO that fixes the things that matter, classic Google rankings and AI-answer visibility both, and we'll tell you straight when a "problem" isn't one. No jargon walls, no busywork dressed up as strategy. If you genuinely are at large-site scale, a focused SEO audit is where we separate the real crawl waste from the noise.
Want a real read on what's holding your site back? See how we approach SEO, or email admin@moonsauceagency.com and we'll give it to you straight. An honest read, no sales theater.