What is noindex? It is a directive that tells search engines to keep a page out of their index, which means the page can be crawled and read but never shows up in search results. It is the cleanest, most precise way to say "this page exists, but I don't want it found in search." The mechanics are simple, the use cases are obvious once you see them, and the way it gets misused costs more rankings than almost any other small technical decision.
What is noindex, in plain English?
Search engines work in two phases that people constantly blur together. First they crawl, fetching the page and reading its content. Then they index, deciding to store that page so it can be served in results. Noindex speaks to the second phase. It lets the crawler in, lets it read everything, and then tells it: do not file this away, do not let it appear in results.
That distinction is the whole point. A page with noindex is fully visible to Googlebot. The bot has to see the page to see the instruction. What it cannot do is surface that page to a searcher. Compare that to a robots.txt block, which stops the crawler at the door before it reads anything. One controls what gets indexed; the other controls what gets crawled. Confusing the two is the single most common noindex mistake, and we'll come back to it because it matters more than any other thing on this page.
Typical pages that earn a noindex: internal site-search results, thank-you and confirmation pages, login and account screens, printer-friendly duplicates, thin tag archives, and the endless filter combinations that faceted navigation spawns. None of those deserve a slot in the index. Noindex is how you keep them out without deleting them.
How noindex works
You deliver the directive one of two ways, depending on the file type.
For HTML pages, you add a robots meta tag inside the document head:
<meta name="robots" content="noindex">
For files that have no HTML head to put a tag in, like PDFs, images, or other documents, you set it in the HTTP response header instead:
X-Robots-Tag: noindex
Both do the same job. The X-Robots-Tag is the more flexible of the two because you can apply it server-side across whole file types or URL patterns without touching markup. You can also target a specific crawler: in the meta tag, replace the generic robots name with a named user agent like googlebot; in the header, prefix the directive with the user agent (X-Robots-Tag: googlebot: noindex). That way one engine deindexes a page while another keeps it.
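To make the "apply it server-side across file types or URL patterns" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `RULES` list, the function names, and the example paths are hypothetical, and a real site would usually set this header in its web server or framework configuration rather than in a helper like this.

```python
import re
from typing import Optional

# Hypothetical rules: (pattern matched against the URL path, X-Robots-Tag value).
# The paths are illustrative only, not a recommendation for any specific site.
RULES = [
    (re.compile(r"\.pdf$", re.IGNORECASE), "noindex"),
    (re.compile(r"^/search(/|$)"), "noindex, follow"),
    (re.compile(r"^/account(/|$)"), "noindex, nofollow"),
]


def x_robots_tag_for(path: str) -> Optional[str]:
    """Return the X-Robots-Tag value for a URL path (no query string), or None."""
    for pattern, value in RULES:
        if pattern.search(path):
            return value  # first matching rule wins
    return None


def add_robots_header(path: str, headers: dict) -> dict:
    """Return a copy of the response headers, adding X-Robots-Tag if a rule matches."""
    result = dict(headers)
    value = x_robots_tag_for(path)
    if value is not None:
        result["X-Robots-Tag"] = value
    return result
```

Calling `add_robots_header("/account/settings", {"Content-Type": "text/html"})` returns the original headers plus `X-Robots-Tag: noindex, nofollow`, while a path that matches no rule is returned unchanged and stays indexable by default.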
The other half of the value lives in the second token you pair with noindex. The default for any page is index, follow, so a bare noindex still implies follow: Google keeps crawling the page and following its outbound links. If you want it to stop doing that, you say noindex, nofollow. Here's how the common combinations behave:
| Directive | In index? | Links followed? | Use it for |
|---|---|---|---|
| index, follow | Yes | Yes | Normal pages (the default) |
| noindex, follow | No | Yes | A page you want out of results but whose links still matter |
| noindex, nofollow | No | No | Dead-end pages with nothing useful to pass on |
| index, nofollow | Yes | No | Rare; a page you want ranked but whose links you distrust |
One honest caveat on noindex, follow: Google has said that over a long horizon it treats persistently noindexed pages as effectively nofollow, because it crawls them less and less. So don't architect a site that depends on a noindexed page to keep passing link equity forever. It works in the short term, not as a permanent plumbing layer.
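The combinations in the table can be sketched as a small Python parser that turns a directive string into two yes/no answers. The names `RobotsPolicy` and `parse_robots_directives` are hypothetical. The sketch only models what the directive declares; it does not model the long-horizon behavior described in the caveat above, and it ignores other directives such as nosnippet.

```python
from typing import NamedTuple


class RobotsPolicy(NamedTuple):
    indexable: bool      # may the page appear in the index?
    follow_links: bool   # may the crawler follow the page's links?


def parse_robots_directives(content: str) -> RobotsPolicy:
    """Interpret a robots directive string such as "noindex, follow"."""
    tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    # Absent tokens fall back to the default (index, follow).
    # "none" is shorthand for "noindex, nofollow".
    indexable = not ({"noindex", "none"} & tokens)
    follow_links = not ({"nofollow", "none"} & tokens)
    return RobotsPolicy(indexable, follow_links)
```

A bare `"noindex"` comes back as `RobotsPolicy(indexable=False, follow_links=True)`, which matches the noindex, follow row: the restrictive token is the only one that changes anything, because index and follow are already the defaults.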
Why noindex matters
The lever here is index hygiene, and it is more strategic than it sounds. Search engines form an understanding of your site from the set of URLs they index. Flood that set with thin, duplicate, and utility pages and you blur the signal about what your site is genuinely about. Keep the index tight and the pages that should rank stand out more cleanly. This connects directly to technical SEO and to crawl budget: on large sites, every junk URL Google indexes is attention not spent on the pages that earn revenue.
There is a newer angle too. AI answer engines and the systems behind Google's AI Overviews draw from the same crawled and indexed corpus. Keeping low-value URLs out of the index also keeps them out of the source pool those tools sample from. Clean indexing is no longer a search-only concern; it shapes what machines cite about you.
But the reason noindex earns a place on every technical checklist is the downside. A noindex tag in the wrong place quietly removes pages from search with no error message, no broken layout, nothing a casual look would catch. The damage is invisible until traffic drops. That asymmetry, low upside when right and serious downside when wrong, is exactly why it deserves careful handling.
Noindex vs robots.txt, and the mistake to avoid
This is the part that trips up even experienced teams, so it gets its own section.
If you want a page gone from search, you must let Google crawl it so it can see the noindex directive. That means you do not also block it in robots.txt. The failure mode looks logical and is completely wrong: someone wants a page hidden, so they both noindex it and disallow it in robots.txt to be thorough. The disallow stops Google from fetching the page, which means Google never reads the noindex, which means the directive is invisible. Worse, a disallowed URL can still appear in results as a bare link with no title or description, because Google knows the URL exists from other links even though it can't see the content. The "extra protection" produces the opposite of what was intended.
The clean rules:
- To keep a page out of the index: use noindex and leave it crawlable.
- To stop a page from being crawled (for example to spare crawl budget on infinite parameter URLs): use robots.txt disallow, and accept that the URL might still appear as a bare link.
- To consolidate duplicates toward a preferred version while keeping the page accessible: reach for a canonical tag instead, which expresses a preference rather than a removal.
Picking the right tool comes down to intent. Noindex removes. Disallow blocks crawling. Canonical declares a favorite. They overlap in people's heads far more than in their behavior.
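The noindex-plus-disallow trap is easy to check for mechanically. Below is a rough Python sketch using the standard library's `urllib.robotparser`. The `diagnose` function, its return labels, and the example URLs are hypothetical, and the standard-library parser does not replicate every detail of how Google matches robots.txt rules (wildcards, for instance), so treat the result as a first-pass hint rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def diagnose(robots_txt: str, url: str, has_noindex: bool,
             user_agent: str = "Googlebot") -> str:
    """Classify how a noindex directive and robots.txt interact for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(user_agent, url)

    if has_noindex and not crawlable:
        # The crawler is stopped before it can read the directive.
        return "conflict"
    if has_noindex:
        # Crawlable and noindexed: the directive can be seen and honored.
        return "noindex-effective"
    if not crawlable:
        # Not crawled, but the URL may still show up as a bare link.
        return "crawl-blocked"
    return "indexable"
```

With a robots.txt of `User-agent: *` followed by `Disallow: /search/`, a noindexed URL under `/search/` comes back as `"conflict"`: exactly the case where the "extra protection" hides the directive from the crawler.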
A few practical habits. Audit your noindex tags before and after every launch; the staging-tag-pushed-to-production accident is the most expensive mistake in this whole area. Use Search Console's URL Inspection tool to confirm Google sees the tag you think it sees. Remember that removal is not instant: Google has to recrawl the page to register the new directive, which can take days or weeks, and the Search Console removal tool only hides a URL temporarily while the noindex takes permanent hold. And never assume a noindexed page is safe from appearing if it is also blocked in robots.txt; that combination breaks the directive.
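A launch audit along these lines can be sketched in Python with only the standard library. This is a simplified illustration under stated assumptions: `has_noindex` and `find_misfires` are hypothetical names, the page HTML and headers are passed in rather than fetched, only the robots and googlebot meta names are checked, and any user-agent prefix in the header is treated as if it applied to every crawler. It complements the URL Inspection tool; it does not replace it.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collect the content of <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() in {"robots", "googlebot"}:
            self.contents.append(attr.get("content", ""))


def has_noindex(html: str, headers: dict) -> bool:
    """Return True if the meta tags or the X-Robots-Tag header carry noindex."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    values = list(parser.contents)
    for key, value in headers.items():
        if key.lower() == "x-robots-tag":
            values.append(value)
    # Drop an optional "googlebot:" style prefix, then compare each token.
    return any(
        token.rsplit(":", 1)[-1].strip().lower() in {"noindex", "none"}
        for value in values
        for token in value.split(",")
    )


def find_misfires(pages: dict, must_rank: set) -> list:
    """pages maps URL -> (html, headers); return must-rank URLs carrying noindex."""
    return sorted(
        url for url in must_rank
        if url in pages and has_noindex(*pages[url])
    )
```

Run it against a snapshot of your key URLs before and after a release. Anything `find_misfires` returns is a page you expect to rank that is currently telling search engines to drop it.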
The bottom line
Noindex is a small, precise instrument with an outsized blast radius. Used on purpose, it keeps utility and duplicate pages out of search and out of the indexed pool that AI answer systems draw from, which sharpens how engines read the pages you do want ranked. The concept is easy. The discipline is in applying it to the right URLs and never pairing it with a robots.txt block that hides it from the very crawler it is meant to instruct.
If you take one thing away: noindex needs to be seen to work, so the page must stay crawlable. The day a misfired tag deindexes pages that should rank, you will wish someone had been watching the tags. Treat it as routine index hygiene, get the mechanics right, and move on to the levers that move the numbers.
Want a clean read on which of your pages should be indexed, which should carry a noindex, and which are quietly bleeding crawl budget? Our technical SEO team runs an SEO audit that maps every indexable URL and flags the misfires before they cost you traffic. Email us at admin@moonsauceagency.com and you'll get a prioritized list of indexing fixes with the why behind each one.
Sources: Google Search Central: Block search indexing with noindex · Google Search Central: robots meta tag and X-Robots-Tag specifications