What Is Noindex? How the Directive Works

What is noindex? It is a directive that tells search engines to keep a page out of their index, which means the page can be crawled and read but never shows up in search results. It is the cleanest, most precise way to say "this page exists, but I don't want it found in search." The mechanics are simple, the use cases are obvious once you see them, and the way it gets misused costs more rankings than almost any other small technical decision.

What is noindex, in plain English?

Search engines work in two phases that people constantly blur together. First they crawl, fetching the page and reading its content. Then they index, deciding to store that page so it can be served in results. Noindex speaks to the second phase. It lets the crawler in, lets it read everything, and then tells it: do not file this away, do not let it appear in results.

That distinction is the whole point. A page with noindex is fully visible to Googlebot. The bot has to see the page to see the instruction. What it cannot do is surface that page to a searcher. Compare that to a robots.txt block, which stops the crawler at the door before it reads anything. One controls what gets indexed; the other controls what gets crawled. Confusing the two is the single most common noindex mistake, and we'll come back to it because it matters more than any other thing on this page.

Typical pages that earn a noindex: internal site-search results, thank-you and confirmation pages, login and account screens, printer-friendly duplicates, thin tag archives, and the endless filter combinations that faceted navigation spawns. None of those deserve a slot in the index. Noindex is how you keep them out without deleting them.

How noindex works

You deliver the directive one of two ways, depending on the file type.

For HTML pages, you add a robots meta tag inside the document head:

<meta name="robots" content="noindex">

For files that have no HTML head to put a tag in, like PDFs, images, or other documents, you set it in the HTTP response header instead:

X-Robots-Tag: noindex

Both do the same job. The X-Robots-Tag is the more flexible of the two because you can apply it server-side across whole file types or URL patterns without touching markup. You can also target a specific crawler. In the meta tag, replace the generic robots name with a named user agent like googlebot; in the header, put the user agent before the directive, as in X-Robots-Tag: googlebot: noindex. That way one engine deindexes a page while another keeps it.
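As a rough illustration of applying the header by file type, here is a minimal Python sketch. The function name x_robots_tag_for and the NOINDEX_EXTENSIONS set are hypothetical, chosen only for this example; in practice this logic usually lives in your web server or CDN configuration rather than in application code.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical policy for this sketch: file types that should stay out of the index.
NOINDEX_EXTENSIONS = {".pdf", ".docx", ".xlsx"}


def x_robots_tag_for(path: str, crawler: Optional[str] = None) -> Optional[str]:
    """Return an X-Robots-Tag header value for a URL path, or None if no header is needed."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in NOINDEX_EXTENSIONS:
        return None
    # Prefixing a user agent scopes the directive to that one crawler.
    return f"{crawler}: noindex" if crawler else "noindex"
```

Calling x_robots_tag_for("/files/report.pdf", "googlebot") yields the crawler-specific form, while an ordinary HTML path returns None and gets no header at all.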

The other half of the value lives in the second token you pair with noindex. The default for any page is index, follow, so a bare noindex still implies follow: Google keeps crawling the page and following its outbound links. If you want it to stop doing that, you say noindex, nofollow. Here's how the common combinations behave:

Directive	In index?	Links followed?	Use it for
index, follow	Yes	Yes	Normal pages (the default)
noindex, follow	No	Yes	A page you want out of results but whose links still matter
noindex, nofollow	No	No	Dead-end pages with nothing useful to pass on
index, nofollow	Yes	No	Rare; a page you want ranked but whose links you distrust
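The defaults in the table can be expressed as a small parser. This is a sketch only, assuming a comma-separated directive string like the one in the meta tag's content attribute; parse_robots_content is a hypothetical helper name, and it ignores directives other than the index and follow pairs.

```python
def parse_robots_content(content: str) -> dict:
    """Resolve a robots directive string into index/follow flags, applying the defaults."""
    tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    # "none" is shorthand for "noindex, nofollow".
    noindex = "noindex" in tokens or "none" in tokens
    nofollow = "nofollow" in tokens or "none" in tokens
    # Anything not explicitly switched off falls back to the default: index, follow.
    return {"index": not noindex, "follow": not nofollow}
```

A bare "noindex" resolves to index off and follow on, which matches the second row of the table.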

One honest caveat on noindex, follow: Google has said that over a long horizon it treats persistently noindexed pages as effectively nofollow, because it crawls them less and less. So don't architect a site that depends on a noindexed page to keep passing link equity forever. It works in the short term, not as a permanent plumbing layer.

Why noindex matters

The lever here is index hygiene, and it is more strategic than it sounds. Search engines form an understanding of your site from the set of URLs they index. Flood that set with thin, duplicate, and utility pages and you blur the signal about what your site is genuinely about. Keep the index tight and the pages that should rank stand out more cleanly. This connects directly to technical SEO and to crawl budget: on large sites, every junk URL Google indexes is attention not spent on the pages that earn revenue.

There is a newer angle too. AI answer engines and the systems behind Google's AI Overviews draw from the same crawled and indexed corpus. Keeping low-value URLs out of the index also keeps them out of the source pool those tools sample from. Clean indexing is no longer a search-only concern; it shapes what machines cite about you.

But the reason noindex earns a place on every technical checklist is the downside. A noindex tag in the wrong place quietly removes pages from search with no error message, no broken layout, nothing a casual look would catch. The damage is invisible until traffic drops. That asymmetry, low upside when right and serious downside when wrong, is exactly why it deserves careful handling.

Noindex vs robots.txt, and the mistake to avoid

This is the part that trips up even experienced teams, so it gets its own section.

If you want a page gone from search, you must let Google crawl it so it can see the noindex directive. That means you do not also block it in robots.txt. The failure mode looks logical and is completely wrong: someone wants a page hidden, so they both noindex it and disallow it in robots.txt to be thorough. The disallow stops Google from fetching the page, which means Google never reads the noindex, which means the directive is invisible. Worse, a disallowed URL can still appear in results as a bare link with no title or description, because Google knows the URL exists from other links even though it can't see the content. The "extra protection" produces the opposite of what was intended.

The clean rules:

To keep a page out of the index: use noindex and leave it crawlable.
To stop a page from being crawled (for example to spare crawl budget on infinite parameter URLs): use robots.txt disallow, and accept that the URL might still appear as a bare link.
To consolidate duplicates toward a preferred version while keeping the page accessible: reach for a canonical tag instead, which expresses a preference rather than a removal.

Picking the right tool comes down to intent. Noindex removes. Disallow blocks crawling. Canonical declares a favorite. They overlap in people's heads far more than in their behavior.
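One way to catch the noindex-plus-disallow conflict is to test each noindexed URL against your robots.txt rules. The sketch below uses Python's standard urllib.robotparser; the helper name noindex_is_hidden and the example URLs are assumptions for illustration. Note that Python's parser does simple prefix matching and may not interpret wildcard rules the way Google does, so treat the result as a first-pass check.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_hidden(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt stops the crawler from fetching a page that relies on noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # If the crawler cannot fetch the page, it can never read the noindex directive on it.
    return not parser.can_fetch(user_agent, url)


# Example rules (hypothetical): internal search results are disallowed.
ROBOTS_TXT = "User-agent: *\nDisallow: /search/\n"
```

Here noindex_is_hidden(ROBOTS_TXT, "https://example.com/search/?q=shoes") would flag the conflict, because a noindex tag on that page is never fetched.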

A few practical habits. Audit your noindex tags before and after every launch; the staging-tag-pushed-to-production accident is the most expensive mistake in this whole area. Use Search Console's URL Inspection tool to confirm Google sees the tag you think it sees. Remember that removal is not instant: Google has to recrawl the page to register the new directive, which can take days or weeks. The Search Console removal tool only hides a URL temporarily, for about six months, which buys time for the noindex to take permanent effect. And never assume a noindexed page is safe from appearing if it is also blocked in robots.txt; that combination breaks the directive.
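A pre-launch audit can be partly automated. The following is a minimal sketch, not a full crawler: it assumes you already have a page's HTML and response headers in hand, and find_noindex is a hypothetical helper that reports where a noindex directive appears. It uses only Python's standard html.parser.

```python
from html.parser import HTMLParser
from typing import Dict, List


class _RobotsMetaParser(HTMLParser):
    """Collects directives from robots and googlebot meta tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            content = attr.get("content", "")
            self.directives.extend(t.strip().lower() for t in content.split(",") if t.strip())


def find_noindex(html: str, headers: Dict[str, str]) -> List[str]:
    """Return the places ("meta", "header") where a noindex directive was found."""
    found: List[str] = []
    parser = _RobotsMetaParser()
    parser.feed(html)
    if "noindex" in parser.directives or "none" in parser.directives:
        found.append("meta")
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "header" not in found:
            # Treat a "googlebot: noindex" prefix as just another token for this check.
            tokens = [t.strip().lower() for t in value.replace(":", ",").split(",")]
            if "noindex" in tokens or "none" in tokens:
                found.append("header")
    return found
```

Running this over the pages that should rank, and failing the deploy if the result is non-empty, is one way to catch a staging tag before it reaches production.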

The bottom line

Noindex is a small, precise instrument with an outsized blast radius. Used on purpose, it keeps utility and duplicate pages out of search and out of the source pool that AI answer engines draw from, which sharpens how engines read the pages you do want ranked. The concept is easy. The discipline is in applying it to the right URLs and never pairing it with a robots.txt block that hides it from the very crawler it is meant to instruct.

If you take one thing away: noindex needs to be seen to work, so the page must stay crawlable. The day a misfired tag deindexes pages that should rank, you will wish someone had been watching the tags. Treat it as routine index hygiene, get the mechanics right, and move on to the levers that move the numbers.

Want a clean read on which of your pages should be indexed, which should carry a noindex, and which are quietly bleeding crawl budget? Our technical SEO team runs an SEO audit that maps every indexable URL and flags the misfires before they cost you traffic. Email us at admin@moonsauceagency.com and you'll get a prioritized list of indexing fixes with the why behind each one.


Sources: Google Search Central: Block search indexing with noindex · Google Search Central: robots meta tag and X-Robots-Tag specifications

Frequently asked

What is the difference between noindex and robots.txt disallow?

They solve different problems and are not interchangeable. Robots.txt disallow blocks crawling: Google won't fetch the page. Noindex allows crawling but tells Google to keep the page out of the index. The trap is combining them. If you disallow a page in robots.txt, Google can't crawl it, so it never sees the noindex tag and the directive is ignored. A disallowed URL can still appear in results as a bare link with no description. To reliably remove a page, use noindex and leave it crawlable.

How do I add a noindex tag?

Two ways. For HTML pages, add a robots meta tag in the head: <meta name="robots" content="noindex">. For non-HTML files like PDFs or images, you can't add a meta tag, so use an X-Robots-Tag in the HTTP response header instead: X-Robots-Tag: noindex. Both do the same job. In the meta tag, you can target a specific crawler by swapping "robots" for "googlebot". A bare noindex already implies follow, so add nofollow only if you want Google to stop following the page's outbound links.

Does noindex remove a page from Google immediately?

No. Google has to recrawl the page to see the new directive, and that can take days or weeks depending on how often it visits the URL. The page stays in the index until then. If you need a faster removal, the URL removal tool in Search Console hides a page from results for about six months while the noindex takes permanent effect. Submitting the URL for recrawl in Search Console can also speed things up.

Will a noindexed page pass link equity?

It depends on the second directive you pair with noindex. The default robots value is index,follow, so a bare noindex still implies follow, meaning Google crawls and follows the links on the page. If you set noindex,nofollow, Google does not follow those links. Google has noted that over time it tends to treat long-term noindexed pages as effectively nofollow because it crawls them less, so don't rely on a noindexed page to pass equity indefinitely.

Should I noindex thin or duplicate pages?

Sometimes, but a canonical tag is often the better tool for duplicates. Use noindex for pages that have no business in search at all: internal search results, thank-you pages, login screens, faceted filter combinations, staging URLs. For near-duplicate content where one version should rank, a canonical tag consolidates signals to the preferred URL instead of removing the page entirely. Noindex is a removal; canonical is a preference. Reach for the one that matches your intent.

Can noindex hurt my SEO?

Only if you apply it to the wrong pages. The classic disaster is a noindex tag left on a staging site that gets pushed to production, quietly deindexing pages that should rank. Another is a template-level tag that hits a whole section. Used on purpose, noindex helps: it keeps low-value URLs out of the index, which can sharpen how search engines understand your important pages. The risk is operational, not strategic, so audit your tags before and after any launch.

Does noindex still matter in 2026?

Yes. It's a core, stable part of how the open web tells search engines what to index, and Google still documents and honors it. As AI answer engines pull from the same crawled corpus, keeping junk URLs out of the index also keeps them out of the source pool those systems draw from. Noindex isn't glamorous, but it's table-stakes index hygiene, and a misfired tag can do real damage. Worth getting right, not worth overthinking.
