Why AI search can't read your SPA (GPTBot, ClaudeBot, and the JavaScript problem)

AI crawlers like GPTBot, ClaudeBot, and PerplexityBot mostly don't execute JavaScript, so a client-rendered SPA looks like a blank shell to them. Here is how to test your own site in two minutes, why server-side rendering or static HTML is the fix, and what it takes to get cited once the page is readable.

If your site is a client-rendered single-page app, AI answer engines probably see a blank page. The crawlers behind ChatGPT, Claude, and Perplexity fetch your HTML but generally don’t run the JavaScript that builds your content. No JavaScript, no content, no citation. The fix is to put your real text in the HTML the server returns — through server-side rendering or static generation — before any script runs. This post explains why the problem happens, how to confirm it on your own site in two minutes, exactly how to fix it, and why being readable is necessary but not enough to actually get cited.

The short version: AI crawlers fetch HTML, they don’t render it

A traditional browser does two things. It downloads your HTML, then it runs the JavaScript that fills the page in. Most AI retrieval crawlers stop after the first step. An analysis of over 500 million GPTBot fetches found that the major AI crawlers — GPTBot, ClaudeBot, PerplexityBot — request the raw HTML and largely skip JavaScript execution. They behave more like curl than like Chrome. Google’s own search indexer does render JS in a second pass, but that’s a separate system from the bots feeding live AI answers, and you can’t rely on Googlebot’s rendering to cover what GPTBot never runs.

So for a pure SPA, the bot sees what’s in the initial HTML response, and that’s usually almost nothing. A Create React App or default Vite build ships an index.html whose body is essentially one empty <div id="root"> plus a couple of script tags. Your headings, your product copy, your pricing tables, your FAQ — all of it is assembled in the browser after load, by code the bot never executes. From the model’s side, the page has no text worth quoting. Your content didn’t rank poorly — there was no content in the document for the bot to rank at all.

This is a different failure from a slow or flaky render. The crawler isn’t waiting for your app to boot and giving up. It’s reading the bytes of your HTTP response and moving on. If those bytes don’t contain your words, nothing downstream — no embedding, no retrieval, no citation — can recover them.
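To make the difference concrete, here is a minimal Python sketch of what a fetcher that never runs scripts can pull out of a response. It uses only the standard library. The two HTML strings, the "Acme" name, and the pricing sentence are made-up stand-ins, and real crawlers very likely use more sophisticated extraction than this.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes and ignores anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style source.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a non-rendering reader could extract from raw HTML."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical client-rendered shell: an empty mount point plus a script tag.
SPA_SHELL = """<!doctype html>
<html><head><title>Acme</title></head>
<body><div id="root"></div>
<script type="module" src="/assets/index.js"></script>
</body></html>"""

# Hypothetical server-rendered version of the same page.
SERVER_RENDERED = """<!doctype html>
<html><head><title>Acme</title></head>
<body><main><h1>Pricing</h1><p>The Team plan starts at $49 per month.</p></main>
<script type="module" src="/assets/index.js"></script>
</body></html>"""

if __name__ == "__main__":
    print(repr(visible_text(SPA_SHELL)))
    print(repr(visible_text(SERVER_RENDERED)))
```

For the shell, the only text left is the page title. For the server-rendered version, the heading and the pricing sentence are both there to quote.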

How to check your own site in two minutes

You don’t need special tooling to confirm this. Two checks tell you what a crawler receives, and both can disagree with what you see in a normal browser session, in a way that’s worth understanding.

First, disable JavaScript and reload. In Chrome, open DevTools, open the command menu (Ctrl/Cmd+Shift+P), run “Disable JavaScript,” then refresh the page. If it goes blank, shows only a spinner, or collapses to a bare header, that broken page is roughly what an AI crawler gets. Browser automation tools like Playwright let you script the same check — load a route with JavaScript disabled and assert that a known sentence is present — so you can wire it into CI and catch regressions when someone moves a section back into a client component.
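A scripted version of this check might look like the sketch below, which uses Playwright’s Python bindings (a third-party package, installed with pip install playwright followed by playwright install chromium). The URL, the expected sentence, and the helper names are placeholders, so treat it as a starting point to adapt rather than a finished test.

```python
def normalize(text: str) -> str:
    """Collapse whitespace so line breaks in the page don't cause false misses."""
    return " ".join(text.split())


def missing_sentences(page_text: str, expected: list[str]) -> list[str]:
    """Return the expected sentences that do not appear in the page text."""
    haystack = normalize(page_text)
    return [s for s in expected if normalize(s) not in haystack]


def body_text_without_js(url: str) -> str:
    """Load a URL in Chromium with JavaScript disabled and return the body text."""
    # Imported lazily so the pure helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            context = browser.new_context(java_script_enabled=False)
            page = context.new_page()
            page.goto(url, wait_until="domcontentloaded")
            return page.inner_text("body")
        finally:
            browser.close()


def check_pricing_is_readable_without_js() -> None:
    # Placeholder URL and sentence; substitute a real route and real copy.
    text = body_text_without_js("https://yoursite.com/pricing")
    missing = missing_sentences(text, ["starts at $49"])
    assert missing == [], f"not present without JavaScript: {missing}"
```

Calling check_pricing_is_readable_without_js() from a CI job fails the build when the sentence disappears from the no-JavaScript render. Keeping the comparison in a pure function lets you test the matching logic separately from the browser.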

Second, look at the raw HTML the server actually sends, not the rendered DOM. Run curl -s https://yoursite.com/your-page and search the output for a sentence you know is in your main content:

curl -s https://yoursite.com/pricing | grep -oF 'starts at $49'

If grep finds it, the bot can read it. If all you find is script tags and an empty mount point, your content lives only in JavaScript and AI crawlers can’t see it. The browser’s “View Source” shows the same thing if you’d rather not use a terminal — it’s the unrendered response, unlike the Elements panel.

That gap between the two views is the whole trap. The DOM in the Elements panel will always look complete, because by the time you open it the browser has already run your JS. Trust “View Source” and curl. Don’t trust the inspector, and don’t trust how the page looks to you — you have JavaScript turned on; the crawler doesn’t.
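If you want to run the raw-HTML check across several pages at once, a small standard-library Python script can stand in for curl and grep. This is a sketch under assumptions: the URLs, sentences, and User-Agent string are placeholders, and unlike plain curl -s, urllib follows redirects by default.

```python
import urllib.request
from typing import Callable


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Return the response body as text. No scripts are executed."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "raw-html-check/0.1"}  # placeholder UA
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def find_unreadable_pages(
    expectations: dict[str, str],
    fetch: Callable[[str], str] = fetch_raw_html,
) -> list[str]:
    """Return the URLs whose raw HTML does not contain the expected sentence."""
    return [
        url for url, sentence in expectations.items() if sentence not in fetch(url)
    ]


if __name__ == "__main__":
    # Placeholder pages and sentences; use copy you know is in your main content.
    pages = {
        "https://yoursite.com/pricing": "starts at $49",
        "https://yoursite.com/docs": "Getting started",
    }
    for url in find_unreadable_pages(pages):
        print("content missing from raw HTML:", url)
```

Pick sentences without apostrophes or ampersands where you can, since the raw HTML may encode those as entities and a plain substring match would miss them.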

The fix: render on the server or ship static HTML

The fix is to put your real content in the HTML the server returns, before any JavaScript runs. There are two well-supported ways to do this, and they trade off the same way they always have.

Server-side rendering builds each page’s HTML on the server, or at the edge, for every request. The Next.js App Router renders Server Components to HTML by default, so your content is in the response body and only the interactive parts ship JavaScript to the client. You can run it at the edge with OpenNext on Cloudflare, which adapts a Next.js build to Cloudflare Workers so the same render that feeds users also feeds crawlers, close to wherever the request lands. The bot gets a fully populated page; your users still get an interactive app after hydration.
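The principle is framework-independent. The sketch below is not Next.js; it is a deliberately tiny standard-library WSGI app in Python that shows the one property that matters here: the content is written into the HTML on the server, per request, and the script tag is only an add-on. The page data, routes, and asset path are invented for illustration.

```python
from html import escape
from wsgiref.util import setup_testing_defaults

# Hypothetical content store; a real app would query a database or CMS.
PAGES = {
    "/pricing": {"title": "Pricing", "body": "The Team plan starts at $49 per month."},
}


def render_page(title: str, body: str) -> str:
    """Build the full HTML document on the server, content included."""
    return (
        "<!doctype html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body><main><h1>{escape(title)}</h1><p>{escape(body)}</p></main>\n"
        # The script only adds interactivity; the text above is already present.
        '<script type="module" src="/assets/app.js"></script>\n'
        "</body></html>\n"
    )


def app(environ, start_response):
    """WSGI entry point: render the requested page for every request."""
    page = PAGES.get(environ.get("PATH_INFO", "/"))
    if page is None:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found"]
    html = render_page(page["title"], page["body"]).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/html; charset=utf-8"), ("Content-Length", str(len(html)))],
    )
    return [html]


def get(path: str) -> tuple[str, str]:
    """Call the app in-process, the way a crawler would request a path."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response)).decode("utf-8")
    return captured["status"], body


if __name__ == "__main__":
    status, body = get("/pricing")
    print(status)
    print(body)
```

Whatever framework does the rendering, the acceptance test is the same: the response body for a public route contains the sentence you expect, before any script has run.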

Static generation builds the HTML once at deploy time and serves the same file to everyone. Astro ships zero JavaScript by default and is a strong fit for content-heavy sites — marketing pages, docs, blog posts — where the content rarely changes between deploys. The output is plain HTML files you can serve from a CDN like Cloudflare Pages with no render step in the hot path. This is the approach we took for this site, which is why the text you’re reading right now is in the initial HTML response and not assembled by a script.
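As a rough illustration of the build-once model (not how Astro works internally), the Python sketch below turns a small content dictionary into plain HTML files at build time. The page data, slugs, and output layout are assumptions made for the example.

```python
import tempfile
from html import escape
from pathlib import Path

# Hypothetical content source; a real site would read Markdown or CMS entries.
PAGES = {
    "pricing": {"title": "Pricing", "body": "The Team plan starts at $49 per month."},
    "docs": {"title": "Docs", "body": "Install the CLI, then run the setup command."},
}


def render_page(title: str, body: str) -> str:
    """Render one page to a complete HTML document with no script tags."""
    return (
        "<!doctype html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body><main><h1>{escape(title)}</h1><p>{escape(body)}</p></main></body></html>\n"
    )


def build_site(pages: dict[str, dict[str, str]], out_dir: Path) -> list[Path]:
    """Write <out_dir>/<slug>/index.html for every page and return the paths."""
    written = []
    for slug, page in pages.items():
        target = out_dir / slug / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render_page(page["title"], page["body"]), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Build into a throwaway directory; a deploy step would upload these files.
    with tempfile.TemporaryDirectory() as tmp:
        for path in build_site(PAGES, Path(tmp)):
            print(path.relative_to(tmp), path.stat().st_size, "bytes")
```

Because the files exist before any request arrives, there is nothing to execute at fetch time: the CDN returns the same bytes to a browser and to a crawler.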

You don’t have to rewrite everything. The honest scoping question is which pages need to be cited, and the answer is almost always the public ones: marketing pages, documentation, articles, comparison pages. Move just those to SSR or static HTML and you cover most of the value. The logged-in application behind your auth wall can stay a client-rendered SPA — AI crawlers were never going to reach it, and shouldn’t, since gating private data behind authorization is the right call (OWASP A01: Broken Access Control). A common, sane architecture is a static or server-rendered public site plus a separate SPA dashboard, each doing what it’s good at.

After you migrate, run the curl check again. The text should be right there in the response body. Keep the page fast while you’re at it — the same edge rendering that helps crawlers also helps your Core Web Vitals, and a server that returns HTML quickly is one a crawler is more likely to fetch in full.

Being readable is necessary, not sufficient

Getting your content into the HTML gets you in the door. It does not get you cited. Once a model can read the page, what earns a citation is content it can extract cleanly and trust: a clear answer near the top, real specifics, and structured data that tells engines what the page is about instead of leaving them to infer it from prose.

The research backs this up with numbers. The generative engine optimization (GEO) paper presented at KDD 2024 tested concrete content changes against real AI answer engines and found that adding cited statistics, direct quotations, and authoritative sources lifted a page’s visibility in generated answers by up to around 40%, with citing sources among the strongest single moves. The same study reported that the gains are not evenly distributed: pages that were already lower in conventional search rankings benefited the most, while top-ranked pages saw smaller or even negative changes, a sign that GEO is more a leveling force than a multiplier for content that already wins. The practical reading is to spend the effort where you’re currently ranking mid-pack rather than on pages that already lead.

Then make the page legible to engines as structured data, not just to humans as prose. Mark up your content with schema.org types — Article, FAQPage, Organization, Product, whatever fits — following Google’s structured data guidelines so engines can parse entities and relationships rather than guessing. Structured data is part of how a page gets recognized as a citable, attributable source rather than an undifferentiated wall of text.
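For an FAQ section, the markup could be generated along these lines. This Python sketch builds a schema.org FAQPage object as JSON-LD and wraps it in a script tag for the server-rendered HTML. The question and answer are invented examples, and the helper names are hypothetical; check the output against Google’s structured data documentation before relying on it.

```python
import json


def faq_jsonld(questions: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in questions
        ],
    }


def jsonld_script_tag(data: dict) -> str:
    """Serialize JSON-LD into a script tag that is safe to embed in HTML."""
    # Escape "</" so text inside the JSON cannot close the script element early.
    payload = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Invented example content.
    faq = faq_jsonld([("How much does the Team plan cost?", "It starts at $49 per month.")])
    print(jsonld_script_tag(faq))
```

The tag has to be part of the initial HTML response for the same reason the visible text does: markup injected by client-side JavaScript is invisible to a crawler that doesn’t run it. The answers in the markup should also match the text shown on the page.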

A word of caution on shortcuts, because the readable-but-uncited gap tempts people into bad ideas. Skip the hidden-instructions tricks and the llms.txt hype. A study across 300,000 domains found no measurable effect from llms.txt on AI citations, with adoption around 10% and no major engine confirming it uses the file. And Google has updated its spam policies to cover manipulating generative AI responses, so cloaking content or stuffing prompts into pages is a path to getting filtered, not cited. Note the distinction underneath all of this: being crawled and indexed is not the same as being quoted. A model can ingest your page and still never surface it, which is why extractability and trustworthiness do the real work once access is solved.

Why this is worth doing now

AI-mediated discovery is moving from novelty to default. Gartner projects that by 2028, 90% of B2B buying journeys will be influenced by AI agents, touching more than $15 trillion in purchases. If those agents fetch your raw HTML and find an empty <div>, you are absent from the answer regardless of how good your product is — and you won’t see it in your analytics as a ranking drop, because there’s no impression to lose. You’re simply not in the candidate set.

The check costs two minutes. Disable JavaScript, curl your top pages, and see whether your content survives. If it doesn’t, server-side rendering or static HTML is the fix, and it’s a one-time investment that keeps paying off as more buying decisions route through models that can’t run your scripts. Get the content into the HTML first; then earn the citation with specifics, sources, and structure.