White-hat GEO
Getting cited by AI answer engines — ChatGPT, Perplexity, Google AI Overviews — the honest way, with real authority, extractable content, and no manipulation.
Updated 2026-06-20
What this is
Optimizing to be cited by AI answer engines, not just ranked. ChatGPT, Perplexity, and Google’s AI Overviews don’t return ten blue links — they synthesize one answer from a handful of sources they decided to trust. GEO is the work of becoming one of those sources: content a model can extract cleanly, authority it can verify, and the crawler access it needs to read you at all. All of it inside the rules.
The “inside the rules” part matters more than it used to. Google has folded manipulation of AI answers into its spam policies, so the cloaking-and-hidden-prompts playbook now carries real downside. We don’t sell that. We build the version that survives an algorithm update.
Why it’s worth doing now
This isn’t a bet on a niche channel. Gartner projects that by 2028, AI agents will influence 90% of B2B purchase journeys — over $15 trillion in spend routed, in part, through systems that read the web and decide what to surface. If a model can’t cite you, you’re not in that consideration set. The work to fix that takes months to compound, which is the argument for starting before your competitors do, not after.
The technical reality most teams miss
GEO gets sold as a content exercise. Half of it is plumbing, and the plumbing is where most sites quietly fail.
AI crawlers don’t run your JavaScript. Analysis of 500M+ GPTBot fetches found that AI retrieval crawlers don’t execute JavaScript — they read the raw HTML and move on. A client-rendered SPA serves them an empty shell. So the first thing we check is what the bot actually receives, not what renders in a browser. The fix is server-rendered or static HTML: a Next.js App Router site that ships real markup, or a content-heavy property rebuilt on Astro so every page is HTML on first byte. For our own stack we render through OpenNext and serve from Cloudflare Workers, which keeps the HTML edge-cached and fast for both users and bots.
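To make "what the bot actually receives" concrete, here is a minimal sketch of that check, assuming you have already saved the raw HTML response (for example from a plain HTTP fetch with no browser involved). The function names, the sample markup, and the 200-character threshold are illustrative assumptions, not part of any crawler's documented behavior.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text a crawler that does not execute JavaScript could read."""

    # Content inside these tags is never readable prose.
    SKIPPED_TAGS = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw_html: str) -> str:
    """Return the text present in the markup itself, before any script runs."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)


def looks_like_empty_shell(raw_html: str, min_chars: int = 200) -> bool:
    """Heuristic (assumed threshold): too little text means a client-rendered shell."""
    return len(visible_text(raw_html)) < min_chars


# Hypothetical responses: a client-rendered SPA and a server-rendered page.
SPA_SHELL = (
    "<html><head><title>Acme</title></head>"
    '<body><div id="root"></div><script src="/app.js"></script></body></html>'
)
SSR_PAGE = (
    "<html><head><title>Acme</title></head><body><main><p>"
    + "Acme audits crawler access. " * 10
    + "</p></main></body></html>"
)

print(looks_like_empty_shell(SPA_SHELL), looks_like_empty_shell(SSR_PAGE))
```

The heuristic is deliberately crude: it only answers whether meaningful text exists in the first response, which is the question that matters for a crawler that never runs the bundle.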
The CDN may be blocking the bots you want. Bot-management rules and overzealous WAF defaults often drop GPTBot, ClaudeBot, and PerplexityBot before they reach the origin. The site looks fine to humans and is invisible to the engines. We audit the actual request logs, fix robots.txt, and unblock the named crawlers at the edge — on Cloudflare, that’s a few rules in front of the Worker or Pages deployment.
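The robots.txt half of that audit can be sketched with Python's standard-library parser. This is an illustration under assumptions: the sample robots.txt and the example.com URLs are made up, and the check only covers the declared policy. A block applied by a WAF or bot-management rule never appears in robots.txt, which is why the request logs still need reading.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the paragraph above.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_crawlers(robots_txt: str, url: str, agents=AI_CRAWLERS) -> list[str]:
    """Return the crawlers that the given robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one named crawler outright.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_crawlers(SAMPLE_ROBOTS, "https://example.com/pricing"))
```

Running the same function against every URL in the sitemap gives a quick list of pages that are closed to a crawler by policy, separate from anything the edge is doing.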
Structured data tells the model what it’s reading. We mark up entities with JSON-LD against the schema.org vocabulary (Organization, Product, FAQPage, Article) so an engine resolving “who does X” has a machine-readable answer instead of a guess parsed from prose. It’s not magic ranking juice; it’s removing ambiguity.
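As a rough illustration of that markup, the sketch below builds a schema.org Organization entity and wraps it in the script element that carries JSON-LD. The company name, URLs, and the helper name organization_jsonld are placeholders invented for the example.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Render a schema.org Organization entity as a JSON-LD script element."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # Profiles on other sites that identify the same entity.
        "sameAs": same_as,
    }
    # Escape "</" so a string value can never close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder values, for illustration only.
print(
    organization_jsonld(
        "Example Co",
        "https://example.com",
        ["https://www.linkedin.com/company/example"],
    )
)
```

Generating the block from the same data that renders the page keeps the markup and the visible content from drifting apart.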
Speed still counts. Slow pages get crawled less and rendered worse. We hold pages to the Core Web Vitals thresholds, which is mostly a consequence of the static-HTML approach above rather than separate work.
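For reference, the thresholds can be written down as a small check. The boundary values below are the ones Google publishes for Core Web Vitals (LCP, INP, CLS, judged at the 75th percentile of page loads); the dictionary keys and function names are assumptions made for this sketch.

```python
# (good, poor) boundaries: at or below "good" passes, above "poor" fails.
THRESHOLDS = {
    "lcp_ms": (2500, 4000),  # Largest Contentful Paint, milliseconds
    "inp_ms": (200, 500),    # Interaction to Next Paint, milliseconds
    "cls": (0.1, 0.25),      # Cumulative Layout Shift, unitless
}


def rate(metric: str, value: float) -> str:
    """Classify one 75th-percentile measurement."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


def passes_core_web_vitals(p75: dict[str, float]) -> bool:
    """A page passes only when every metric rates as good."""
    return all(rate(metric, p75[metric]) == "good" for metric in THRESHOLDS)


# Hypothetical field data for one page.
print(passes_core_web_vitals({"lcp_ms": 1800, "inp_ms": 150, "cls": 0.05}))
```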
What actually earns the citation
The technical fixes get you readable. They don’t get you cited. For that there’s one tactic with peer-reviewed evidence behind it.
The original GEO study (KDD ’24) tested which content changes move a page’s visibility inside AI-generated answers. Adding verifiable statistics, direct quotations, and citations to authoritative sources was the strongest lever: up to roughly a 40% lift in citation visibility, with “cite sources” the single most effective change. That’s the empirical backbone of what we do. We don’t make your content “more engaging”; we make it more citable, with claims backed by numbers, named sources, and quotable, self-contained passages a model can lift without losing the meaning.
We also write answer-first. Each page leads with the claim, then the support, because that’s the shape an extractive model rewards. Buried conclusions don’t get quoted.
A note on llms.txt
We get asked about llms.txt a lot. Honest answer: the evidence isn’t there yet. A study across 300,000 domains found no measurable effect on AI citations, with adoption around 10%. No major answer engine has confirmed it reads the file. We’ll ship one because it’s cheap and harmless, but we won’t bill it as a growth lever or let it crowd out the things that actually move the needle.
Measurement, kept honest
You can’t optimize what you won’t measure, and you shouldn’t claim what you can’t measure. We instrument referral traffic from AI engines in GA4, track AI share-of-voice by sampling the prompts your buyers actually ask, and use Playwright to capture how the engines answer those prompts over time. Where attribution is genuinely impossible — and with AI answers a lot of it is — we say so instead of inventing a dashboard number.
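One way to turn sampled prompts into a share-of-voice number is sketched below. It assumes the capture step has already produced, for each prompt, the list of URLs the engine cited; the prompts, domains, and function names are invented for the example, and matching is by exact hostname (after dropping a leading "www."), so subdomains count as separate sites.

```python
from urllib.parse import urlparse


def cited_domains(citation_urls: list[str]) -> set[str]:
    """Normalize cited URLs to lowercase hostnames without a leading 'www.'."""
    domains = set()
    for url in citation_urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.add(host)
    return domains


def share_of_voice(samples: dict[str, list[str]], domain: str) -> float:
    """Fraction of sampled prompts whose answer cites domain at least once."""
    if not samples:
        return 0.0
    hits = sum(1 for urls in samples.values() if domain in cited_domains(urls))
    return hits / len(samples)


# Hypothetical capture: prompt -> URLs cited in the engine's answer.
SAMPLES = {
    "best geo agency": ["https://www.example.com/geo", "https://competitor.test/blog"],
    "how to get cited by chatgpt": ["https://competitor.test/guide"],
    "does llms.txt work": [],
    "geo vs seo": ["https://another.test/compare"],
}

print(share_of_voice(SAMPLES, "example.com"))
```

Because answers vary between runs, a figure like this is only meaningful as a trend over repeated samples of the same prompt set, not as a single reading.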
One adjacent thing we check, because it bites teams who publish citable content fast: server-side authorization. When you open up pages and data for crawlers, it’s easy to leak something that should have been access-controlled. We review for broken access control before anything ships.
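The failure mode is easiest to see in code. The sketch below is a generic deny-by-default check, not a description of any particular framework: the Page and Session types, the role strings, and can_serve are all hypothetical. The point it illustrates is that the decision depends on the authenticated session and never on the User-Agent header, which the client controls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Page:
    path: str
    public: bool
    allowed_roles: frozenset = frozenset()


@dataclass(frozen=True)
class Session:
    user_id: str
    roles: frozenset


def can_serve(page: Page, session: Optional[Session], user_agent: str = "") -> bool:
    """Decide server-side whether a page may be returned for this request."""
    # user_agent is accepted only to show that it is ignored: claiming to be
    # a crawler must never unlock a page that is not public.
    if page.public:
        return True
    if session is None:
        return False  # deny by default when nobody is authenticated
    return bool(page.allowed_roles & session.roles)


# Hypothetical pages: one meant for crawlers, one that must stay private.
pricing = Page("/pricing", public=True)
report = Page("/clients/acme/report", public=False,
              allowed_roles=frozenset({"client:acme"}))

print(can_serve(pricing, None, "GPTBot"), can_serve(report, None, "GPTBot"))
```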
How we run it
GEO isn’t a one-off deliverable; the engines change monthly and last year’s tactics rot. We run it as a standardized, continuously updated service: audit, fix the plumbing, build the citable content, instrument measurement, then re-audit as the engines shift, rebuilding the playbook from current research rather than folklore.
Sources
- GEO: Generative Engine Optimization (KDD ’24). Peer-reviewed study; citing statistics and sources lifts AI citation visibility up to ~40% (2023, arXiv 2311.09735)
- Gartner: AI agents and B2B purchasing. 90% of B2B journeys influenced by AI agents by 2028, >$15T (2025-11)
- AI crawlers and JavaScript rendering. Analysis of 500M+ GPTBot fetches: AI retrieval crawlers don’t execute JS (2025)
- llms.txt shows no clear effect on AI citations. 300k-domain study, ~10% adoption, no measurable lift (2025)
- Google spam policies cover generative AI responses. Manipulating AI answers is treated as spam (2025)
FAQ
Questions, answered
What is white-hat about it?
We don’t hide instructions for the model, cloak content for AI crawlers, or fake authority. Those get filtered or penalized: Google now treats manipulating AI answers as spam. We earn citations with real, verifiable, well-structured content.
Can you guarantee ChatGPT will recommend us?
No, and anyone who promises that is selling you a line. Citations aren’t deterministic. We move the factors with real evidence behind them and track what is genuinely trackable.
AI search changes every month — how do you keep this from going stale?
We run GEO as a standardized, continuously updated service, not a one-off. We re-audit as the engines shift and rebuild the playbook from current frontier research, not last year’s tactics.