When someone asks ChatGPT a question and it searches the web, what happens to your page?
Not metaphorically. Mechanically. What does the system actually do with the HTML, the JavaScript, the structure you spent time building? How does it decide which parts matter and which parts to ignore?
There's a growing body of advice about optimizing content for large language models. Some of it is grounded in research. Some of it is extrapolated from research that says something slightly different from what's claimed. And some of it is speculation dressed as best practice, shared so confidently that it's hard to tell the difference.
This isn't a critique of the people doing that work. Figuring out how to be visible in AI-generated answers is a legitimate problem, and the people studying it are doing important early research in a field where the ground hasn't stopped shifting. I'm one of those people. At Backlinko, our editorial team researches and writes about this regularly.
But it's worth pausing to separate what we actually know from what we're still guessing about. The difference matters for how you invest your time.
This essay is an attempt at that separation.
Your page is not what they see
Here's the thing that reframes the entire conversation: when an AI system "reads" your page, it doesn't experience anything close to what your visitors see.
When a search-grounded LLM fetches your URL, it runs through a process known as Retrieval-Augmented Generation (RAG). The general architecture is well-established and documented across the industry, even though the specific implementation details vary by platform. Here's what that process looks like, step by step.
1. Fetch. The system requests your URL. It receives the raw HTML, exactly as your server delivers it. This is what arrives.
What arrives when the crawler requests your URL
Notice how little of this is your actual content. The navigation, stylesheets, scripts, sidebars, analytics code — they all consume tokens but carry no meaning for the model.
2. Strip. CSS, JavaScript, navigation, footers, sidebars — all removed. What remains is closer to a text transcript than a web page.
Andrej Karpathy, one of the founding members of OpenAI, put it directly: LLMs "initially strip the pages of all CSS, JS, etc., until there is just the text left." This applies to both how models are trained on web data and how they process pages at inference time.
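The strip step can be sketched with Python's standard-library `html.parser`. This is a minimal illustration, not how any platform actually implements it; real pipelines use far more sophisticated readability-style extractors, and the tag skip-list here is an assumption:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Minimal sketch of the 'strip' step: drop scripts, styles, and
    chrome elements, keep only the visible text. Illustrative only."""
    SKIP = {"script", "style", "nav", "footer", "aside"}  # assumed skip-list

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # >0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

html = """<html><head><style>body{color:red}</style></head>
<body><nav>Home | About</nav><h1>My Post</h1>
<p>The actual content.</p><script>track()</script></body></html>"""

parser = TextExtractor()
parser.feed(html)
print(" ".join(parser.chunks))  # → My Post The actual content.
```

Everything the stylesheet, navigation, and analytics script contributed is gone; only the heading and body text survive.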
What the model discards vs. what survives
In February 2026, Cloudflare quantified exactly how dramatic this is. They launched a feature called Markdown for Agents that converts HTML to clean markdown on the fly for AI clients. Their numbers: a typical blog post is 16,180 tokens as HTML and 3,150 tokens as markdown.
An 80% reduction.
Token count for the same blog post (Cloudflare, 2026)
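Cloudflare's published counts give the reduction directly:

```python
html_tokens = 16_180      # the same blog post delivered as raw HTML
markdown_tokens = 3_150   # after conversion to clean markdown

reduction = 1 - markdown_tokens / html_tokens
print(f"{reduction:.1%}")  # → 80.5%
```

Roughly four of every five tokens in the HTML version were markup and boilerplate, not content.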
3. Chunk. The surviving text gets split into discrete segments, typically 256 to 1,024 tokens each. Your article isn't processed as one continuous argument. It's fractured into overlapping windows.
A content strategy starts with understanding what your audience actually needs. Not what you want to say, but what they're trying to accomplish. The gap between those two things is where most strategies fail. They begin with the company's goals rather than the reader's problems...
Continuous text, split into discrete segments
The overlap between chunks ensures that ideas spanning a boundary aren't lost entirely. But it also means the model might see the same sentence in two different contexts, attached to different surrounding content.
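A sliding-window chunker makes the mechanics concrete. The sizes below are illustrative choices within the 256 to 1,024 range mentioned above, not any platform's actual parameters:

```python
def chunk(tokens, size=256, overlap=32):
    """Sliding-window chunking sketch: fixed-size windows that overlap,
    so an idea spanning a boundary appears in both neighboring chunks.
    Sizes are illustrative; production systems vary."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = list(range(600))           # stand-in for a tokenized article
chunks = chunk(tokens, size=256, overlap=32)
print(len(chunks))                  # → 3 windows
print(chunks[0][-1], chunks[1][0])  # → 255 224: tokens 224-255 appear in both
```

The shared tokens at each boundary are exactly the "same sentence in two different contexts" effect: chunk 1 ends mid-argument, and chunk 2 reopens with that same material attached to what follows.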
4. Retrieve. A query determines which chunks enter the model's context window. Most of your article doesn't make the cut.
The query decides which fragments the model sees
The relevance scores are computed by comparing semantic similarity between the query and each chunk. A chunk about distribution strategy scores 0.22 against a question about audience research — useful content, but not for this question. Half your article never reaches the model.
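The scoring step is, at its core, cosine similarity between embedding vectors. The three-dimensional vectors and the chunk labels below are toy values chosen to mirror the example; real systems use learned dense embeddings with hundreds or thousands of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means identical direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = [0.9, 0.1, 0.0]  # toy embedding for "how do I research my audience?"
chunks = {
    "audience research methods": [0.8, 0.2, 0.1],
    "distribution strategy":     [0.1, 0.2, 0.9],
}

ranked = sorted(chunks.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
top_k = ranked[:1]  # only the best-scoring chunks enter the context window
print(ranked[0][0])  # → audience research methods
```

The distribution-strategy chunk isn't bad content; its vector simply points in a different direction from this particular query, so it never reaches the model.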
5. Generate. The model synthesizes an answer from your selected chunks alongside chunks from other sources. Your content becomes one voice among several.
"...teams who interviewed their audience quarterly produced content that ranked 3.2x better..."
"...73% of marketers who exceed revenue goals conduct audience research monthly..."
Multiple sources, one synthesized answer
Your page contributed 3 of the 6 total chunks in this response. Whether you get cited with a link depends on the platform. Either way, only fragments of your original argument made it through. The model may use your data point while missing the context that made it meaningful.
Your page isn't read. It's sampled.
The implication is fundamental. You're not optimizing a web page for an AI system. You're optimizing a text extract of a web page. The design, the interactive elements, the carefully crafted layout — none of it exists in the version the model processes.
This doesn't mean design doesn't matter. It matters enormously for your human visitors, and they're still the vast majority of your traffic. But the mental model for AI visibility is different from the mental model for user experience.
The model sees your words and your structure. That's it.
What the research actually tells us
A few findings have held up across multiple studies and independent verification. These are the claims with the strongest evidence behind them.
The beginning and end get more attention than the middle
The most replicated finding about how LLMs handle long content comes from a Stanford and UC Berkeley paper called "Lost in the Middle" (2023). The researchers found that LLMs exhibit a U-shaped attention pattern: information at the beginning and end of a context window gets processed more reliably than information buried in the middle.
In 2025, MIT researchers traced this to architectural causes including Rotary Position Embedding (RoPE) and causal masking, which together create distance-based decay that systematically deprioritizes middle content.
The "Lost in the Middle" attention pattern (Stanford/UCB, 2023)
You might have encountered a related claim: "LLMs focus on the top 30% of the page." That traces to a Growth Memo study (February 2026) that analyzed 1.2 million ChatGPT responses and found 44.2% of citations came from the first third of source content. The study is ChatGPT-specific and not peer-reviewed, but it's the only large-scale analysis of where on a page AI citations actually come from.
Interestingly, the academic research supports the direction if not the exact framing. Beyond the U-shaped attention pattern, there are at least two additional mechanisms that favor early content. First, attention sinks: LLMs allocate disproportionately high attention weights to the first few tokens in a sequence, regardless of semantic relevance. This is an architectural feature, not a bug. Second, embedding retrieval bias: the dense retrieval models used in RAG systems to index and retrieve content systematically favor content appearing earlier in documents, with a measured performance drop of 34% when relevant information is positioned later.
Independent mechanisms that all favor early content
So the practical advice — put important content early — has stronger backing than just one study. Multiple independent mechanisms converge on the same conclusion. But the nuance matters: the middle of your content is where things disappear most reliably, and the end actually recovers some attention. It's a slope with a valley, not a cliff.
A related finding adds a layer. A 2025 paper called "Context Is What You Need" found that the Maximum Effective Context Window (the range over which a model actually performs reliably) degrades well before the limits models advertise. Some models showed severe accuracy degradation at 1,000 tokens despite supporting 100,000+ token contexts.
Even if a model can process your entire page, it may not process it well.
Brand recognition predicts citation more than backlinks
A large-scale analysis of AI citation patterns found that brand search volume correlates with citation frequency at 0.392, while web mentions correlate at 0.664. By comparison, backlinks, the traditional backbone of SEO authority, correlated at just 0.218.
This inverts decades of SEO intuition.
But the caveat matters: this is correlation, not causation. Well-known brands naturally have more mentions, more citations, and more backlinks. The causal chain might run through brand strength rather than mention count. A startup that artificially inflates its web mentions wouldn't necessarily see more AI citations.
What the data does suggest is that the signals LLMs use to evaluate source authority aren't the same signals search engines use. That's worth knowing, even if we can't fully explain why yet.
ChatGPT doesn't cite what Google ranks
A Semrush study found that nearly 90% of ChatGPT's citations come from URLs ranking at position 21 or lower in Google search results. Less than 15% of pages overlap between Google's top results and ChatGPT's cited sources.
ChatGPT is also heavily biased toward certain source types. Wikipedia accounts for 47.9% of citations among ChatGPT's top 10 most-cited sources. Reddit and community forums appear disproportionately.
This is arguably the most practically significant finding in this space. If you're optimizing content for traditional search rankings and assuming that will translate to AI visibility, the data suggests otherwise, at least for ChatGPT specifically.
Where the evidence gets thinner
Other claims in the LLM optimization space have weaker or directly contradictory support. That doesn't mean they're wrong. It means we should hold them more lightly and watch for better data.
Schema markup: conflicting signals
The Princeton GEO study (2023) found that structured data and statistics improved citation rates in generative engine results, reporting lifts of 22% for statistics and 37% for quotations. This finding has been widely cited as evidence that schema markup improves LLM visibility.
But a SearchAtlas study examined LLM citation frequency across OpenAI, Gemini, and Perplexity and found zero correlation with schema markup coverage. Box plots across different schema adoption levels showed nearly identical visibility distributions.
A third study from Growth Marshal found the answer might be in implementation quality: rich, attribute-dense schema correlated with a 61.7% citation rate, while minimal or generic schema correlated with 41.6%.
What to make of this? Probably that schema existence alone doesn't move the needle. High-quality structured data might contribute as part of a broader signal mix. The honest answer is we don't have enough controlled studies to say with confidence.
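To make "rich, attribute-dense" versus "minimal" concrete, here's an illustrative sketch of standard schema.org Article markup. The values are invented for the example, and nothing here is a formula any of the studies tested:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How AI Systems Read Your Page",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "url": "https://example.com/about/jane-doe"
  },
  "datePublished": "2026-01-15",
  "dateModified": "2026-02-02",
  "publisher": { "@type": "Organization", "name": "Example Media" },
  "about": ["retrieval-augmented generation", "AI crawlers"],
  "citation": ["https://example.com/source-study"]
}
```

A minimal implementation of the same page might declare only `@context`, `@type`, and `headline`. If implementation quality matters at all, the difference is in the entity detail: authorship, dates, topics, and sources, not the mere presence of a schema block.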
llms.txt: adoption without evidence
The llms.txt proposal has a reasonable premise: give AI crawlers a curated map of your best content, similar to how robots.txt guides search engine crawlers. As of early 2026, over 844,000 sites have implemented it.
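For reference, the proposed format is plain markdown served at the site root: an H1 with the site name, a blockquote summary, then H2 sections listing curated links. The example below is hypothetical, with illustrative URLs:

```markdown
# Example Site

> Independent publication covering content strategy and search,
> with research-backed guides updated quarterly.

## Guides

- [Content strategy guide](https://example.com/guides/content-strategy): audience research, planning, and measurement
- [Technical SEO basics](https://example.com/guides/technical-seo): crawling, rendering, and indexing fundamentals

## Optional

- [Archive](https://example.com/archive): older posts, kept for reference
```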
The evidence of impact is less encouraging. A Search Engine Land analysis found that 8 out of 10 sites saw no measurable change in traffic after implementation. No major AI platform, not OpenAI, not Google, not Anthropic, not Perplexity, has confirmed it consistently reads or uses the file.
That doesn't mean llms.txt is worthless. It might become valuable as platforms evolve. It's a low-cost implementation. But right now, the gap between adoption and evidence is wide enough to be honest about.
The "GEO" framework: promising but premature
Generative Engine Optimization as a discipline is real and important. The research emerging from Princeton and other institutions is doing genuine work to understand how AI systems select and cite sources.
The pushback comes from the pace at which early findings have been generalized into actionable advice. John Mueller, Google's Search Advocate, has been direct: "There is no such thing as GEO or AEO without doing SEO fundamentals." Technical communities have described some GEO advice as prematurely prescriptive for a field this young.
The core tension is this. Search engine optimization, for all its complexity, stabilized around a single dominant platform with relatively consistent rules. LLM optimization is trying to build a discipline around multiple platforms that work differently, change frequently, and share almost nothing about their internal ranking logic.
The foundations are being built. They're just not settled yet.
How confident should we be? (JS not rendered · structure helps · llms.txt utility)
These are not the same system
Here's where the picture gets more complicated. And more useful.
When people say "optimize for LLMs," they're treating these systems as a single category. But the major AI search platforms work differently in ways that matter for your content strategy.
One thing they do share: query fan-out. When you ask a question, the system doesn't send your exact words to a search index. It decomposes your query into multiple sub-queries, retrieves results for each, then synthesizes across them. Every major platform does this. The differences are in how far they take it.
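The fan-out pattern can be sketched in a few lines. Everything here is illustrative: `search_index` is a stub standing in for a real search backend, and the sub-query templates are hard-coded where real systems use an LLM to do the decomposition:

```python
def search_index(query):
    """Stub standing in for a real search backend (hypothetical)."""
    return [f"result-for:{query}"]

def dedupe(items):
    """Keep the first occurrence of each result, preserving order."""
    return list(dict.fromkeys(items))

def fan_out(user_query):
    """Query fan-out sketch: one question becomes several targeted
    sub-queries, each retrieved separately, then pooled for synthesis."""
    sub_queries = [
        user_query,
        f"{user_query} statistics",
        f"{user_query} best practices",
        f"{user_query} examples",
    ]
    pooled = []
    for q in sub_queries:
        pooled.extend(search_index(q))
    return dedupe(pooled)

print(fan_out("audience research"))
```

The practical consequence: your page can surface through a sub-query the user never typed, which is why visibility is wider than exact-keyword thinking suggests.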
ChatGPT Search
OpenAI operates two bots: GPTBot for training data and OAI-SearchBot for its search index. Its query fan-out rewrites the user's question into targeted sub-queries, sends those to search providers, then synthesizes results from its own index and partner data.
According to Vercel's analysis, GPTBot downloads JavaScript files about 11.5% of the time but does not execute them. It only sees the initial raw HTML. It follows robots.txt but does not respect canonical tags or meta noindex.
Its citation behavior diverges most from traditional search rankings. The heavy bias toward Wikipedia and Reddit, combined with the 90% citation rate from positions 21+, makes it the hardest platform to predict using traditional SEO signals.
Perplexity
Perplexity has the most distinctive architecture. According to independent analysis, it receives SERP results from Google's API, then programmatically visits the top 5 to 10 results and extracts their text. It doesn't use sitemaps. Nearly every query triggers a live fetch.
Perplexity also made headlines when Cloudflare reported that when blocked via robots.txt, Perplexity appeared to switch to undeclared crawlers mimicking Chrome on macOS. Perplexity disputed the attribution. The controversy remains unresolved.
Despite this, Perplexity's results correlate most closely with traditional search rankings. That makes sense if it's primarily drawing from Google's own results.
Google AI Overviews
Google's system is architecturally different from every other player because it doesn't need to crawl at query time. It already has the world's largest search index.
Google takes query fan-out further than anyone else. Because it already has the index, it can decompose a single query into 10 to 12 sub-queries that simultaneously hit the web index, Knowledge Graph, Shopping, News, and other data sources, all in parallel.
One query becomes 10 parallel sub-queries across Google's data sources
This is a meaningful difference. A page optimized for one specific keyword might get pulled into an AI Overview through a completely different sub-query path. The surface area for visibility is wider than traditional keyword targeting suggests.
Claude
Claude uses Brave Search as its search provider. Its web fetch tool can dynamically filter search results: Claude can write and execute code to select relevant content before it enters the context window. Rather than simple chunk retrieval, the model itself reasons about what to include.
There's also a curious anomaly. Data presented at Tech SEO Connect 2025 suggested ClaudeBot may render JavaScript in some cases, unlike every other major AI crawler. Anthropic hasn't confirmed this.
| | ChatGPT | Perplexity | Google AI | Claude |
|---|---|---|---|---|
| Index | Own + partners | Google API + live fetch | Full Google index | Brave Search |
| JS rendering | No | No | Yes (existing index) | Possibly (unconfirmed) |
| Crawl trigger | Background + query | Nearly every query | Pre-indexed | Per query |
| SERP correlation | Low (~10% overlap) | High | Moderate | Varies |
| Key bias | Wikipedia, Reddit | Google top results | Deepest fan-out | Brave index scope |
Four platforms, four different systems
The overlap problem
Here's the number that should give you pause: only 11% of domains appear in both ChatGPT and Perplexity citations.
These systems are not returning slight variations of the same answer. They're reaching genuinely different conclusions from different processes.
Domain citation overlap between platforms
Saying "optimize for LLMs" without specifying which one is like saying "optimize for social media" without specifying the platform. The advice might sound universal, but the mechanics aren't.
The JavaScript blind spot
This deserves its own section because it's the most technically actionable finding in this essay. And the most underappreciated.
No major AI crawler executes JavaScript. Vercel's analysis of over 500 million GPTBot requests found zero evidence of JavaScript execution. The same is true across other major AI crawlers, with the possible but unconfirmed exception of ClaudeBot.
If your content is rendered client-side, through React, Vue, Angular, or any other JavaScript framework that generates the page in the browser, AI crawlers see an empty shell.
Client-side rendered content is invisible to AI crawlers
This is the 2010 problem all over again. Back then, search engines couldn't reliably render JavaScript, and sites that relied on client-side rendering were invisible to Google. Google eventually solved this with its own rendering infrastructure. AI crawlers haven't.
The practical advice is straightforward. Ensure your important content is present in the initial HTML response, before any JavaScript execution. Server-side rendering or static generation solves this. If you're running a JavaScript-heavy site, view your page source (the raw HTML the server returns, not the rendered DOM) and check whether your content is there.
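That check can be automated with the standard library. This sketch fetches the raw HTML exactly as a non-rendering crawler would receive it and looks for a distinctive phrase from your content; the User-Agent and the example shell markup are illustrative:

```python
import urllib.request

def initial_html(url, timeout=10):
    """Fetch the raw HTML as the server delivers it, with no JavaScript
    execution, mirroring what a non-rendering AI crawler receives."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def visible_to_ai_crawlers(html, marker):
    """True if a distinctive phrase from your content appears in the
    pre-render HTML. A client-side-rendered shell fails this check."""
    return marker in html

# A typical client-side-rendered shell: content only exists after JS runs.
spa_shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(visible_to_ai_crawlers(spa_shell, "audience research"))  # → False
```

Run `visible_to_ai_crawlers(initial_html("https://example.com/your-post"), "a sentence only your article contains")` against your own pages. If it returns False, the content is being created in the browser, after the point where AI crawlers stop looking.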
If it's not, every AI system except possibly Google's (which benefits from its existing rendered index) is blind to it.
What this means for your content
The temptation after an essay like this is to end with a tactics list. "10 ways to optimize based on what we've learned." But given how much of this landscape is still uncertain, prescribing specific tactics would contradict the point of everything above.
Instead, here's how I'd think about it.
The fundamentals haven't changed, and that's not a platitude. Clear writing, genuine expertise, logical structure, specific claims supported by evidence. These are good for readers, AND they're exactly what survives the extraction pipeline. When a model strips your page to text and selects relevant chunks, the quality of your writing and the clarity of your structure are what remain.
There's no shortcut that substitutes for this.
Think in text, not in pages. Your content will be processed as a text extract, not as a designed experience. Headers serve as structural signals, not visual ones. Introductions matter disproportionately, not because of a "top 30%" rule, but because of how positional attention actually works in these models. Self-contained paragraphs that can stand alone as useful chunks perform better than ideas that depend on surrounding context to make sense.
Be honest about the platform question. If you're in content strategy, you need to decide how much to invest in platform-specific optimization versus universal quality. Right now, the universal quality bet is safer. It performs reasonably across all platforms and doesn't require rebuilding if one system changes its approach next quarter.
Watch the evidence, not the advice. This space is moving fast, and the advice is moving faster than the evidence supporting it. When you encounter a new LLM optimization recommendation, ask: what study is this based on? Has it been replicated? Does it apply across platforms or just one? The people doing the best work in this space, and there are many, are the ones citing specific research and acknowledging limitations.
Follow them.
What we're still figuring out
We're watching a new information architecture take shape. The way AI systems process, select, and cite web content is going to evolve significantly over the next few years. The models will get better at handling long contexts. The platforms will change how they crawl, index, and attribute sources. New research will confirm some of what we believe today and overturn the rest.
The honest position is intellectual humility paired with attention. Understand the mechanisms as best you can. Question claims that aren't sourced. Invest in the things that have always made content valuable, expertise, clarity, specificity, because those are the qualities that survive regardless of how the technology shifts.
This is a working document. As better research emerges, it will be updated.