When someone asks ChatGPT a question and it searches the web, what happens to your page?
Not metaphorically. Mechanically. What does the system actually do with the HTML, the JavaScript, the structure you spent time building? How does it decide which parts matter and which parts to ignore?
There's a growing body of advice about optimizing content for large language models. Some of it is grounded in research. Some of it is extrapolated from research that says something slightly different from what's claimed. And some of it is speculation dressed as best practice, shared so confidently that it's hard to tell the difference.
This isn't a critique of the people doing that work. Figuring out how to be visible in AI-generated answers is a legitimate problem, and the people studying it are doing important early research in a field where the ground hasn't stopped shifting. I'm one of those people. At Backlinko, our editorial team researches and writes about this regularly.
But it's worth pausing to separate what we actually know from what we're still guessing about. The difference matters for how you invest your time.
This essay is an attempt at that separation.
Your page is not what they see
Here's the thing that reframes the entire conversation: when an AI system "reads" your page, it doesn't experience anything close to what your visitors see.
When a search-grounded LLM fetches your URL, it runs through a process known as Retrieval-Augmented Generation (RAG). The general architecture is well-established and documented across the industry, even though the specific implementation details vary by platform. Here's what that process looks like, step by step.
1. Fetch. The system requests your URL. It receives the raw HTML, exactly as your server delivers it. This is what arrives.
What arrives when the crawler requests your URL
Notice how little of this is your actual content. The navigation, stylesheets, scripts, sidebars, analytics code — they all consume tokens but carry no meaning for the model.
2. Strip. CSS, JavaScript, navigation, footers, sidebars — all removed. What remains is closer to a text transcript than a web page.
Andrej Karpathy, one of the founding members of OpenAI, put it directly: LLMs "initially strip the pages of all CSS, JS, etc., until there is just the text left." This applies to both how models are trained on web data and how they process pages at inference time.
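The strip step can be sketched with Python's standard-library `html.parser`. This is a minimal illustration, not how any platform actually implements it; real pipelines use far more sophisticated readability-style extractors, and the tag skip-list here is an assumption:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Minimal sketch of the 'strip' step: drop scripts, styles, and
    chrome elements, keep only the visible text. Illustrative only."""
    SKIP = {"script", "style", "nav", "footer", "aside"}  # assumed skip-list

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # >0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

html = """<html><head><style>body{color:red}</style></head>
<body><nav>Home | About</nav><h1>My Post</h1>
<p>The actual content.</p><script>track()</script></body></html>"""

parser = TextExtractor()
parser.feed(html)
print(" ".join(parser.chunks))  # → My Post The actual content.
```

Everything the stylesheet, navigation, and analytics script contributed is gone; only the heading and body text survive.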
What the model discards vs. what survives
In February 2026, Cloudflare quantified exactly how dramatic this is. They launched a feature called Markdown for Agents that converts HTML to clean markdown on the fly for AI clients. Their numbers: a typical blog post is 16,180 tokens as HTML and 3,150 tokens as markdown.
An 80% reduction.
Token count for the same blog post (Cloudflare, 2026)
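Cloudflare's published counts give the reduction directly:

```python
html_tokens = 16_180      # the same blog post delivered as raw HTML
markdown_tokens = 3_150   # after conversion to clean markdown

reduction = 1 - markdown_tokens / html_tokens
print(f"{reduction:.1%}")  # → 80.5%
```

Roughly four of every five tokens in the HTML version were markup and boilerplate, not content.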
3. Chunk. The surviving text gets split into discrete segments, typically 256 to 1,024 tokens each. Your article isn't processed as one continuous argument. It's fractured into overlapping windows.
A content strategy starts with understanding what your audience actually needs. Not what you want to say, but what they're trying to accomplish. The gap between those two things is where most strategies fail. They begin with the company's goals rather than the reader's problems...
Continuous text, split into discrete segments
The overlap between chunks ensures that ideas spanning a boundary aren't lost entirely. But it also means the model might see the same sentence in two different contexts, attached to different surrounding content.
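A sliding-window chunker makes the mechanics concrete. The sizes below are illustrative choices within the 256 to 1,024 range mentioned above, not any platform's actual parameters:

```python
def chunk(tokens, size=256, overlap=32):
    """Sliding-window chunking sketch: fixed-size windows that overlap,
    so an idea spanning a boundary appears in both neighboring chunks.
    Sizes are illustrative; production systems vary."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = list(range(600))           # stand-in for a tokenized article
chunks = chunk(tokens, size=256, overlap=32)
print(len(chunks))                  # → 3 windows
print(chunks[0][-1], chunks[1][0])  # → 255 224: tokens 224-255 appear in both
```

The shared tokens at each boundary are exactly the "same sentence in two different contexts" effect: chunk 1 ends mid-argument, and chunk 2 reopens with that same material attached to what follows.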
4. Retrieve. A query determines which chunks enter the model's context window. Most of your article doesn't make the cut.
The query decides which fragments the model sees
The relevance scores are computed by comparing semantic similarity between the query and each chunk. A chunk about distribution strategy scores 0.22 against a question about audience research — useful content, but not for this question. Half your article never reaches the model.
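The scoring step is, at its core, cosine similarity between embedding vectors. The three-dimensional vectors and the chunk labels below are toy values chosen to mirror the example; real systems use learned dense embeddings with hundreds or thousands of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means identical direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = [0.9, 0.1, 0.0]  # toy embedding for "how do I research my audience?"
chunks = {
    "audience research methods": [0.8, 0.2, 0.1],
    "distribution strategy":     [0.1, 0.2, 0.9],
}

ranked = sorted(chunks.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
top_k = ranked[:1]  # only the best-scoring chunks enter the context window
print(ranked[0][0])  # → audience research methods
```

The distribution-strategy chunk isn't bad content; its vector simply points in a different direction from this particular query, so it never reaches the model.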
5. Generate. The model synthesizes an answer from your selected chunks alongside chunks from other sources. Your content becomes one voice among several.
"...teams who interviewed their audience quarterly produced content that ranked 3.2x better..."
"...73% of marketers who exceed revenue goals conduct audience research monthly..."
Multiple sources, one synthesized answer
Your page contributed 3 of the 6 total chunks in this response. Whether you get cited with a link depends on the platform. Either way, only fragments of your original argument made it through. The model may use your data point while missing the context that made it meaningful.
Your page isn't read. It's sampled.
The implication is fundamental. You're not optimizing a web page for an AI system. You're optimizing a text extract of a web page. The design, the interactive elements, the carefully crafted layout — none of it exists in the version the model processes.
This doesn't mean design doesn't matter. It matters enormously for your human visitors, and they're still the vast majority of your traffic. But the mental model for AI visibility is different from the mental model for user experience.
The model sees your words and your structure. That's it.
What the research actually tells us
A few findings have held up across multiple studies and independent verification. These are the claims with the strongest evidence behind them.
The beginning and end get more attention than the middle
The most replicated finding about how LLMs handle long content comes from a Stanford and UC Berkeley paper called "Lost in the Middle" (2023). The researchers found that LLMs exhibit a U-shaped attention pattern: information at the beginning and end of a context window gets processed more reliably than information buried in the middle.
In 2025, MIT researchers traced this to architectural causes including Rotary Position Embedding (RoPE) and causal masking, which together create distance-based decay that systematically deprioritizes middle content.
The "Lost in the Middle" attention pattern (Stanford/UCB, 2023)
You might have encountered a related claim: "LLMs focus on the top 30% of the page." That traces to a Growth Memo study (February 2026) that analyzed 1.2 million ChatGPT responses and found 44.2% of citations came from the first third of source content. The study is ChatGPT-specific and not peer-reviewed, but it's the only large-scale analysis of where on a page AI citations actually come from.
Interestingly, the academic research supports the direction if not the exact framing. Beyond the U-shaped attention pattern, there are at least two additional mechanisms that favor early content. First, attention sinks: LLMs allocate disproportionately high attention weights to the first few tokens in a sequence, regardless of semantic relevance. This is an architectural feature, not a bug. Second, embedding retrieval bias: the dense retrieval models used in RAG systems to index and retrieve content systematically favor content appearing earlier in documents, with a measured performance drop of 34% when relevant information is positioned later.
Independent mechanisms that all favor early content
So the practical advice — put important content early — has stronger backing than just one study. Multiple independent mechanisms converge on the same conclusion. But the nuance matters: the middle of your content is where things disappear most reliably, and the end actually recovers some attention. It's a slope with a valley, not a cliff.
A related finding adds a layer. A 2025 paper called "Context Is What You Need" found that the Maximum Effective Context Window (the range over which a model actually performs reliably) degrades well before the limits models advertise. Some models showed severe accuracy degradation at 1,000 tokens despite supporting 100,000+ token contexts.
Even if a model can process your entire page, it may not process it well.
Brand recognition predicts citation more than backlinks
A large-scale analysis of AI citation patterns found that brand search volume correlates with citation frequency at 0.392, while web mentions correlate at 0.664. By comparison, backlinks, the traditional backbone of SEO authority, correlated at just 0.218.
This inverts decades of SEO intuition.
But the caveat matters: this is correlation, not causation. Well-known brands naturally have more mentions, more citations, and more backlinks. The causal chain might run through brand strength rather than mention count. A startup that artificially inflates its web mentions wouldn't necessarily see more AI citations.
What the data does suggest is that the signals LLMs use to evaluate source authority aren't the same signals search engines use. That's worth knowing, even if we can't fully explain why yet.
ChatGPT doesn't cite what Google ranks
A Semrush study found that nearly 90% of ChatGPT's citations come from URLs ranking at position 21 or lower in Google search results. Less than 15% of pages overlap between Google's top results and ChatGPT's cited sources.
ChatGPT is also heavily biased toward certain source types. Wikipedia accounts for 47.9% of citations among ChatGPT's top 10 most-cited sources. Reddit and community forums appear disproportionately.
This is arguably the most practically significant finding in this space. If you're optimizing content for traditional search rankings and assuming that will translate to AI visibility, the data suggests otherwise, at least for ChatGPT specifically.
Where the evidence gets thinner
Other claims in the LLM optimization space have weaker or directly contradictory support. That doesn't mean they're wrong. It means we should hold them more lightly and watch for better data.
Schema markup: conflicting signals
The Princeton GEO study (2023) found that structured data and statistics improved citation rates in generative engine results, reporting lifts of 22% for statistics and 37% for quotations. This finding has been widely cited as evidence that schema markup improves LLM visibility.
But a SearchAtlas study examined LLM citation frequency across OpenAI, Gemini, and Perplexity and found zero correlation with schema markup coverage. Box plots across different schema adoption levels showed nearly identical visibility distributions.
A third study from Growth Marshal found the answer might be in implementation quality: rich, attribute-dense schema correlated with a 61.7% citation rate, while minimal or generic schema correlated with 41.6%.
What to make of this? Probably that schema existence alone doesn't move the needle. High-quality structured data might contribute as part of a broader signal mix. The honest answer is we don't have enough controlled studies to say with confidence.
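To make "rich, attribute-dense" versus "minimal" concrete, here's an illustrative sketch of standard schema.org Article markup. The values are invented for the example, and nothing here is a formula any of the studies tested:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How AI Systems Read Your Page",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "url": "https://example.com/about/jane-doe"
  },
  "datePublished": "2026-01-15",
  "dateModified": "2026-02-02",
  "publisher": { "@type": "Organization", "name": "Example Media" },
  "about": ["retrieval-augmented generation", "AI crawlers"],
  "citation": ["https://example.com/source-study"]
}
```

A minimal implementation of the same page might declare only `@context`, `@type`, and `headline`. If implementation quality matters at all, the difference is in the entity detail: authorship, dates, topics, and sources, not the mere presence of a schema block.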
llms.txt: adoption without evidence
The llms.txt proposal has a reasonable premise: give AI crawlers a curated map of your best content, similar to how robots.txt guides search engine crawlers. As of early 2026, over 844,000 sites have implemented it.
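For reference, the proposed format is plain markdown served at the site root: an H1 with the site name, a blockquote summary, then H2 sections listing curated links. The example below is hypothetical, with illustrative URLs:

```markdown
# Example Site

> Independent publication covering content strategy and search,
> with research-backed guides updated quarterly.

## Guides

- [Content strategy guide](https://example.com/guides/content-strategy): audience research, planning, and measurement
- [Technical SEO basics](https://example.com/guides/technical-seo): crawling, rendering, and indexing fundamentals

## Optional

- [Archive](https://example.com/archive): older posts, kept for reference
```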
The evidence of impact is less encouraging. A Search Engine Land analysis found that 8 out of 10 sites saw no measurable change in traffic after implementation. No major AI platform, not OpenAI, not Google, not Anthropic, not Perplexity, has confirmed it consistently reads or uses the file.
That doesn't mean llms.txt is worthless. It might become valuable as platforms evolve. It's a low-cost implementation. But right now, the gap between adoption and evidence is wide enough to be honest about.
The "GEO" framework: promising but premature
Generative Engine Optimization as a discipline is real and important. The research emerging from Princeton and other institutions is doing genuine work to understand how AI systems select and cite sources.
The pushback comes from the pace at which early findings have been generalized into actionable advice. John Mueller, Google's Search Advocate, has been direct: "There is no such thing as GEO or AEO without doing SEO fundamentals." Technical communities have described some GEO advice as prematurely prescriptive for a field this young.
The core tension is this. Search engine optimization, for all its complexity, stabilized around a single dominant platform with relatively consistent rules. LLM optimization is trying to build a discipline around multiple platforms that work differently, change frequently, and share almost nothing about their internal ranking logic.
The foundations are being built. They're just not settled yet.
How confident should we be? (JS not rendered · structure helps · llms.txt utility)
These are not the same system
Here's where the picture gets more complicated. And more useful.
When people say "optimize for LLMs," they're treating these systems as a single category. But the major AI search platforms work differently in ways that matter for your content strategy.
One thing they do share: query fan-out. When you ask a question, the system doesn't send your exact words to a search index. It decomposes your query into multiple sub-queries, retrieves results for each, then synthesizes across them. Every major platform does this. The differences are in how far they take it.
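The fan-out pattern can be sketched in a few lines. Everything here is illustrative: `search_index` is a stub standing in for a real search backend, and the sub-query templates are hard-coded where real systems use an LLM to do the decomposition:

```python
def search_index(query):
    """Stub standing in for a real search backend (hypothetical)."""
    return [f"result-for:{query}"]

def dedupe(items):
    """Keep the first occurrence of each result, preserving order."""
    return list(dict.fromkeys(items))

def fan_out(user_query):
    """Query fan-out sketch: one question becomes several targeted
    sub-queries, each retrieved separately, then pooled for synthesis."""
    sub_queries = [
        user_query,
        f"{user_query} statistics",
        f"{user_query} best practices",
        f"{user_query} examples",
    ]
    pooled = []
    for q in sub_queries:
        pooled.extend(search_index(q))
    return dedupe(pooled)

print(fan_out("audience research"))
```

The practical consequence: your page can surface through a sub-query the user never typed, which is why visibility is wider than exact-keyword thinking suggests.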
ChatGPT Search
OpenAI operates two bots: GPTBot for training data and OAI-SearchBot for its search index. Its query fan-out rewrites the user's question into targeted sub-queries, sends those to search providers, then synthesizes results from its own index and partner data.
According to Vercel's analysis, GPTBot downloads JavaScript files about 11.5% of the time but does not execute them. It only sees the initial raw HTML. It follows robots.txt but does not respect canonical tags or meta noindex.
Its citation behavior diverges most from traditional search rankings. The heavy bias toward Wikipedia and Reddit, combined with the 90% citation rate from positions 21+, makes it the hardest platform to predict using traditional SEO signals.
Perplexity
Perplexity has the most distinctive architecture. According to independent analysis, it receives SERP results from Google's API, then programmatically visits the top 5 to 10 results and extracts their text. It doesn't use sitemaps. Nearly every query triggers a live fetch.
Perplexity also made headlines when Cloudflare reported that when blocked via robots.txt, Perplexity appeared to switch to undeclared crawlers mimicking Chrome on macOS. Perplexity disputed the attribution. The controversy remains unresolved.
Despite this, Perplexity's results correlate most closely with traditional search rankings. That makes sense if it's primarily drawing from Google's own results.
Google AI Overviews
Google's system is architecturally different from every other player because it doesn't need to crawl at query time. It already has the world's largest search index.
Google takes query fan-out further than anyone else. Because it already has the index, it can decompose a single query into 10 to 12 sub-queries that simultaneously hit the web index, Knowledge Graph, Shopping, News, and other data sources, all in parallel.
One query becomes 10 parallel sub-queries across Google's data sources
This is a meaningful difference. A page optimized for one specific keyword might get pulled into an AI Overview through a completely different sub-query path. The surface area for visibility is wider than traditional keyword targeting suggests.
Claude
Claude uses Brave Search as its search provider. Its web fetch tool can dynamically filter search results: Claude can write and execute code to select relevant content before it enters the context window. Rather than simple chunk retrieval, the model itself reasons about what to include.
There's also a curious anomaly. Data presented at Tech SEO Connect 2025 suggested ClaudeBot may render JavaScript in some cases, unlike every other major AI crawler. Anthropic hasn't confirmed this.
| | ChatGPT | Perplexity | Google AI | Claude |
|---|---|---|---|---|
| Index | Own + partners | Google API + live fetch | Full Google index | Brave Search |
| JS rendering | No | No | Yes (existing index) | Possibly (unconfirmed) |
| Crawl trigger | Background + query | Nearly every query | Pre-indexed | Per query |
| SERP correlation | Low (~10% overlap) | High | Moderate | Varies |
| Key bias | Wikipedia, Reddit | Google top results | Deepest fan-out | Brave index scope |
Four platforms, four different systems
The overlap problem
Here's the number that should give you pause: only 11% of domains appear in both ChatGPT and Perplexity citations.
These systems are not returning slight variations of the same answer. They're reaching genuinely different conclusions from different processes.
Domain citation overlap between platforms
Saying "optimize for LLMs" without specifying which one is like saying "optimize for social media" without specifying the platform. The advice might sound universal, but the mechanics aren't.
The JavaScript blind spot
This deserves its own section because it's the most technically actionable finding in this essay. And the most underappreciated.
No major AI crawler executes JavaScript. Vercel's analysis of over 500 million GPTBot requests found zero evidence of JavaScript execution. The same is true across other major AI crawlers, with the possible but unconfirmed exception of ClaudeBot.
If your content is rendered client-side, through React, Vue, Angular, or any other JavaScript framework that generates the page in the browser, AI crawlers see an empty shell.
Client-side rendered content is invisible to AI crawlers
This is the 2010 problem all over again. Back then, search engines couldn't reliably render JavaScript, and sites that relied on client-side rendering were invisible to Google. Google eventually solved this with its own rendering infrastructure. AI crawlers haven't.
The practical advice is straightforward. Ensure your important content is present in the initial HTML response, before any JavaScript execution. Server-side rendering or static generation solves this. If you're running a JavaScript-heavy site, view your page source (the raw HTML the server returns, not the rendered DOM) and check whether your content is there.
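That check can be automated with the standard library. This sketch fetches the raw HTML exactly as a non-rendering crawler would receive it and looks for a distinctive phrase from your content; the User-Agent and the example shell markup are illustrative:

```python
import urllib.request

def initial_html(url, timeout=10):
    """Fetch the raw HTML as the server delivers it, with no JavaScript
    execution, mirroring what a non-rendering AI crawler receives."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def visible_to_ai_crawlers(html, marker):
    """True if a distinctive phrase from your content appears in the
    pre-render HTML. A client-side-rendered shell fails this check."""
    return marker in html

# A typical client-side-rendered shell: content only exists after JS runs.
spa_shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(visible_to_ai_crawlers(spa_shell, "audience research"))  # → False
```

Run `visible_to_ai_crawlers(initial_html("https://example.com/your-post"), "a sentence only your article contains")` against your own pages. If it returns False, the content is being created in the browser, after the point where AI crawlers stop looking.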
If it's not, every AI system except possibly Google's (which benefits from its existing rendered index) is blind to it.
What this means for your content
The temptation after an essay like this is to end with a tactics list. "10 ways to optimize based on what we've learned." But given how much of this landscape is still uncertain, prescribing specific tactics would contradict the point of everything above.
Instead, here's how I'd think about it.
The fundamentals haven't changed, and that's not a platitude. Clear writing, genuine expertise, logical structure, specific claims supported by evidence. These are good for readers, AND they're exactly what survives the extraction pipeline. When a model strips your page to text and selects relevant chunks, the quality of your writing and the clarity of your structure are what remain.
There's no shortcut that substitutes for this.
Think in text, not in pages. Your content will be processed as a text extract, not as a designed experience. Headers serve as structural signals, not visual ones. Introductions matter disproportionately, not because of a "top 30%" rule, but because of how positional attention actually works in these models. Self-contained paragraphs that can stand alone as useful chunks perform better than ideas that depend on surrounding context to make sense.
Be honest about the platform question. If you're in content strategy, you need to decide how much to invest in platform-specific optimization versus universal quality. Right now, the universal quality bet is safer. It performs reasonably across all platforms and doesn't require rebuilding if one system changes its approach next quarter.
Watch the evidence, not the advice. This space is moving fast, and the advice is moving faster than the evidence supporting it. When you encounter a new LLM optimization recommendation, ask: what study is this based on? Has it been replicated? Does it apply across platforms or just one? The people doing the best work in this space, and there are many, are the ones citing specific research and acknowledging limitations.
Follow them.
What we're still figuring out
We're watching a new information architecture take shape. The way AI systems process, select, and cite web content is going to evolve significantly over the next few years. The models will get better at handling long contexts. The platforms will change how they crawl, index, and attribute sources. New research will confirm some of what we believe today and overturn the rest.
The honest position is intellectual humility paired with attention. Understand the mechanisms as best you can. Question claims that aren't sourced. Invest in the things that have always made content valuable, expertise, clarity, specificity, because those are the qualities that survive regardless of how the technology shifts.
This is a working document. As better research emerges, it will be updated.