Same Joomla Page, 80% Fewer Tokens: Serve Markdown to ChatGPT
Ask ChatGPT, Claude, or any AI tool out there about something your site covers better than anyone, and it might answer without ever pulling from your page. Often, the reason has nothing to do with your writing.
When an AI reads your page, it takes in the raw HTML: menus, sidebars, cookie banners, ads, and a pile of tags, and it pays for every byte in tokens. By the time it reaches your actual words, most of its budget is gone, and your key paragraph can come through garbled or skipped.
The same page in clean Markdown is about 80% smaller. This post explains why HTML costs AI so much, and how serving a Markdown copy of your Joomla pages gives ChatGPT and other AI agents a version they can read without wading through your template.
What's the Problem With HTML?
There's nothing wrong with HTML. AI tools are clever enough to read your pages just fine. The trouble is what it costs them. Your words sit wrapped in menus, sidebars, scripts, and nested tags, and a model has to chew through all of it, paying in tokens, before it reaches the part that matters.
That cost is exactly what the big players are working to cut. A few months ago, Cloudflare rolled out built-in HTML-to-Markdown conversion for its network. This year, Stripe and GitHub started serving plain Markdown copies of their developer docs. Just append .md to any Stripe docs page and you get the stripped-down version, and GitHub does the same for its own pages.
The gap is wider than you'd guess. Take this very page: in raw HTML it weighs in at 20,148 tokens, and just 3,394 once converted to Markdown, an 83% reduction. That's the whole problem in a single number.
| Format | Characters | Tokens (GPT-4o) | Size vs HTML |
|---|---|---|---|
| HTML (full page) | 63,615 | 20,148 | — |
| Markdown (.md) | 13,574 | 3,394 | 83% smaller |
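If you'd like to check the arithmetic yourself, here is a minimal sketch that recomputes the savings from the counts in the table. The function name is ours, and the numbers are simply the ones listed above.

```python
def percent_smaller(original: int, converted: int) -> float:
    """Return how much smaller `converted` is than `original`, in percent."""
    if original <= 0:
        raise ValueError("original must be positive")
    return (original - converted) / original * 100


# Figures taken from the table above.
html_tokens, md_tokens = 20_148, 3_394
html_chars, md_chars = 63_615, 13_574

token_saving = percent_smaller(html_tokens, md_tokens)
char_saving = percent_smaller(html_chars, md_chars)

print(f"Tokens: {token_saving:.0f}% smaller")
print(f"Characters: {char_saving:.0f}% smaller")
```

By these counts, the token saving comes out a little larger than the character saving, which hints that markup costs more tokens per character than plain prose does.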
Every AI model reads text in tokens, and it has a fixed budget per request. When 80% of your page is template chrome (the header, the menu, the cookie banner, inline CSS, tracking scripts), the model burns most of its budget on things unrelated to your content.
Worse, the noise confuses it. The model has to guess which <div> holds your article and which holds an ad. Sometimes it guesses wrong, and your key paragraph gets ignored or garbled.
Markdown strips all of that away. A heading is #, a list is a dash, a link is in brackets. No wrapper, no clutter. The model gets pure structure and pure content, in a format it has seen plenty of during training.
What Does "HTML to Markdown" Actually Mean for AI?
Markdown is the format LLMs handle most efficiently. The proposal to serve Markdown versions of web pages, called llms.txt, was published in September 2024 by Jeremy Howard, co-founder of fast.ai (Ahrefs, What Is llms.txt, June 2026). The core idea was simple: give AI a clean copy of your page by appending .md to the URL.
Think of it like serving two versions of the same article. Humans get the full HTML page with your design, your menu, and your branding. AI agents get a stripped-back Markdown twin that says the same thing in a fifth of the space.
The handoff happens through something called content negotiation, which is a normal part of how the web already works. An AI agent sends a request that says, "I'd prefer Markdown, please", and your site hands back the Markdown version. A regular visitor's browser asks for HTML and gets the usual page. Nobody sees a difference. The AI just gets a cleaner read.
According to Cloudflare's 2026 rollout of the same idea, agents request Markdown by sending an Accept: text/markdown header, and the response exposes the savings in x-markdown-tokens and x-original-tokens headers. The mechanism is now a recognized pattern, not a fringe experiment.
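To make the negotiation step concrete, here is a simplified sketch of how a server might read the Accept header and pick a format. It is not how Cloudflare or any particular extension implements it: the function names are ours, and the rule of ignoring wildcards such as */* is an assumption made so that ordinary browsers keep getting HTML.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Map each media type in an Accept header to its q-value (default 1.0)."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # Treat a malformed q-value as "not acceptable".
        prefs[media_type] = q
    return prefs


def choose_format(accept_header: str) -> str:
    """Return "markdown" only when the client explicitly asks for it."""
    prefs = parse_accept(accept_header)
    md_q = prefs.get("text/markdown", 0.0)
    html_q = prefs.get("text/html", 0.0)
    # Assumption: wildcards (*/*) are ignored, so a client must name text/markdown.
    return "markdown" if md_q > 0 and md_q >= html_q else "html"


print(choose_format("text/markdown"))
print(choose_format("text/html,application/xhtml+xml,*/*;q=0.8"))
```

An agent that names text/markdown gets the Markdown twin, while a typical browser header falls through to HTML.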
Convert Joomla Pages to Markdown Automatically
If you'd like to get ahead of this, you'll want clean, Joomla-aware Markdown pages on dedicated .md URLs that help AI tools read your content properly. That's exactly what Google Structured Data does. We recently added HTML-to-Markdown conversion to Google Structured Data, and it's a single toggle, no code required.

Once it's on, every page gets a Markdown twin that agents can request in three ways:
- Append .md to the URL. Your page at /my-article also answers at /my-article.md (this needs Joomla's SEF URLs enabled).
- Add ?markdown=1 to the URL. A simple fallback for testing or for tools that can't set custom headers.
- Send an Accept: text/markdown header. An AI agent that asks for Markdown gets it automatically, with no URL change at all.
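From the agent's side, the three options above could look something like this. It's a rough sketch using only Python's standard library: the helper names and the example.com address are placeholders, and stripping a trailing slash before adding .md is our assumption rather than documented behavior.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def markdown_url_by_suffix(url: str) -> str:
    """Append .md to the path, e.g. /my-article -> /my-article.md."""
    parts = urlsplit(url)
    # Assumption: a trailing slash is dropped first. Homepage URLs are not handled.
    path = parts.path.rstrip("/")
    return urlunsplit(parts._replace(path=path + ".md"))


def markdown_url_by_query(url: str) -> str:
    """Add markdown=1 to the query string, keeping existing parameters."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("markdown", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def markdown_request_headers() -> dict[str, str]:
    """Headers for content negotiation; the URL itself stays unchanged."""
    return {"Accept": "text/markdown"}


page = "https://example.com/my-article"
print(markdown_url_by_suffix(page))
print(markdown_url_by_query(page))
print(markdown_request_headers())
```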
Each Markdown page has three parts: YAML frontmatter (title, description, canonical URL, date, language), the cleaned article body, and your full JSON-LD schema as a code block, so agents get your content and structured data in one file. The .md URLs are noindex with a canonical link to your HTML page, so Google won't index them or flag duplicates, and pages are cached through Joomla and refreshed when you edit.
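To picture how those three parts fit into one file, here is an illustrative sketch that stitches them together. It is not the extension's actual code or output: the function name, the frontmatter key names, and the sample values are all assumptions.

```python
import json


def build_markdown_twin(meta: dict[str, str], body: str, schema: dict) -> str:
    """Join frontmatter, article body, and JSON-LD into one Markdown document."""
    # JSON strings are also valid YAML double-quoted scalars, so json.dumps
    # gives safely quoted frontmatter values.
    frontmatter = "\n".join(f"{key}: {json.dumps(value)}" for key, value in meta.items())
    fence = "`" * 3  # Built at runtime to avoid nesting code fences in this example.
    schema_block = f"{fence}json\n{json.dumps(schema, indent=2)}\n{fence}"
    return f"---\n{frontmatter}\n---\n\n{body.strip()}\n\n{schema_block}\n"


# Hypothetical sample values, for illustration only.
document = build_markdown_twin(
    meta={
        "title": "My Article",
        "description": "A short summary of the article.",
        "canonical": "https://example.com/my-article",
        "language": "en-GB",
    },
    body="# My Article\n\nThe cleaned article text goes here.",
    schema={"@context": "https://schema.org", "@type": "Article", "headline": "My Article"},
)
print(document)
```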
The HTML to Markdown documentation explains every setting.
What Makes This Different From a Generic Converter?
A CDN can flip your HTML to Markdown, but it doesn't know your content is a Joomla article. Cloudflare's own Markdown for Agents converts the entire rendered page, including template chrome, and its frontmatter is limited. A generic converter sees a wall of HTML and does its best to guess.
Google Structured Data starts from inside Joomla, so it knows where your article ends and the template begins. More importantly, it can also carry your structured data into the Markdown file. When an AI reads that page, it gets your headings, your text, and a machine-readable map of what the page is about (a product, an FAQ, a how-to) in the same request.
There's also a discovery layer. Google Structured Data injects a <link rel="alternate" type="text/markdown"> tag into your page's <head>, so agents can find the Markdown version on their own. And it gives you real, shareable .md URLs, where a header-only approach stays invisible.
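Here is roughly how an agent could pick up that discovery tag, sketched with Python's built-in HTML parser. We assume the tag carries an href attribute pointing at the Markdown version. The class name, function name, and sample URLs are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MarkdownLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="text/markdown"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_markdown = attr.get("type", "").lower() == "text/markdown"
        if "alternate" in rels and is_markdown and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_markdown_alternate(html: str, page_url: str):
    """Return the absolute URL of the first Markdown alternate, or None."""
    finder = MarkdownLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.hrefs[0]) if finder.hrefs else None


sample = (
    '<html><head><link rel="stylesheet" href="/a.css">'
    '<link rel="alternate" type="text/markdown" href="/my-article.md">'
    "</head><body></body></html>"
)
print(find_markdown_alternate(sample, "https://example.com/my-article"))
```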
A quick reality check on the popular llms.txt alternative. Adoption looks thin: Ahrefs found that across 137,000 domains, 28% publish an llms.txt file, yet 97% of those files are never requested by any bot (Ahrefs, What Is llms.txt, June 2026). Google's Gary Illyes also confirmed in July 2025 that Google doesn't support llms.txt and has no plans to. Serving actual Markdown pages through .md URLs and content negotiation is the more practical bet, and it's what this feature does.
Doesn't Google Say This Is Pointless?
Fair challenge. Search engineers at Google and Bing have pushed back hard on the idea of building bot-only Markdown pages, and they're right about the part that matters most: this won't lift your Google rankings. We're not claiming it will. If you came here hoping a Markdown twin moves you up the results page, it won't, and anyone selling it that way is wrong.
Most of that criticism targets two things this feature isn't. The first is Markdown as an SEO ranking signal, which it was never meant to be. The second is llms.txt, a separate index file you publish and hope bots read. We just covered why that one rarely gets fetched.
What's left is a narrower, mechanical claim. When an AI agent actually requests your page, it gets a version that's roughly 80% smaller and free of template clutter. This won't make AI start recommending you. Whether an assistant cites you depends on far more than file format: your content, your authority, and how well you answer the question. What Markdown changes is the reading itself. When a bot does fetch you, it gets your real words cleanly instead of digging them out of template clutter. That's the honest benefit, at a cost of one toggle.
The "double crawl load" worry doesn't apply here either. The Markdown twin carries an X-Robots-Tag: noindex, nofollow header and a canonical link back to your HTML page, so search engines won't index it or weigh it as duplicate content. When an agent uses content negotiation, it's the same URL serving a lighter response, not a second page to crawl.
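If you want to verify those safeguards on your own site, a check along these lines would do. It's a sketch under one loud assumption: that the canonical link arrives as an HTTP Link header. It may instead be delivered another way, in which case only the X-Robots-Tag part applies. The function name and sample values are ours.

```python
import re

# Simplified pattern: matches <url>; rel="canonical" when rel is the first parameter.
_CANONICAL = re.compile(r'<([^>]+)>\s*;\s*rel="?canonical"?', re.IGNORECASE)


def check_markdown_twin(headers: dict[str, str], html_url: str) -> bool:
    """Return True if the response is noindex and canonicalises to html_url."""
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-robots-tag", "")
    directives = {item.strip().lower() for item in raw.split(",") if item.strip()}
    # Assumption: the canonical link is sent as a Link header.
    match = _CANONICAL.search(lowered.get("link", ""))
    return "noindex" in directives and match is not None and match.group(1) == html_url


sample_headers = {
    "Content-Type": "text/markdown; charset=utf-8",
    "X-Robots-Tag": "noindex, nofollow",
    "Link": '<https://example.com/my-article>; rel="canonical"',
}
print(check_markdown_twin(sample_headers, "https://example.com/my-article"))
```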
Should You Turn It On?
Honestly, we're all still figuring out how AI agents work. It's a vague, fast-moving space. What we do know is this: agents can already read your HTML, but they burn a lot of tokens doing it. So when one shows up and explicitly asks for Markdown, why not hand it over?
And if AI agents end up really needing Markdown a few months from now, you're already set. It won't hurt your site either way. Sure, it won't lift your Google rankings or get your company named in an assistant's recommendations. But it's one toggle, it costs you nothing, and it's worth a try.
Turn on Convert Pages to Markdown in Google Structured Data and see what an AI sees.