What is llms.txt and Why Every Website Needs One in 2026
Right now, AI tools are reading your website — and many of them are getting it wrong.
ChatGPT processes over one billion queries every day. Perplexity serves 780 million searches every month. Google's AI Overviews now appear on more than 13% of all searches. When someone asks one of these tools about your industry, your services, or your competitors, there is a very real chance your brand is mentioned — or conspicuously absent.
The problem? These AI systems weren't designed to navigate a modern website. They're contending with cookie banners, JavaScript-heavy navigation, sticky headers, interstitial modals, and ad placements — all before they even reach your actual content. The result is that AI tools frequently misrepresent, skip, or hallucinate information about businesses whose websites lack clear, AI-readable signals.
For years, you solved this for search engines with two files: robots.txt (what crawlers can see) and sitemap.xml (what pages exist). But neither was designed for language models. Neither gives AI systems the context they need to understand your business and represent it accurately.
That's exactly the gap llms.txt was created to fill.
What is llms.txt?
llms.txt is a plain Markdown file placed at the root of your website that gives AI systems a curated, context-rich map of your most important content.
It was proposed on 3 September 2024 by Jeremy Howard, co-founder of Answer.AI (the AI research lab behind fast.ai). Howard identified a fundamental mismatch: large language models have strict context window limits and struggle to extract meaningful signal from the dense, cluttered HTML that makes up most modern websites. He proposed a standard — modelled loosely on robots.txt — that websites could use to proactively solve this.
The official specification lives at llmstxt.org and the source repository is at github.com/AnswerDotAI/llms-txt.
The file is:
- Markdown-formatted (not XML or JSON — LLMs already understand Markdown natively)
- Human-readable (you can open it in any text editor)
- Served at the root of your domain: https://yourdomain.com/llms.txt
- Curated by you — you decide what's in it
Think of it as the cover letter you write on behalf of your website, addressed to every AI that visits.
The Problem llms.txt Solves
To understand why llms.txt matters, you need to understand how AI systems actually read the web.
When an LLM or AI agent visits your site, it doesn't see it the way a human does. It receives raw text extracted from HTML — a jumbled mix of navigation labels, footer boilerplate, cookie consent strings, CTA button text, widget labels, and your actual content, all mashed together. There's no visual hierarchy. There's no obvious signal for what's important versus what's structural noise.
On top of that, LLMs have finite context windows. They cannot read your entire site in one pass. They have to make choices about what to include, what to summarise, and what to discard — often with very little reliable signal to guide those choices.
The consequences are real:
- AI tools hallucinate details about your services that aren't accurate
- Your site gets skipped entirely in favour of competitors with cleaner content signals
- AI systems misattribute expertise or describe you in vague, generic terms
- Your brand is underrepresented in AI-generated answers, summaries, and recommendations
llms.txt gives you control. Instead of hoping an AI correctly interprets your site, you tell it exactly what your site is, who it serves, and where to find the content that matters most.
llms.txt vs robots.txt vs sitemap.xml
These three files are often mentioned together, but they serve completely different purposes. Understanding the distinction is critical.
robots.txt vs llms.txt — what each one does
robots.txt — Access Control
- Written for search engine crawlers (Googlebot, Bingbot, etc.)
- Controls what pages crawlers are permitted to access
- Uses directives: Allow, Disallow, Crawl-delay
- Focused on permissions and restrictions
- Has no content descriptions or context
llms.txt — AI Curation
- Written for large language models and AI agents
- Curates which content is most valuable for AI to understand
- Uses Markdown: headings, links, and descriptions
- Focused on context, clarity, and comprehension
- Every entry explains what the page is and why it matters
sitemap.xml vs llms.txt — completeness vs curation
sitemap.xml — Discovery
- Designed to help search engines discover every page
- Lists all URLs — typically hundreds or thousands
- XML format — awkward for both humans and LLMs to read
- Prioritises completeness over context
- No descriptions — just URLs and metadata
llms.txt — Comprehension
- Designed to help AI systems understand your key content
- Lists only your most important pages — deliberately selective
- Markdown format — readable by humans and LLMs alike
- Prioritises context and quality over quantity
- Every link includes a short explanation of what it covers
The short version: robots.txt = permission, sitemap.xml = discovery, llms.txt = comprehension. You need all three. None replaces the others.
The llms.txt File Format
The specification defines a simple, strict structure. All valid llms.txt files must follow it.
# Your Site or Project Name
> A concise summary of what your site is and who it's for. This should be
> one to three sentences. It's the most important part of the file —
> everything else provides supporting detail.
Optional additional context goes here as normal prose. Use this to explain
scope, conventions, naming, or anything else an AI would need to understand
the rest of the file correctly.
## Section Name (e.g. Services, Documentation, Blog)
- [Page Title](/path/to/page): A short description of what this page covers
- [Another Page](/path/to/another): What this one answers or explains
## Another Section
- [Resource Title](/resource): Description of this resource
- [Second Resource](/resource-2): Description of this one
## Optional
- [Supplementary Content](/supplementary): Lower-priority content
- [Archive](/archive): Older material that may still be useful
The rules:
- H1 heading — required. The name of your site or project.
- Blockquote summary — required. One to three sentences that give an AI everything it needs to understand the rest of the file.
- H2 sections with link lists — recommended. Group your most important pages under logical headings.
- Link format — [Title](/url): Short description. The description is optional but strongly recommended.
- "Optional" section — any H2 section titled "Optional" signals to AI systems that those links can be skipped if context is tight.
- Standard Markdown only — no custom extensions, no proprietary syntax.
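The structure above is simple enough to assemble programmatically. Here's a rough sketch — the function name and inputs are illustrative, not part of the spec:

```python
def build_llms_txt(name, summary, sections):
    """Assemble a minimal llms.txt: H1 title, blockquote summary,
    then H2 sections of '- [Title](/url): description' links.

    sections maps a section name to a list of (title, url, description) tuples.
    """
    lines = [f"# {name}", ""]
    # The blockquote summary is required by the spec; prefix each line with '> '.
    lines += [f"> {line}" for line in summary.splitlines()] + [""]
    for section, links in sections.items():
        lines.append(f"## {section}")
        for title, url, desc in links:
            lines.append(f"- [{title}]({url}): {desc}")
        lines.append("")
    return "\n".join(lines)

example = build_llms_txt(
    "Example Co",
    "Example Co builds widgets for B2B teams.",
    {"Services": [("Widgets", "/services/widgets", "What our widget service includes")]},
)
print(example)
```

For a small site, writing the file by hand is just as quick — a generator mainly earns its keep if you rebuild llms.txt from a CMS or docs pipeline.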
A Real llms.txt Template (Copy and Adapt This)
Here's a practical template written for a marketing or GEO agency — adapt the sections and links for your own site:
# Qwestyon — Paid Ads & Generative Engine Optimisation Agency
> Qwestyon is a performance marketing agency specialising in paid search,
> paid social, and Generative Engine Optimisation (GEO) for B2B and B2C
> businesses. We help companies get found by both traditional search engines
> and AI-powered tools like ChatGPT, Claude, and Perplexity.
We work with businesses across the UK and internationally. Our expertise
spans Google Ads, Meta Ads, LinkedIn Ads, and AI search visibility strategy.
## Services
- [GEO — Generative Engine Optimisation](/services/geo): How we help brands
appear in AI-generated answers and recommendations
- [Google Ads Management](/services/google-ads): Strategy, build, and ongoing
management of Google Search and Performance Max campaigns
- [Meta Ads Management](/services/meta-ads): Lead generation and awareness
campaigns across Facebook and Instagram
## Guides & Resources
- [What is GEO?](/blog/what-is-generative-engine-optimisation-geo): Plain-English
explanation of Generative Engine Optimisation and why it matters
- [How to Measure AI Search Visibility](/blog/how-to-measure-ai-search-visibility-without-guessing):
Practical methods for tracking whether AI tools mention your brand
- [Schema Markup for GEO](/blog/boost-visibility-with-structured-data-a-guide-to-schema-markup-for-generative-engine-optimisation):
How structured data helps AI systems understand your content
- [Track AI Traffic in GA4](/blog/how-to-track-ai-traffic-in-ga4-using-custom-channel-groups):
Step-by-step guide to attributing traffic from AI referrers
## About
- [About Qwestyon](/about): Our approach, team, and philosophy
- [Case Studies](/case-studies): Results we've achieved for clients
## Optional
- [Blog](/blog): Full archive of our marketing and GEO content
- [Contact](/contact): Get in touch or book a discovery call
Who's Already Using llms.txt?
As of early 2026, between 30,000 and 60,000 llms.txt files have been indexed by Google — a tiny fraction of the web, but growing fast. The companies leading adoption are, unsurprisingly, those that build AI tools themselves:
Anthropic — A comprehensive implementation covering their entire API surface, libraries, prompt library, and developer documentation. One of the most thorough examples in the wild, and a strong signal of how seriously Anthropic takes the standard.
Stripe — Organises their llms.txt by product category. Helps AI systems navigate Stripe's famously extensive documentation without getting lost in thousands of pages.
Cloudflare — Takes a sophisticated approach: a primary llms.txt plus multiple product-specific llms-full.txt files. AI agents can fetch just the context relevant to their query rather than loading everything.
Vercel — Implements both llms.txt and llms-full.txt, reflecting their focus on developer tooling and AI-first workflows.
Zapier — Heavily focused on their AI Actions API, making it easy for AI systems to understand Zapier's automation capabilities and how to surface them in responses.
The common thread: all of these companies have extensive documentation and complex product surfaces. llms.txt gives them a way to surface the right information to AI systems without those systems having to wade through thousands of pages.
Why llms.txt Matters for GEO
In GEO, accuracy and citation are everything. AI tools don't rank you on a numerical scale — they either include you in their answer or they don't. They either cite you as a source or they cite someone else. The research on what drives AI visibility is clear:
- Adding expert quotations to your content increases AI visibility by ~40%
- Including statistics with proper attribution boosts AI mention rates by ~35–40%
- Properly citing your sources increases citation likelihood by ~30–40%
- Traditional keyword stuffing, by contrast, actively hurts GEO performance by ~10%
llms.txt is the infrastructure layer beneath all of this. It doesn't replace high-quality content — nothing does. But it dramatically increases the chance that an AI system will correctly understand your expertise, navigate to your best content, and represent you accurately when a user asks a relevant question.
Think of it this way: you can write the best content in your industry, but if an AI misidentifies what your site is about in the first place, that content never gets surfaced. llms.txt solves the identification problem.
For a deeper look at how AI-driven traffic is tracked and measured, see our guide on measuring AI search visibility without guessing and how to track AI traffic in GA4 using custom channel groups.
How to Create Your llms.txt
Create your llms.txt in 6 steps
1. Audit your most important pages
List your 10–20 most valuable pages: core service pages, your best-performing blog posts, your about page, key case studies. These are the pages you most want AI systems to understand and cite. Ignore everything else for now.
2. Write your H1 and blockquote summary
Your H1 is just your site or brand name. Your blockquote is the most important part of the whole file — one to three sentences that explain exactly what you do, who you serve, and what makes you worth citing. Write it as if you're explaining your business to a very smart person who has never heard of you.
3. Group your pages into logical sections
Create H2 headings that group your links by topic or type. Common sections include: Services, Blog / Resources, Documentation, About, Case Studies. Don't over-engineer this — three to five sections is plenty for most sites.
4. Write a short description for each link
After each link, add a colon and a one-sentence description of what that page covers or answers. This is what helps AI systems decide whether a page is relevant to a given query. Generic descriptions waste the opportunity — be specific.
5. Publish at your domain root
Save the file as llms.txt and place it at the root of your domain so it's accessible at https://yourdomain.com/llms.txt. Ensure the server returns HTTP 200, the file is publicly accessible, and your CDN or caching layer isn't blocking it. Check it loads in an incognito browser.
6. Validate and set a review schedule
Use a free validator like llmstxtchecker.net or llmstxtvalidator.dev to confirm your file is syntactically correct. Then set a calendar reminder to review it whenever you significantly update your site — quarterly is a sensible default.
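Before reaching for an online checker, you can sanity-check the required elements locally. This minimal linter mirrors the format rules described earlier in this article — it is not the exact logic those validators use:

```python
import re

def lint_llms_txt(text):
    """Return a list of problems against the basic llms.txt rules:
    a required H1 title, a required blockquote summary, and link
    entries that should carry a ': description' suffix."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("Missing H1 title on the first line")
    if not any(line.startswith("> ") for line in lines):
        problems.append("Missing blockquote summary")
    # Flag list links that omit the recommended description after the URL.
    for line in lines:
        m = re.match(r"- \[[^\]]+\]\([^)]+\)(.*)", line)
        if m and not m.group(1).startswith(":"):
            problems.append(f"Link without description: {line}")
    return problems
```

Running it over the template earlier in this article should return an empty list; a file missing its H1 or blockquote gets flagged immediately.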
llms.txt Best Practices
What to include in your llms.txt
- ✓Your most important service, product, or content pages — the ones you'd most want an AI to cite
- ✓A blockquote summary that is specific, not generic ('We help B2B SaaS companies generate inbound leads via Google Ads' beats 'We are an innovative digital marketing agency')
- ✓Short, specific descriptions after each link that explain what the page answers or covers
- ✓Links to Markdown versions of pages where available — easier for LLMs to parse than HTML
- ✓An 'Optional' H2 section for lower-priority content that can be skipped in constrained contexts
- ✓A version date or 'Last updated' note in your blockquote or intro prose
- ✓Your canonical domain URL — use https:// with your primary domain consistently
What to leave out of your llms.txt
- ✗Confidential information: internal pricing, employee data, unreleased products
- ✗Outdated content: old pricing pages, discontinued services, deprecated product names
- ✗Duplicate or thin pages that add no unique value
- ✗Every single page on your site — llms.txt is a curated shortlist, not an exhaustive index
- ✗Generic boilerplate descriptions that could apply to any site in your industry
- ✗Pages blocked by robots.txt — don't list pages you don't want crawled
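That last point is easy to check mechanically. Python's standard-library robotparser can tell you whether any URL you've listed in llms.txt is disallowed by your robots.txt — a sketch, with the user agent left as a parameter so you can test the specific crawlers you care about:

```python
from urllib.robotparser import RobotFileParser

def blocked_links(robots_txt, urls, agent="*"):
    """Return the URLs that the given robots.txt disallows for the agent."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [u for u in urls if not rp.can_fetch(agent, u)]

robots = """User-agent: *
Disallow: /internal/
"""
urls = [
    "https://yourdomain.com/services/geo",
    "https://yourdomain.com/internal/pricing",
]
print(blocked_links(robots, urls))  # flags only the /internal/ URL
```

Anything this returns should either be removed from your llms.txt or unblocked in robots.txt — listing pages crawlers can't fetch sends contradictory signals.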
The Next Step: Pair llms.txt with Schema Markup
llms.txt and structured data (schema markup) are complementary strategies. Where llms.txt gives AI systems a narrative map of your content, schema markup gives them machine-readable metadata about individual pages — your entity type, your reviews, your FAQs, your how-to guides.
Together, they create a robust AI-visibility stack: schema tells AI systems what a page is, llms.txt tells them why it matters and where to find your best work.
If you haven't tackled schema yet, our guide to Schema Markup for Generative Engine Optimisation walks through exactly which schema types have the most impact on AI visibility, with working examples.
Frequently Asked Questions
What is llms.txt?
llms.txt is a Markdown file placed at the root of your website that describes your site's content and links to your most important pages in a format optimised for large language models (LLMs) to read and understand. It was proposed by Jeremy Howard of Answer.AI in September 2024.
Is llms.txt required for SEO?
No. llms.txt has no direct impact on traditional search rankings and is not an official Google standard. Its value lies in Generative Engine Optimisation (GEO) — helping AI tools like ChatGPT, Claude, and Perplexity accurately understand and cite your website in their responses.
How do I create an llms.txt file?
Create a plain Markdown file named llms.txt and place it at the root of your domain (e.g. yourdomain.com/llms.txt). It must include an H1 title, a blockquote summary (1-3 sentences), and a curated list of your most important pages with short descriptions. Keep it updated whenever your content significantly changes.
Which AI tools actually read llms.txt?
There is no universal standard yet, but OpenAI, Anthropic, and Perplexity crawlers have been observed reading llms.txt files in the wild. Anthropic has specifically asked its documentation providers to implement llms.txt. Adoption by AI systems is growing but not yet guaranteed across all platforms.
What is the difference between llms.txt and robots.txt?
robots.txt controls what search engine crawlers can and cannot access — it is about permission. llms.txt is not about access control at all. It is about curation: telling AI systems what your site is, which content matters most, and providing context that raw HTML cannot reliably communicate.
Does llms.txt replace sitemap.xml?
No. sitemap.xml helps search engines discover every page on your site for indexing. llms.txt is a curated, quality-first guide to your most important content — designed for AI comprehension, not crawler discovery. You need all three: robots.txt for access rules, sitemap.xml for discovery, and llms.txt for AI understanding.
The Bottom Line
AI systems are becoming a primary discovery channel — for brands, for services, for expertise. The businesses that will win in AI search are the ones building the right infrastructure now, before everyone else catches up.
llms.txt is one of the lowest-effort, highest-upside steps you can take today. It takes under an hour to write well. It requires no technical implementation beyond uploading a text file. And it directly addresses the most common failure mode in AI representation: AI systems not understanding what your site is actually about.
Create yours this week. Use the template above, validate it with a free checker, and set a quarterly reminder to keep it updated. Then pair it with schema markup and a structured content strategy — and you'll have a GEO foundation that most of your competitors haven't thought about yet.
Want help building out your full GEO strategy — not just the technical foundations, but the content signals that get you cited by AI tools? That's exactly what we do at Qwestyon. Get in touch and let's talk.