Definition: What is llms.txt?
llms.txt is a standard proposed in 2024 by Jeremy Howard (Answer.AI) for a curated, Markdown-formatted file at the website root (/llms.txt). It gives large language models and their tool stacks a machine-readable overview of a site's most important content: core pages, documentation, API references, whitepapers, policy pages. The format is deliberately lean - plain Markdown, no new schemas - and is therefore easy for publishers to produce and for LLMs to consume.
An important framing: llms.txt is a proposed standard, not a formal web standard. Neither the IETF nor the W3C has ratified it. None of OpenAI, Anthropic, Google, or Perplexity has officially confirmed that its crawlers consume the file normatively. Adoption is community-driven: Answer.AI, the Anthropic documentation, Vercel, Stripe, and numerous dev-tool providers have implemented llms.txt. The strategic value lies in editorial curation and signaling - not in guaranteed LLM impact.
llms.txt is a signal, not a guarantee
The file regulates nothing. It curates. Its value lies in the editorial process - which content is relevant for an external machine? - and in the signal to tool developers and LLM providers that the site is maintained in a structured way.
Distinction from robots.txt
The two formats are often confused. robots.txt is an access-control mechanism: it declares which crawlers may fetch which paths. It has been a de facto standard since 1994 and is respected by every serious crawler. llms.txt, by contrast, is a curation recommendation: it declares which content is particularly relevant for LLM consumption. It regulates nothing. Both files sit at the root but have fundamentally different functions.
Operationally they complement each other: GPTBot, Google-Extended and ClaudeBot are regulated via robots.txt; llms.txt curates content for cases where the crawlers are allowed - or where users open the site directly in a chatbot.
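The division of labor can be illustrated with a robots.txt fragment. The user-agent tokens below (GPTBot, Google-Extended, ClaudeBot) are the real AI crawler names mentioned above; the paths and allow/disallow choices are placeholders, not a recommendation:

```
# robots.txt - illustrative fragment, paths are placeholders
User-agent: GPTBot
Disallow: /internal/

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Allow: /
```

Access control lives here; llms.txt only curates what an allowed consumer should read first.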
Structure and syntax
The format is Markdown. The specification recommends the following structure:
# Site name
> Short description of the site in 1-3 sentences.

Optional extended section with additional context, audience, license or usage notes.

## Docs
- [Getting Started](https://example.com/docs/start): First-time setup
- [API Reference](https://example.com/docs/api): All endpoints

## Examples
- [Quickstart](https://example.com/examples/quickstart.md): Minimal example
- [Advanced](https://example.com/examples/advanced.md): Advanced patterns

## Optional
- [Changelog](https://example.com/changelog)
- [Archive](https://example.com/archive)
The ## Optional section is special: links there are hints that LLM consumers may skip when the token budget is tight. All other sections are treated as important.
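Because the format is plain Markdown, consuming it needs no new parser infrastructure. A minimal sketch of how a tool might split an llms.txt into its H2 sections and links (the sample content and section names are illustrative, not from a real site):

```python
import re

# Illustrative sample file; names and URLs are placeholders.
SAMPLE = """# Example Site
> Short description of the site.

## Docs
- [Getting Started](https://example.com/docs/start): First-time setup
- [API Reference](https://example.com/docs/api): All endpoints

## Optional
- [Changelog](https://example.com/changelog)
"""

# Matches "- [title](url)" with an optional ": description" suffix.
LINK_RE = re.compile(r"-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.+))?")

def parse_llms_txt(text):
    """Group link entries under their H2 section headings."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current and (m := LINK_RE.match(line.strip())):
            sections[current].append(
                {"title": m["title"], "url": m["url"], "desc": m["desc"]}
            )
    return sections

sections = parse_llms_txt(SAMPLE)
# Per the spec, "Optional" links may be dropped under a tight token budget.
required = {name: links for name, links in sections.items() if name != "Optional"}
print(sorted(sections), len(required["Docs"]))  # → ['Docs', 'Optional'] 2
```

A real consumer would fetch the file over HTTP first; the parsing logic stays the same.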
llms-full.txt - the full-text variant
Alongside the curated link list, the specification recommends a second file: /llms-full.txt. It contains the complete plain text of all pages referenced from llms.txt, cleanly concatenated. The goal: an LLM can load the relevant site context into its context window with a single request, without fetching each page separately. That significantly reduces latency and tool-call overhead in dev-agent use.
For enterprise sites with thousands of pages, llms-full.txt is not realistic; prioritizing the 30-100 most important pages is the standard approach. For documentation sites (Stripe, Vercel, the Anthropic docs) the full version is the primary use case.
Practice: llms.txt for enterprise sites
Operational rollout in six steps:
- Curation: which 30-50 pages represent the core expertise and core services? Decide jointly with editorial and product.
- Categorization: sections such as ## Services, ## Methodology, ## Cases, ## Glossary.
- Descriptions: every link gets a 5-15-word summary. No marketing platitudes; concrete facts.
- Optional section: blog archives, press archives, deep feature pages.
- Maintenance: quarterly updates. Remove dead links; add new core content.
- Technical: serve the file as UTF-8 with content type text/plain; charset=utf-8 and HTTP 200 at /llms.txt. No redirect, no CMS wrapper.
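The technical checklist in the last step can be automated. A minimal sketch, assuming the response metadata (status code, content type, whether a redirect occurred) has already been collected, e.g. via urllib or requests; the function name and message texts are my own:

```python
def llms_txt_issues(status, content_type, redirected):
    """Check an /llms.txt response against the checklist above:
    HTTP 200, text/plain with charset=utf-8, no redirect."""
    issues = []
    if status != 200:
        issues.append(f"expected HTTP 200, got {status}")
    if redirected:
        issues.append("served via redirect; serve directly at /llms.txt")
    if not content_type.lower().startswith("text/plain"):
        issues.append(f"content type should be text/plain, got {content_type}")
    if "charset=utf-8" not in content_type.lower().replace(" ", ""):
        issues.append("missing charset=utf-8")
    return issues

print(llms_txt_issues(200, "text/plain; charset=utf-8", False))  # → []
print(llms_txt_issues(301, "text/html", True))                   # four issues
```

Wiring this into a quarterly maintenance job catches the most common serving mistakes before an LLM consumer does.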
Typical mistakes in llms.txt implementations
- Auto-generated dump. Every URL from the sitemap copied 1:1. This defeats the core purpose of curation, and llms.txt loses its signal value.
- Marketing platitudes instead of facts. "We are the market leader" carries no concrete substantive value; LLMs will not use it as context.
- No maintenance. A file from January 2024 with dead links to restructured pages is a negative trust signal.
- Confusion with robots.txt. Access blocks in llms.txt have no effect. Whoever wants to block GPTBot must do so in robots.txt.
- Oversized llms-full.txt. Files over 5 MB exceed typical LLM context windows and are loaded only in fragments. Prioritize true core content.
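Whether an llms-full.txt fits a context window can be sanity-checked with a rough character-based estimate. The ~4 characters per token heuristic and the 128k budget below are assumptions, not spec values; a real tokenizer (e.g. tiktoken) will differ by model:

```python
def estimated_tokens(text):
    # Rough heuristic: ~4 characters per token for English prose.
    # Only a budget sanity check, not a real tokenizer count.
    return len(text) // 4

CONTEXT_BUDGET = 128_000  # assumed target window; adjust per model

doc = "word " * 100_000  # stand-in for a concatenated llms-full.txt
tokens = estimated_tokens(doc)
print(tokens, tokens <= CONTEXT_BUDGET)  # → 125000 True
```

If the estimate blows past the budget, that is the cue to cut the Optional material and deep feature pages first.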
Related terms
llms.txt belongs to the operational AI crawler setup - together with robots.txt, GPTBot, Google-Extended and other AI crawler user agents. Strategically it is part of GEO and LLM SEO infrastructure. For documentation sites it complements XML sitemaps as a semantic curation signal.
FAQ on llms.txt
What is llms.txt?
llms.txt is a proposed standard, initiated in 2024 by Jeremy Howard (Answer.AI), that provides a curated, Markdown-formatted overview of a website's most important content at the root level. The goal: give LLMs a machine-readable navigation aid to identify relevant pages efficiently - without having to crawl the entire site.
Is llms.txt an official standard?
No, llms.txt is a proposed standard without formal ratification by the IETF or W3C. None of OpenAI, Anthropic, Google, or Perplexity has officially confirmed that its crawlers consume the format normatively. Adoption is community-driven. The strategic value lies in documentation and signaling - not in guaranteed impact.
How does llms.txt differ from robots.txt?
robots.txt regulates crawler access - what may be crawled. llms.txt curates content - what is relevant. robots.txt is access control, llms.txt is a recommendation. Both files live at the root, but they have different functions and are consumed by different systems.
How is llms.txt structured?
Markdown format with an H1 title (site name), a blockquote summary, optional detail paragraphs, and H2 sections containing grouped links with descriptions. In addition, an llms-full.txt is recommended, containing the full content of important pages as plain text for LLM context.
Do I have to create llms.txt?
No, it is not mandatory. The operational value lies in the editorial discipline: llms.txt forces curation of the most important pages and at the same time signals to tool developers that the site is maintained in a structured way. For knowledge bases, documentation sites, and specialist portals it is a worthwhile investment.