Where llms.txt came from
llms.txt was proposed by Jeremy Howard of Answer.AI in September 2024, with the specification published at llmstxt.org. The motivating observation is simple: language models work inside a finite context window, and a typical HTML page spends most of its bytes on navigation, scripts, styling, and boilerplate rather than the content itself. A model (or an agent built on one) that wants to understand your site has to burn tokens wading through all of that.
The proposal borrows the shape of robots.txt and sitemap.xml: a single well-known file at a predictable root path. Instead of access rules or a URL dump, it holds a short, human-curated summary of what the site is and links to its most important pages, written in plain markdown so any model can parse it without an HTML pipeline.
The format
An llms.txt file is ordinary markdown with a fixed skeleton:
- An H1 with the site or project name. This is the only required element.
- A blockquote summary directly under the H1: one or two sentences describing what the site is and who it serves.
- H2 sections containing markdown link lists. Each section groups related pages (for example “Docs”, “Guides”, “Pricing”). Each list item is a markdown link, optionally followed by a colon and a one-line description of what a model will find at that URL.
- An H2 section titled “Optional”, which you can leave out entirely. The name carries a special meaning: links listed under it are secondary, and a model can skip them when its context budget is tight.
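To make the skeleton concrete, here is a minimal Python sketch that renders a file in that shape. The `Link` and `Section` classes, the `render_llms_txt` function, and every name and URL in the example are hypothetical placeholders, not part of the specification or any library.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional one-line description


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(name: str, summary: str, sections: list[Section]) -> str:
    """Render the skeleton: H1, blockquote summary, then H2 link lists."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for section in sections:
        lines.append(f"## {section.heading}")
        lines.append("")
        for link in section.links:
            item = f"- [{link.title}]({link.url})"
            if link.note:
                item += f": {link.note}"
            lines.append(item)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical example site; every name and URL below is a placeholder.
example = render_llms_txt(
    name="Example Project",
    summary="Example Project is a hypothetical invoicing API for small teams.",
    sections=[
        Section("Docs", [
            Link("Quickstart", "https://example.com/docs/quickstart",
                 "Install and send a first request"),
            Link("API reference", "https://example.com/docs/api"),
        ]),
        Section("Optional", [
            Link("Changelog", "https://example.com/changelog", "Release history"),
        ]),
    ],
)
print(example)
```

Keeping the structure in data and rendering it on demand makes it easy to refresh the file when your key pages change.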
A companion convention, llms-full.txt, is an expanded single-file variant. Where llms.txt is a linked table of contents, llms-full.txt inlines the full content of the referenced pages into one large markdown document, so a model can ingest everything in a single fetch. Documentation sites are the most common adopters of the full variant.
What llms.txt does and does not do
Honest status first: as of mid-2026, no major AI vendor has officially committed to fetching llms.txt as part of a published crawl or retrieval policy. OpenAI, Google, Anthropic, and Perplexity all document their crawlers and user agents; none of those documents name llms.txt as an input. Treat any claim that llms.txt directly improves your LLM visibility today with skepticism.
It also has no role in training control. Whether an AI company may crawl your site for training data is governed by robots.txt directives aimed at agents like GPTBot, along with whatever terms you publish. Adding an llms.txt file neither grants nor revokes that permission.
What it does offer: cheap insurance. The file costs minutes to create, weighs a few kilobytes, and is immediately useful to any agent that does fetch it. That population is real and growing: coding assistants pointed at your docs, custom research agents, retrieval pipelines that check well-known paths, and developer tools that support the convention. If the major assistants adopt it later, early adopters are already done. If they never do, you spent minutes producing a clean summary of your own site, which tends to be useful anyway.
How it relates to robots.txt and sitemap.xml
The three root files answer three different questions, and llms.txt makes the most sense viewed alongside its two older siblings:
- robots.txt is access control. It tells crawlers which user agents may fetch which paths. This is where AI-training directives live, via rules targeting agents such as GPTBot.
- sitemap.xml is an exhaustive URL inventory. It lists every indexable page for search engine crawlers, with no opinion about which pages matter most or what any of them contain.
- llms.txt is a curated reading list. It says: if you can only spend a few thousand tokens understanding this site, read these pages, in this order, and here is a one-line summary of each.
They complement rather than replace each other. A site can and usually should serve all three: robots.txt to set the rules, sitemap.xml for completeness, and llms.txt for the short version aimed at context-constrained readers.
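The division of labor is easy to see in code. The sketch below uses Python's standard `urllib.robotparser` to evaluate a hypothetical robots.txt that blocks GPTBot and allows everyone else. The rules, the domain, and the `ExampleResearchAgent` name are made up for illustration. Note that the access decision comes entirely from robots.txt, and it covers /llms.txt like any other path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; llms.txt is just another path as far as robots.txt goes.
llms_url = "https://example.com/llms.txt"

gptbot_allowed = parser.can_fetch("GPTBot", llms_url)
other_allowed = parser.can_fetch("ExampleResearchAgent", llms_url)

print(f"GPTBot may fetch llms.txt: {gptbot_allowed}")
print(f"Other agents may fetch llms.txt: {other_allowed}")
```

Nothing in llms.txt would change either answer. Permission is set in robots.txt; llms.txt only describes what is worth reading once access is allowed.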
How to create an llms.txt file
The manual route: open a text editor, write an H1 with your site name, add a blockquote describing the site in one or two sentences, then group your 10 to 30 most important URLs under H2 sections with a one-line description each. Save it as a plain-text markdown file named llms.txt and serve it at the root of your domain, at /llms.txt. Update it when your key pages change, the same way you would keep a sitemap current.
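Before uploading a hand-written file, a quick sanity check can catch formatting slips. The following is a minimal sketch, not an official validator: the `check_llms_txt` function is hypothetical, and its checks mirror the manual steps above (including the 10 to 30 link guideline), which is stricter than the format itself, where only the H1 is required.

```python
import re

# Matches "- [title](url)" with an optional ": note" suffix.
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)(: .+)?$")


def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems found in a draft; empty means it passed."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_blank = [line for line in lines if line]

    if not non_blank or not non_blank[0].startswith("# "):
        problems.append("first non-blank line should be an H1 ('# Name')")
    if len(non_blank) < 2 or not non_blank[1].startswith("> "):
        problems.append("expected a blockquote summary right after the H1")
    if not any(line.startswith("## ") for line in lines):
        problems.append("no H2 sections found")

    links = [line for line in lines if line.startswith("- ")]
    for line in links:
        if not LINK_ITEM.match(line):
            problems.append(f"list item is not '- [title](url): note': {line}")
    if not 10 <= len(links) <= 30:
        problems.append(f"found {len(links)} links; the guideline above is 10 to 30")
    return problems


# Hypothetical draft with a single link, to show the checker in use.
draft = (
    "# Example Project\n\n"
    "> A hypothetical site.\n\n"
    "## Docs\n\n"
    "- [Quickstart](https://example.com/quickstart): First steps\n"
)
for problem in check_llms_txt(draft):
    print(problem)
```

The draft above is well-formed but short, so the only complaint is the link count. Treat that threshold as a rule of thumb and adjust it to your site.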
The faster route: our free llms.txt generator crawls your site, drafts the summary and the curated link list in the correct format, and gives you a file you can review and upload. No signup required.
We practice what we describe here: rank.ai serves its own file at https://www.rank.ai/llms.txt, and you can use it as a working reference for the format.