llms.txt Explained: What It Is, How to Write One, and Whether You Need llms-full.txt
llms.txt is the new robots.txt for AI — a single Markdown file at the root of your site that tells answer engines who you are and where the canonical content lives. Here is the standard, the syntax, and the working pattern that ships.
What llms.txt is
llms.txt is a single Markdown file served at the root of a website (https://example.com/llms.txt) that gives answer engines a structured introduction to the site: who the organization is, what the canonical pages are, and which URLs are worth crawling first. It is to answer engines what robots.txt is to search crawlers, except it expresses intent and context rather than permission.
The proposal originated with Jeremy Howard (Answer.ai) in 2024 and has gained adoption faster than most modern web conventions. By mid-2026, several thousand sites publish one — still a rounding error compared to the size of the web, which is exactly why the early signal is so strong.
Why it exists
A modern site is hundreds or thousands of pages. An LLM that is asked "what does Seypro do" cannot crawl the whole site in real time. It needs a fast way to ground itself: a concise, canonical statement of the entity. Without that, the model improvises from whatever it can scrape, which often produces incomplete or out-of-date answers.
llms.txt solves the cold-start problem. A model that finds the file can answer "what is this company" in one fetch instead of synthesising from scraps. Pages listed in llms.txt also signal canonical priority — the homepage, the services index, the about page, the FAQ — so the model knows which URLs to consult for follow-up detail.
The syntax
llms.txt is Markdown with a small set of conventions. The structure:
- H1 (`# Company Name`) — the canonical name of the entity at the top.
- A blockquote tagline (`> One-sentence description`) — what you do, in one sentence. This is the line LLMs are most likely to lift verbatim for definitional queries.
- Optional pointer to llms-full.txt — a single bullet linking to the long-form version.
- H2 sections (`## Services`, `## Key Facts`, `## Insights`, `## Locations`, `## Contact`) — each containing a Markdown list of links or facts.
- Each list item is `- [Link Text](URL): one-line description.` Models lift the description as the page summary.
No XML. No JSON-LD. No frontmatter. Markdown is deliberately the medium because LLMs read Markdown natively and many models were trained on it.
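To make these conventions concrete, here is a minimal sketch of a parser that reads a file written this way. The names `LlmsTxt` and `parse_llms_txt` are hypothetical, and the sketch assumes exactly the layout described above (one H1, one blockquote line, H2 sections holding list items). It is not a reference implementation of the proposal.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Link Text](URL): one-line description." (description optional).
LINK_ITEM = re.compile(
    r"^- \[(?P<text>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    name: str = ""
    tagline: str = ""
    sections: dict[str, list[dict[str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, the blockquote tagline, and the H2 sections of an llms.txt."""
    doc = LlmsTxt()
    current = None  # name of the H2 section currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.name:
            doc.name = line[2:].strip()
        elif line.startswith("> ") and not doc.tagline:
            doc.tagline = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("- ") and current is not None:
            match = LINK_ITEM.match(line)
            if match:
                item = {
                    "text": match["text"],
                    "url": match["url"],
                    "desc": match["desc"] or "",
                }
            else:
                # Plain fact such as "- Founded: 2018": keep the text as-is.
                item = {"text": line[2:], "url": "", "desc": ""}
            doc.sections[current].append(item)
        # List items before the first H2 (for example the llms-full.txt
        # pointer) are ignored in this sketch.
    return doc
```

Calling `parse_llms_txt(open("llms.txt").read())` on a file that follows the conventions returns the entity name, the tagline, and a dictionary of sections, which is roughly the information a consumer of the file can be expected to extract.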
A working template
A minimum-viable llms.txt for a B2B services company:
# Acme Studio
> Acme builds custom software and AI integrations for financial-services clients. Headquartered in London with engineers in Berlin and Lisbon. Founded 2018.
- Full context: https://acme.example/llms-full.txt
## Services
- [Custom Software](https://acme.example/services/software): Full-stack web and mobile applications. React, Next.js, TypeScript, Node.js, PostgreSQL.
- [AI Integration](https://acme.example/services/ai): RAG pipelines, agent workflows, private LLM deployment. OpenAI, Anthropic, Mistral, local Llama.
- [Security Audits](https://acme.example/services/security): Penetration testing, OWASP, GDPR/PCI-DSS/SOC2 compliance.
## Key Facts
- Founded: 2018
- Headquarters: London, United Kingdom
- Offices: London, Berlin, Lisbon
- Engagements start from: £5,000
- Team: 18 engineers, senior-only
- Industries: Financial services, fintech, regulated SaaS
## Key Pages
- [Homepage](https://acme.example/)
- [All Services](https://acme.example/services)
- [Case Studies](https://acme.example/work)
- [Pricing](https://acme.example/pricing)
- [FAQ](https://acme.example/faq)
- [Contact](https://acme.example/contact)
## Insights
- [How We Build Production RAG](https://acme.example/insights/production-rag): What it takes to ship retrieval pipelines that survive real traffic.
- [SOC 2 in 90 Days for Startups](https://acme.example/insights/soc2-90-days): A practical timeline and checklist.
## Contact
- Email: hello@acme.example
- Phone: +44 20 1234 5678
llms.txt vs llms-full.txt
llms.txt is the lobby — a short, hand-curated index. llms-full.txt is the library — a long-form file containing the substantive content of your most important pages, concatenated into one Markdown document. Models can fetch the lobby in milliseconds for the cold-start; they can fetch the library when they need to answer a deeper question without crawling every URL individually.
You almost certainly want both. llms.txt should stay under 200 lines; llms-full.txt can be 1,000-10,000 lines depending on the size of the site. Generate llms-full.txt at build time from your canonical content (services, case studies, key insights), not by hand — it needs to stay in sync with the site.
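As a rough illustration of generating llms-full.txt at build time, the sketch below concatenates page bodies into one Markdown document. The function name `build_llms_full`, the `(title, url, body)` tuple shape, and the `Source:` line under each heading are assumptions made for this example; the article does not prescribe an internal layout for llms-full.txt.

```python
def build_llms_full(site_name: str, pages: list[tuple[str, str, str]]) -> str:
    """Concatenate (title, url, markdown_body) tuples into one Markdown document."""
    parts = [f"# {site_name}", ""]
    for title, url, body in pages:
        # One H2 per page, followed by its canonical URL and the page body.
        parts += [f"## {title}", f"Source: {url}", "", body.strip(), ""]
    # Normalise to exactly one trailing newline.
    return "\n".join(parts).rstrip() + "\n"
```

A build step could collect the tuples from the same source the site is rendered from and write the result next to llms.txt, for example `Path("public/llms-full.txt").write_text(build_llms_full(...))`, where the `public/` directory is hypothetical. Page bodies that contain their own H1 or H2 headings would need their heading levels shifted first, which this sketch does not do.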
What to include
- Your organization name and a tight one-sentence description. This is the line models cite for "what is X".
- Your services, with one-line descriptions that include technology names, industries, and specific deliverables.
- Key facts that ground the entity: founding year, location, team size, pricing band, founder name. Models use these for entity resolution.
- Canonical page URLs for homepage, services, case studies, pricing, FAQ, contact. Skip noisy URLs like login, dashboard, internal tools.
- Insights / blog index, with the most evergreen posts listed individually with one-line summaries.
- Contact details. Make it easy for models to suggest a contact path when they recommend you.
What to exclude
- Internal admin URLs, login pages, dashboards, gated content. The file is public — assume anything in it will be quoted in an AI answer.
- Long-form marketing copy. Save it for the page itself. llms.txt should be parseable in seconds.
- Stale URLs. A 404 in llms.txt is worse than no entry; it teaches the model your file is unreliable.
- Anything that would change weekly. Generate the file at build time or maintain it like code, but keep the content stable. Frequent churn makes it harder for a model to settle on what the entity actually is.
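Some of these exclusions can be checked mechanically. The sketch below flags URLs whose path contains a segment that looks private and warns when the file grows past the 200-line guideline mentioned earlier. The blocked segment names are an assumed starting list, not a standard, and `lint_llms_txt` is a hypothetical name.

```python
import re
from urllib.parse import urlparse

# Assumed list of path segments that suggest a private or gated page.
BLOCKED_SEGMENTS = {"login", "signin", "admin", "dashboard", "account"}
MAX_LINES = 200  # guideline from this article, not a hard limit

URL_RE = re.compile(r"https?://[^\s)>\]]+")


def lint_llms_txt(text: str) -> list[str]:
    """Return human-readable warnings for an llms.txt file."""
    problems = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        problems.append(
            f"file has {len(lines)} lines; aim to stay under {MAX_LINES}"
        )
    for number, line in enumerate(lines, start=1):
        for url in URL_RE.findall(line):
            url = url.rstrip(".,;:")  # drop trailing sentence punctuation
            segments = {s.lower() for s in urlparse(url).path.split("/") if s}
            hits = segments & BLOCKED_SEGMENTS
            if hits:
                problems.append(
                    f"line {number}: {url} looks like a private path "
                    f"({', '.join(sorted(hits))})"
                )
    return problems
```

Stale URLs are not covered here because detecting them requires a network request for each link.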
Which crawlers actually read it
Adoption is uneven and changing. As of mid-2026, treat llms.txt as a supplementary signal rather than a guaranteed read:
- Anthropic (ClaudeBot, claude.ai) — reads it; the content shows up in Claude responses for entity queries.
- Perplexity — reads it inconsistently; expect it to matter more over time.
- OpenAI (GPTBot, ChatGPT) — reads it when generating new training crawls; less consistent for real-time search.
- Google AI (Google-Extended) — does not officially document support; treat as low-priority for Google specifically.
- Smaller models and aggregators — often read it eagerly because they have less crawl budget and benefit more from the shortcut.
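Because support varies, it can help to measure which clients actually request the file on your own site. The sketch below counts requests for `/llms.txt` and `/llms-full.txt` per user-agent string. It assumes your server writes access logs in the common "combined" format and that user-agent strings are reported honestly, neither of which is guaranteed; `count_llms_txt_fetches` is a hypothetical name.

```python
import re
from collections import Counter

# Combined log format tail: "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def count_llms_txt_fetches(log_lines, paths=("/llms.txt", "/llms-full.txt")) -> Counter:
    """Count requests for the given paths, grouped by user-agent string."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        # Ignore any query string when comparing the requested path.
        if match and match["path"].split("?")[0] in paths:
            counts[match["ua"]] += 1
    return counts
```

Passing an open log file, as in `count_llms_txt_fetches(open("access.log"))`, yields a counter whose `most_common()` output shows which agents fetch the file most often. The log path is an assumption.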
How to validate
- Fetch it as a model would: `curl -i -A "PerplexityBot" https://yoursite/llms.txt` (the `-i` flag prints the response headers), then verify that the status is 200, the content-type is `text/plain` or `text/markdown`, and the Markdown renders cleanly.
- Check every link in the file resolves. A small CI script that runs `curl --head` against each URL prevents drift.
- Pass it through a Markdown linter. Models tolerate small errors, but the cleaner the file, the more reliably it is parsed.
- Run a few prospect queries in Claude and Perplexity, then check whether the model uses phrasing from your llms.txt tagline. It often does.
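The link and content-type checks above can be scripted for CI. Below is a minimal sketch using only the Python standard library in place of `curl --head`. The function names and the `llms-txt-check/0.1` user-agent string are hypothetical, and the status lookup is injectable so the logic can be exercised without network access.

```python
import re
import urllib.error
import urllib.request

URL_RE = re.compile(r"https?://[^\s)>\]]+")
OK_TYPES = ("text/plain", "text/markdown")


def extract_urls(text: str) -> list[str]:
    """Return the unique http(s) URLs in the file, in order of appearance."""
    seen = []
    for url in URL_RE.findall(text):
        url = url.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


def head_status(url: str, user_agent: str = "llms-txt-check/0.1", timeout: int = 10) -> int:
    """Send a HEAD request and return the HTTP status, or 0 if unreachable."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, TimeoutError):
        return 0


def find_broken_links(text: str, status_of=head_status) -> list[tuple[str, int]]:
    """Return (url, status) for every link that does not answer 200."""
    broken = []
    for url in extract_urls(text):
        code = status_of(url)
        if code != 200:
            broken.append((url, code))
    return broken


def content_type_ok(header_value: str) -> bool:
    """Accept text/plain or text/markdown, ignoring parameters such as charset."""
    return header_value.split(";")[0].strip().lower() in OK_TYPES
```

A CI job could fail the build whenever `find_broken_links(open("llms.txt").read())` returns a non-empty list. Redirects are followed, so a link that redirects to a live page counts as healthy, and servers that reject HEAD requests will be reported as broken, so a GET fallback may be needed in practice.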
Limitations
llms.txt is not a ranking lever. Publishing one does not guarantee citations; it removes friction. The work that earns citations — strong content, structured data, entity scaffolding, page speed — is still the substance. llms.txt is a small but worthwhile prerequisite, and it is one of the cheapest GEO investments available today: a single Markdown file, written once, updated quarterly.
A clean llms.txt is the AI equivalent of a tidy homepage. It does not win the deal, but it tells the model you take the surface seriously. The cost is one afternoon; the benefit is having a canonical answer ready every time an AI assistant introduces your company.
See our Generative Engine Optimization service and the companion pieces: GEO vs SEO in 2026 and winning Perplexity citations.
