
llms.txt — What It Is and How to Implement It

Just as robots.txt tells search engine crawlers which pages they may crawl, llms.txt tells AI systems how to understand, represent, and interact with your website. It is a Markdown-formatted text file that you place at the root of your domain, and it is quickly becoming a baseline expectation for AI-ready websites.


The standard

What is llms.txt?

The llms.txt specification was proposed by Jeremy Howard (founder of fast.ai and Answer.AI) in 2024 as a lightweight convention for making websites more legible to large language models. The concept is deliberately simple: a Markdown-formatted text file at https://yourdomain.com/llms.txt that summarises your site's purpose, key pages, and guidance for AI systems.

Like robots.txt, but richer

robots.txt is a list of allow/disallow rules. llms.txt goes further: it provides context, descriptions, and links to the most important resources on your site — so an LLM can prioritise what to read and how to represent you.

Plain Markdown format

LLMs encounter vast quantities of Markdown during training and parse it reliably. Using Markdown (rather than XML or JSON) for llms.txt makes it immediately readable by any language model without additional tooling.

Crawlable by AI agents

AI agents that fetch your site to answer user queries may check for /llms.txt as one of their first requests, much as browsers check /favicon.ico. It's a short, low-cost read that can dramatically improve agent context.
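You can perform the same check yourself with a single HEAD request. The sketch below is a minimal Python example, not part of any specification: the llms_txt_url helper and the User-Agent string are illustrative choices.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

def llms_txt_url(site: str) -> str:
    """Build the conventional llms.txt location for a site root."""
    return urljoin(site.rstrip("/") + "/", "llms.txt")

def has_llms_txt(site: str, timeout: float = 5.0) -> bool:
    """Return True if the site answers a HEAD request at /llms.txt with 200."""
    req = Request(llms_txt_url(site), method="HEAD",
                  headers={"User-Agent": "llms-txt-check/0.1"})  # illustrative UA
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (HTTPError, URLError):
        return False
```

A HEAD request keeps the probe cheap; swap in GET if you also want to inspect the file's contents.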

Implementation

What goes inside llms.txt

The llms.txt format has a defined structure. It uses Markdown with four types of sections, each serving a distinct purpose for AI systems reading your site.

Specification sections

# Brand name (H1)

A single H1 heading with your brand or product name. This is the canonical identifier for your entity.

> Blockquote description

A brief, authoritative description of your product, service, or organisation. Written for an LLM to quote verbatim.

## Section (H2) with links

Named sections containing Markdown links to your most important pages. Common sections: Docs, Blog, API Reference, About, Pricing.

Optional: ## Optional

A section named 'Optional' containing supplementary links that agents can skip when context is limited. Use for detailed docs, FAQs, and deep content.

Example llms.txt

# Surfaceable

> Surfaceable is an SEO and AI visibility platform that helps brands audit their technical SEO health and track how often they appear in responses from ChatGPT, Claude, Gemini, and Perplexity.

## Docs

- [Getting started](https://surfaceable.io/docs/getting-started): How to run your first audit and set up AI visibility tracking

- [MCP server](https://surfaceable.io/docs/mcp): Connect Surfaceable to Claude Desktop and other MCP hosts

- [API reference](https://surfaceable.io/docs/api): REST API for programmatic access

## About

- [Pricing](https://surfaceable.io/pricing): Plans from free to Pro

- [Blog](https://surfaceable.io/blog): SEO and AI visibility guides

## Optional

- [Agentic SEO guide](https://surfaceable.io/agentic-seo): Deep dive on optimising for AI agents
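The structure above is regular enough to parse in a few lines. The following sketch splits an llms.txt body into its H1 name, blockquote description, and per-section link lists; parse_llms_txt is a hypothetical helper for illustration, not an official parser.

```python
import re

# Matches a Markdown list item link: "- [title](url)", ignoring any trailing description.
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)]+)\)")

def parse_llms_txt(text: str) -> dict:
    """Parse an llms.txt document into its name, description, and link sections."""
    name = None
    description = []
    sections: dict[str, list[tuple[str, str]]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and name is None:
            name = line[2:].strip()          # the single H1 brand name
        elif line.startswith("## "):
            current = line[3:].strip()       # start of a new link section
            sections[current] = []
        elif line.startswith(">"):
            description.append(line[1:].strip())  # blockquote description
        elif current is not None:
            m = LINK_RE.match(line)
            if m:
                sections[current].append((m.group(1), m.group(2)))
    return {"name": name,
            "description": " ".join(description),
            "sections": sections}
```

Running this over the example above yields "Surfaceable" as the name and a "Docs", "About", and "Optional" section, each with its (title, url) pairs.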

Adoption

Which AI systems read llms.txt?

The llms.txt specification is an emerging convention, not a ratified standard. Adoption is growing rapidly as AI agent frameworks add explicit support.

Supported now

AI agents with web browsing

Agents running in Claude Desktop, Cursor, or custom MCP workflows that fetch pages before answering can read /llms.txt as a priority resource, shaping how the agent represents your site throughout the session.

Adopted by many

Retrieval-augmented search

LLM-powered search tools like Perplexity and ChatGPT Search increasingly treat well-structured site files as quality signals when deciding which sources to surface.

Early adoption

LLM training pipelines

Some organisations curating web corpora for LLM pre-training and fine-tuning use llms.txt as a signal of content quality and structure. Having one may make your content more likely to be included in high-quality datasets.

Growing fast

Enterprise AI workflows

Enterprise teams building RAG pipelines and internal knowledge bases are adopting llms.txt as a standard signal for which external content to ingest and trust.

llms.txt alongside llms-full.txt

The specification also defines an optional llms-full.txt file containing complete page content in Markdown form, intended for AI agents that want to ingest your entire site without crawling every URL individually. This is particularly useful for documentation-heavy sites and SaaS products with large help centres.
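For documentation sites whose pages already live as Markdown files, llms-full.txt can be produced by simple concatenation. A minimal sketch, assuming your docs are stored as .md files on disk; build_llms_full and the horizontal-rule separator are illustrative choices, not part of the spec.

```python
from pathlib import Path

def build_llms_full(docs_dir: str, out_path: str = "llms-full.txt") -> int:
    """Concatenate every Markdown page under docs_dir into one llms-full.txt.

    Pages are sorted for a stable order and separated by horizontal rules.
    Returns the number of pages included.
    """
    pages = sorted(Path(docs_dir).rglob("*.md"))
    parts = [p.read_text(encoding="utf-8").strip() for p in pages]
    Path(out_path).write_text("\n\n---\n\n".join(parts) + "\n", encoding="utf-8")
    return len(pages)
```

A real pipeline would likely strip front matter and order pages to mirror your site navigation rather than alphabetically.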

Surfaceable

How Surfaceable validates your llms.txt

Every Surfaceable site audit checks for llms.txt as a first-class signal alongside robots.txt, sitemap.xml, and Core Web Vitals. We verify its presence, validate its format, and assess its quality.

  • Detects presence at /llms.txt and /llms-full.txt
  • Validates Markdown structure against the specification
  • Checks that linked URLs return 200 status codes
  • Flags missing or duplicate H1 headings
  • Verifies AI crawlers are not blocked in robots.txt
  • Scores llms.txt quality as part of your overall AI-readiness score
  • Provides a generated llms.txt template for your domain
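Two of these checks, extracting the linked URLs and probing each one for a 200 response, can be sketched as follows. The extract_links and check_links names are illustrative; a production audit would also follow redirects and apply rate limiting.

```python
import re
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

# Matches the absolute URL inside a Markdown link: [title](https://...)
LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")

def extract_links(llms_txt: str) -> list[str]:
    """Pull every absolute Markdown link target out of an llms.txt body."""
    return LINK_RE.findall(llms_txt)

def check_links(llms_txt: str, timeout: float = 5.0) -> dict[str, bool]:
    """Map each linked URL to whether it answers a HEAD request with HTTP 200."""
    results = {}
    for url in extract_links(llms_txt):
        req = Request(url, method="HEAD",
                      headers={"User-Agent": "llms-txt-audit/0.1"})  # illustrative UA
        try:
            with urlopen(req, timeout=timeout) as resp:
                results[url] = resp.status == 200
        except (HTTPError, URLError):
            results[url] = False
    return results
```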

llms.txt in your broader AI SEO strategy

llms.txt is one component of a complete AI visibility strategy. Surfaceable checks it alongside:

  • JSON-LD schema: structured data for agent parsing
  • AI crawler access: GPTBot, ClaudeBot, PerplexityBot
  • Entity consistency: cross-platform brand coherence
  • AI mention tracking: live LLM presence measurement
  • MCP tool exposure: agent-callable APIs for your brand

Check if your site has a valid llms.txt.

Free audit covers llms.txt, robots.txt, schema, and 20+ AI-readiness checks.

Get started free