The most important skill in AI engineering isn't crafting prompts anymore. It's engineering context. Andrej Karpathy called it "the delicate art and science of filling the context window with just the right information." Tobi Lutke defined it as "the art of providing all the context for the task to be plausibly solvable by the LLM." The terminology varies, but the shift is real: what the model knows matters more than how you ask.

Building context infrastructure across BAP for browser observation, SKILL.md for agent instructions, and production systems that route context to models has reinforced one pattern: the highest-leverage work is deciding what enters the context window, not rewording the request.

[Diagram: context window budget — system prompt, tool definitions, conversation history, current task, reserved output. Routing mode uses ~100 tokens per skill; execution mode loads the full body for the selected skill only.]

The distinction

Prompt engineering is writing a better question. "Summarize this document in 3 bullet points, focusing on financial implications, using formal tone." It's optimizing the instruction.

Context engineering is assembling the right information before the question gets asked. Which documents are relevant? What does the user's history tell us about their intent? What tools are available? What constraints apply? The prompt is the last 5% of the context window. The other 95% is the context, and that's where the leverage is.

The term "prompt engineering" has become synonymous with typing things into a chatbot. That's reductive. "Context engineering" better captures the actual complexity: task descriptions, few-shot examples, RAG results, tool definitions, conversation history, and state, all competing for space in a finite context window.
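To make the split concrete, here's a toy sketch. The prompt is one string; the context is everything assembled around it. The retrieval helper below is a deliberately naive word-overlap stand-in for whatever retrieval layer a real system would use — all names here are illustrative.

```python
# Prompt engineering optimizes the instruction itself.
prompt = ("Summarize this document in 3 bullet points, "
          "focusing on financial implications, using formal tone.")

def retrieve_relevant(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy relevance: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

corpus = [
    "Q3 revenue rose 12% on strong subscription growth.",
    "The office relocated to a new building downtown.",
    "Operating costs fell after the cloud migration.",
]

# Context engineering assembles everything around the instruction.
context = {
    "system": "You are a financial analyst assistant.",
    "documents": retrieve_relevant("financial revenue costs", corpus, k=2),
    "instruction": prompt,  # the last 5% of the window
}
```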


The economics of context

Context windows have exploded. As of early 2026, Claude Opus 4.6 and Sonnet 4.6 support 1M tokens, roughly 750,000 words. Haiku 4.5 supports 200,000 tokens. But bigger windows don't make context engineering easier. They make it harder.

The reason: cost scales with context length. Anthropic's pricing at the time of writing:

| Model | Input (per MTok) | Output (per MTok) | Cost to fill 1M context |
|---|---|---|---|
| Claude Opus 4.6 | $5 | $25 | $5.00 |
| Claude Sonnet 4.6 | $3 | $15 | $3.00 |
| Claude Haiku 4.5 | $1 | $5 | $0.20 (200K max) |

Five dollars per request to fill Opus's context window. If you're making 1,000 requests a day, that's $5,000 in input tokens alone, before a single output token. The financial incentive to put less in the context window is enormous. But the quality incentive to put the right things in is even bigger.
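The arithmetic behind that table is worth internalizing, because it compounds with request volume:

```python
def input_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Input cost at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok

opus_fill = input_cost_usd(1_000_000, 5.0)   # fill Opus's 1M window once
haiku_fill = input_cost_usd(200_000, 1.0)    # Haiku tops out at 200K
daily_opus = 1_000 * opus_fill               # 1,000 requests/day, input only
```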

Context engineering is the discipline of maximizing signal-per-token. Not "fit more in." Not "use the biggest window." Put exactly what the model needs, nothing more, and structure it so the model can find what matters.


MCP as context infrastructure

The Model Context Protocol is context engineering at the infrastructure level. It standardizes how AI applications connect to external data sources, tools, and workflows. Build one MCP server, provide context to Claude, ChatGPT, VS Code, Cursor, or any other client that speaks the protocol.

But MCP has a context cost that most people don't account for. Every MCP tool registered with a model adds to the context window. Tool definitions, parameter schemas, descriptions: they all consume tokens on every request, whether the tool gets used or not.
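To get a rough feel for that cost, here's a representative (invented, not from any real server) MCP-style tool definition, sized with a crude ~4-characters-per-token heuristic — real tokenizers vary:

```python
import json

# Every registered tool ships a schema like this in the context on
# every request, whether or not the tool is ever called.
tool = {
    "name": "search_files",
    "description": "Search the workspace for files whose contents match a query string.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Text to search for"},
            "path": {"type": "string", "description": "Directory to search under"},
            "max_results": {"type": "integer", "description": "Cap on returned matches"},
        },
        "required": ["query"],
    },
}

def rough_tokens(obj) -> int:
    """Crude ~4 chars/token estimate; real tokenizers differ."""
    return len(json.dumps(obj)) // 4

per_tool = rough_tokens(tool)
fleet_overhead = per_tool * 20  # 20 registered tools, paid on every request
```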

SKILL.md: progressive disclosure as context engineering

The core insight behind SKILL.md: an agent doesn't need all the information upfront. It needs just enough to decide whether a skill is relevant, then full details only when it commits to using it. Frontmatter is the advertisement. The body is the manual. A router reads the advertisement. The agent reads the manual.

| Stage | What's loaded | Typical token cost |
|---|---|---|
| Routing | Name + description (frontmatter only) | ~50-100 tokens per skill |
| Execution | Full SKILL.md body | ~500-3,000 tokens per skill |

Compare this to MCP, where every registered tool's full schema sits in the context window on every request. With 20 MCP tools, you're paying ~550-1,400 tokens per tool on every call, totaling 11,000-28,000 tokens before the user's actual request. With progressive disclosure, 20 skills cost ~1,000-2,000 tokens for routing, and only the selected skill's full body gets loaded.
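A minimal sketch of the pattern. The skill file format below is illustrative, and the keyword-overlap router is a stand-in — in practice the routing decision is the model's judgment, not word matching:

```python
# Two skills, each a frontmatter "advertisement" plus a full body "manual".
SKILLS = {
    "pdf-extract": (
        "---\nname: pdf-extract\ndescription: Pull text and tables out of PDF files\n---\n"
        "Full instructions: run the extraction pipeline, then ..."
    ),
    "git-release": (
        "---\nname: git-release\ndescription: Tag, changelog, and publish a release\n---\n"
        "Full instructions: bump the version, then ..."
    ),
}

def frontmatter(doc: str) -> dict:
    """Parse the ----delimited header into key/value pairs."""
    header = doc.split("---")[1]
    return dict(line.split(": ", 1) for line in header.strip().splitlines())

def route(task: str) -> str:
    """Pick a skill using only its ~50-100 token advertisement."""
    words = set(task.lower().split())
    return max(
        SKILLS,
        key=lambda s: len(words & set(frontmatter(SKILLS[s])["description"].lower().split())),
    )

chosen = route("extract the tables from this pdf report")
body = SKILLS[chosen].split("---", 2)[2]  # load the manual only after committing
```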


BAP: structured observation as context

When I built Browser Agent Protocol, the key design decision was how to represent page state. The obvious approach: take a screenshot and let the model use vision. Computer Use APIs do this. It works, but it's expensive in tokens (images are large) and in latency (~500ms per action for vision processing).

BAP takes a different approach. After every action, it returns the page state as structured data: what elements exist, their roles, their text content, what's clickable, what's visible. This is context engineering applied to browser automation. Instead of giving the model an image and asking "what do you see?", BAP gives the model a structured representation and says "here's what exists."
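A sketch of what such an observation might look like — the field names here are illustrative, not BAP's actual schema:

```python
# Structured page state after an action: roles, text, interactivity flags.
observation = {
    "url": "https://example.com/checkout",
    "elements": [
        {"role": "textbox", "name": "Email", "visible": True, "editable": True},
        {"role": "textbox", "name": "Card number", "visible": True, "editable": True},
        {"role": "button", "name": "Pay now", "visible": True, "clickable": True},
        {"role": "link", "name": "Return to cart", "visible": True, "clickable": True},
    ],
}

# The model (or a planner) can filter this directly, no pixels involved.
clickable = [e["name"] for e in observation["elements"] if e.get("clickable")]
```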

The difference in context efficiency is dramatic. A screenshot of a typical web page consumes thousands of tokens through the vision API. BAP's structured observation of the same page (roles, text, interactive elements) fits in a few hundred tokens. Same information, 10x fewer tokens, and the model doesn't need to interpret pixels.

This is what context engineering looks like at the protocol level. The question isn't "how do I ask the model to understand this page?" (prompt engineering). The question is "what representation of this page gives the model the most useful information per token?" (context engineering).

Five principles for context engineering

These are the principles that have held up across every context system I've built:

1. Budget tokens like money. Every token in the context window has a cost, both financial (API pricing) and cognitive (model attention degrades with context length). Track your token budget per request. Know where the tokens go. Optimize the biggest line items first.
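The budget can be tracked as literally as the principle suggests. This is a sketch, not any particular SDK's API:

```python
from dataclasses import dataclass, field

@dataclass
class ContextBudget:
    """Track how a finite context window is allocated across components."""
    limit: int                              # total window size in tokens
    spent: dict = field(default_factory=dict)

    @property
    def used(self) -> int:
        return sum(self.spent.values())

    def add(self, name: str, tokens: int) -> None:
        if self.used + tokens > self.limit:
            raise ValueError(f"{name} would exceed the {self.limit}-token window")
        self.spent[name] = self.spent.get(name, 0) + tokens

budget = ContextBudget(limit=200_000)
budget.add("system_prompt", 1_200)
budget.add("tool_definitions", 8_000)       # often the biggest line item
budget.add("conversation_history", 30_000)
budget.add("retrieved_documents", 45_000)
budget.add("user_message", 300)

# Whatever remains is headroom for the model's output.
remaining = budget.limit - budget.used
```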

2. Load context progressively. Don't front-load everything. Start with the minimum the model needs to route or triage. Load details on demand. SKILL.md's frontmatter-then-body pattern works for any domain: show the index first, load the chapter when needed.

3. Prefer structured over unstructured. A JSON object with labeled fields is easier for a model to extract information from than a paragraph of prose containing the same information. BAP returns structured observations instead of screenshots. MCP provides structured tool definitions instead of natural language descriptions. Structure is compression.
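A small illustration of the same fact carried both ways:

```python
import json

# Unstructured: the model must interpret the sentence to find the number.
prose = ("After reviewing the account, we found that the outstanding balance "
         "comes to three hundred twenty dollars, due at the end of the month.")

# Structured: extraction is a lookup, and the encoding is smaller too.
structured = {"balance_usd": 320, "due": "month_end"}

balance = structured["balance_usd"]
size_structured = len(json.dumps(structured))
size_prose = len(prose)
```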

4. Cache aggressively. Anthropic's API supports prompt caching for up to one hour. If your system prompt, tool definitions, or reference documents don't change between requests, cache them. Cached tokens cost a fraction of fresh tokens, and recent Claude models support this natively.
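Here's what a cached request payload looks like with Anthropic's Messages API. The model name and the document are placeholders; the `cache_control` block follows Anthropic's documented shape, with `ttl: "1h"` requesting the one-hour extended cache:

```python
# The large, stable prefix (system prompt + reference doc) is marked with
# cache_control, so repeat requests read it from cache at reduced cost.
payload = {
    "model": "claude-sonnet-4-5",  # placeholder model name
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "You are a support agent for Acme Corp."},
        {
            "type": "text",
            "text": "<the full product manual goes here>",
            # Caches everything up to and including this block.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        },
    ],
    "messages": [{"role": "user", "content": "How do I reset my router?"}],
}
```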

5. Measure context quality, not just context size. More context isn't better context. The Agentless paper showed you can solve SWE-bench at $0.70 per issue with minimal context, just localization results fed to a repair step. Meanwhile, agent systems stuffing entire codebases into context windows spent $5-20+ per issue and performed worse. Signal-to-noise ratio matters more than signal volume.


The shift

Prompt engineering was the right skill for 2023. You had a chat interface, a single turn, and a question. Crafting that question well mattered.

Context engineering is the skill for 2026. You have agents with tool access, MCP servers, retrieval systems, conversation history, and context windows measured in millions of tokens. The question is almost an afterthought. What matters is the information ecosystem around the question: what the agent knows, what it can look up, what it loads upfront vs. on demand, and how efficiently all of that is encoded.