69% of agent skills from Anthropic, OpenAI, and Vercel have no error handling. No "if this fails, try that." No recovery path. After auditing 53 skills across three vendors, I stopped blaming models and started building tooling.

The SKILL.md specification defines a vendor-neutral format for packaging agent instructions, examples, and metadata into a single portable file. Think README.md for agent tools. The spec existed. What didn't exist was anything to enforce it.

So I built it. Try it in the browser.

skill-tools pipeline: Parse (YAML frontmatter) → Validate (20+ spec checks) → Lint (9 quality rules) → Score (5 weighted dimensions) → Route (BM25 ranking)

The ecosystem

If you write TypeScript, you run tsc to catch type errors and eslint to enforce style. But the natural language controlling your agent's behavior? Most teams eyeball it. skill-tools fills that gap with four packages, each handling a distinct stage of the pipeline.

| Package | What It Does | Key Feature |
| --- | --- | --- |
| skill-tools (CLI) | Validate, lint, and score SKILL.md files | 20+ checks, 9 quality rules, 0-100 scoring across 5 dimensions |
| @skill-tools/core | AST parser, tokenizer, type definitions | Extracts YAML frontmatter + Markdown body into a typed ParseResult |
| @skill-tools/router | Dynamic skill retrieval at runtime | BM25 algorithm optimized for SKILL.md structure |
| @skill-tools/gen | Scaffold SKILL.md from existing APIs | Generates from OpenAPI specs and MCP server definitions |

Together, they cover the full lifecycle: author a skill, validate its structure, measure its quality, and retrieve it at runtime. The parse and lint stages run offline. The router runs at query time. Nothing calls an LLM.

Parse

@skill-tools/core extracts frontmatter, validates the name (lowercase alphanumeric + hyphens, 1-64 chars), checks description length, resolves file references (scripts/, references/, assets/), and counts tokens. It returns a typed ParseResult, either the parsed skill or structured errors. No exceptions, no throwing. Error-as-values forces callers to handle failure.

Parsing alone caught 4 of 53 skills in my audit: missing file references that would silently break at runtime. That's the kind of thing that leads to agent loops: the skill says "read this file," the agent tries, the file isn't there, the agent retries forever.

Lint

Nine rules that catch real problems. no-secrets scans for 11 token patterns (OpenAI sk-, Stripe sk_live_, GitHub ghp_, AWS AKIA, private keys, JWTs). no-hardcoded-paths catches /Users/... and C:\Users\.... description-specificity flags vague verbs like "manage" and "handle." If your description could describe any tool, it describes none of them. Three severity levels: errors block CI, warnings educate, info suggests improvements.
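A minimal sketch of how a rule like no-secrets can work, using an illustrative subset of the token patterns (the real rule covers 11; these regexes are approximations, not the tool's):

```typescript
// Illustrative subset of secret-token patterns.
const SECRET_PATTERNS: { id: string; re: RegExp }[] = [
  { id: "openai-key", re: /\bsk-[A-Za-z0-9]{20,}\b/ },       // OpenAI sk-
  { id: "stripe-live-key", re: /\bsk_live_[A-Za-z0-9]{16,}\b/ }, // Stripe sk_live_
  { id: "github-pat", re: /\bghp_[A-Za-z0-9]{36}\b/ },       // GitHub ghp_
  { id: "aws-access-key", re: /\bAKIA[A-Z0-9]{16}\b/ },      // AWS AKIA
];

type Finding = { rule: string; severity: "error" };

// Scan a skill body and report every pattern family that matches.
function lintSecrets(body: string): Finding[] {
  return SECRET_PATTERNS
    .filter(p => p.re.test(body))
    .map(p => ({ rule: p.id, severity: "error" as const }));
}
```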

The lint findings get more interesting when applied at scale, which is exactly what I did across 53 skills from three vendors.

Score

Five dimensions, 100 points total. Here's how the weight is distributed and why:

| Dimension | Points | What It Measures |
| --- | --- | --- |
| Description Quality | 30 | Action verbs, trigger phrases, name-description uniqueness |
| Instruction Clarity | 25 | Code examples, numbered steps, error handling sections |
| Spec Compliance | 20 | Required fields, token budget |
| Progressive Disclosure | 15 | Long skills reference external files instead of inlining |
| Security | 10 | No leaked secrets, hardcoded paths, dangerous patterns |

Description carries the most weight because it's what routers see first. I can tell you exactly why a skill scored 73: it has a trigger phrase (+8 pts) but no code examples (+0 pts) and a description that overlaps 60% with the name (+3 pts instead of +6). No black box. Every point is traceable.
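That traceability falls out of additive scoring: every point in the total maps to a named check. A sketch, with illustrative check names and point values rather than the tool's actual rubric:

```typescript
// Each check is worth a fixed maximum; the score is just the sum of earned
// points, so any total can be decomposed check by check.
type Check = { id: string; max: number; earned: number };

function total(checks: Check[]): number {
  return checks.reduce((sum, c) => sum + c.earned, 0);
}

function explain(checks: Check[]): string[] {
  return checks.map(c => `${c.id}: ${c.earned}/${c.max}`);
}

// Hypothetical breakdown of one dimension.
const descriptionChecks: Check[] = [
  { id: "trigger-phrase", max: 8, earned: 8 },  // has a trigger phrase
  { id: "name-uniqueness", max: 6, earned: 3 }, // heavy overlap with the name
  { id: "action-verbs", max: 16, earned: 12 },
];
```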

Route

@skill-tools/router takes a natural-language query and a set of skills and returns ranked matches using Okapi BM25. No embeddings, no vector databases, no LLM calls. Pure full-text search with IDF weighting. I chose BM25 over embeddings because it's debuggable. When a query returns the wrong skill, I can inspect term frequencies and fix the description. Try debugging a 1536-dimension embedding.
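For reference, the core of Okapi BM25 fits in a few lines. This is a deliberately naive sketch of the ranking idea, not the @skill-tools/router implementation; k1 and b use the common defaults:

```typescript
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Score every doc against the query; higher is a better match.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const docToks = docs.map(tokenize);
  const N = docs.length;
  const avgdl = docToks.reduce((s, t) => s + t.length, 0) / N;
  return docToks.map(doc => {
    let score = 0;
    for (const term of tokenize(query)) {
      const df = docToks.filter(t => t.includes(term)).length; // docs containing term
      if (df === 0) continue;
      const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));   // rare terms weigh more
      const tf = doc.filter(t => t === term).length;
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgdl));
    }
    return score;
  });
}
```

Every number in the final score is inspectable: term frequency, document frequency, length normalization. That is the debuggability argument in concrete form.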

Contextual enrichment (v0.2.0)

Skill descriptions are often too short for BM25 to work well. A skill named docker-deploy with sections about Kubernetes and Helm charts might have none of those terms in the description. Before indexing, the router now extracts context from the skill body:

  • Name parts: docker-deploy yields "docker" and "deploy"
  • Section headings: ## Kubernetes Configuration yields "kubernetes" and "configuration"
  • Inline code refs: `helm install` yields "helm" and "install"

Max 80 context tokens, deduped against the description. Queries like "helm chart" now match skills that never mentioned Helm in their description but cover it in the body. This enrichment layer is where the most interesting routing failures get solved.
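A sketch of that extraction under the rules above (the 80-token cap and dedup come from the description; function and parameter names are assumptions):

```typescript
// Pull extra index terms from the skill name, section headings, and inline
// code spans, deduped, capped at maxTokens.
function extractContext(name: string, body: string, maxTokens = 80): string[] {
  const terms = new Set<string>();
  for (const part of name.split("-")) terms.add(part.toLowerCase());         // name parts
  for (const m of body.matchAll(/^#{1,6}\s+(.+)$/gm)) {                      // headings
    for (const w of m[1].toLowerCase().split(/\W+/)) if (w) terms.add(w);
  }
  for (const m of body.matchAll(/`([^`]+)`/g)) {                             // inline code
    for (const w of m[1].toLowerCase().split(/\W+/)) if (w) terms.add(w);
  }
  return [...terms].slice(0, maxTokens);
}
```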

Why no LLMs anywhere

A deliberate choice. Every operation (parsing YAML, matching regex, computing BM25) is deterministic. Same input, same output, every time. No API keys means the tools work offline, in CI, in air-gapped environments. I've run them on planes.

No model drift. No temperature variance. No rate limits at 3am when your deploy pipeline needs to validate 200 skills. Determinism is the feature.

CI integration

The CLI integrates with GitHub Actions and any CI system. Set a minimum score threshold and fail builds on violations.
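The gate itself is simple; sketched here in TypeScript with hypothetical names, since the actual CLI flags aren't reproduced in this post:

```typescript
// Hypothetical build-gate logic: collect per-file scores, fail the build
// when any skill falls below the configured minimum.
type ScoredSkill = { file: string; score: number };

function ciFailures(results: ScoredSkill[], minScore: number): string[] {
  return results
    .filter(r => r.score < minScore)
    .map(r => `${r.file}: score ${r.score} is below threshold ${minScore}`);
}
```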

Same idea as ESLint or clippy, except the thing being linted is the instruction your agent will execute.

The Claude Code plugin

Tooling only works if it's in the developer's flow. I built the skill-tools-plugin for Claude Code with a post-write hook that watches SKILL.md files. When Claude Code edits a skill, the plugin transparently runs the linter and feeds results back into the model's context. The agent self-corrects its own instructions. It fixes schema issues, improves clarity, and raises its score without human intervention.

When you treat instructions as structured, testable artifacts, the agent can improve its own capabilities. That feedback loop doesn't exist when your prompts live in a string literal.

Testing

Core validates every frontmatter field and file reference. The router tests cover BM25 ranking, context extraction, and edge cases. All tests run in CI across Node 18, 20, and 22.

Start here

The agent skills ecosystem is growing fast. Anthropic, OpenAI, and Vercel all ship skills now. But without deterministic quality gates, every skill is a black box of unknown reliability. Treating instructions like code (parseable, lintable, scorable) is how you build agents you can trust in production.

Apache-2.0 licensed. GitHub / Docs / Try in browser