Inside that infrastructure, a toolkit of eight patterns.
The harness houses them; these are what do the work. Pick by problem shape, not
sophistication — each names the problem it fits, the one it doesn't, and where it
breaks. A 30-line script can do what a six-agent crew was built to do; the skill
is knowing which problem you have.
PATTERN 01
Scripts
The rule is writable, the inputs are stable.
The smallest unit of automation, and in many companies still the one that does the most work. A script is a list of steps a computer follows without judgment: it runs the same way at 2 a.m. on a holiday as it does at noon on a Tuesday.
- WHERE IT FITS
- Repeated work with stable schemas: pulling a daily report, normalizing a feed, posting a webhook, calling an API on a cron. The first thing to try, almost always.
- WHERE IT BREAKS
- Brittle to schema drift, silent on edge cases. No judgment — that's the feature and the limit. If the next step needs “well, it depends,” a script alone is wrong.
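A minimal sketch of the pattern, assuming a CSV feed with three columns. The column names and sample rows are hypothetical, chosen only for illustration. The script does not guess when the schema drifts; it stops, which turns a silent failure into a loud one.

```python
import csv
import io

# Hypothetical schema for illustration: the columns this script expects.
EXPECTED_COLUMNS = ["order_id", "amount", "currency"]


def normalize_feed(raw_csv: str) -> list[dict]:
    """Normalize a CSV feed, failing loudly if the header has drifted."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    if reader.fieldnames != EXPECTED_COLUMNS:
        # Refuse to guess: a renamed, missing, or reordered column stops the run.
        raise ValueError(f"schema drift: got {reader.fieldnames}")
    rows = []
    for row in reader:
        rows.append({
            "order_id": row["order_id"].strip(),
            "amount": round(float(row["amount"]), 2),
            "currency": row["currency"].strip().upper(),
        })
    return rows


if __name__ == "__main__":
    feed = "order_id,amount,currency\nA1, 19.5,usd\nA2,7,eur\n"
    for record in normalize_feed(feed):
        print(record)
```

The header check is the whole design choice: a script has no judgment, so the safest behavior on unexpected input is to halt and let a person look.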
PATTERN 02
Machine learning
The rule isn't writable, but you have data.
ML predicts, classifies, ranks, and scores. You reach for it when the input is messy, the rule is statistical, and you have labeled examples to learn from. The right tool when “if/else” runs out because the rule lives in the data.
- WHERE IT FITS
- Demand forecasting, fraud scoring, churn prediction, defect classification, anomaly detection, ranking, route optimization — a number or a label that depends on patterns too tangled to hand-code.
- WHERE IT BREAKS
- Needs data infrastructure, labeling discipline, monitoring, and a retraining cadence. And an LLM is the wrong tool here: for a score or a label learned from structured data, a logistic regression will usually beat a chat completion.
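To make "the rule lives in the data" concrete, here is a small sketch of logistic regression fitted by gradient descent, using only the standard library so it stays self-contained. The churn data is invented for illustration; a real project would more likely use a library such as scikit-learn and far more examples.

```python
import math

# Hypothetical toy data: support tickets filed last month, and whether the
# customer churned (1) or stayed (0). Invented for illustration only.
TICKETS = [0, 1, 2, 3, 4, 5, 6, 7]
CHURNED = [0, 0, 0, 1, 0, 1, 1, 1]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit one weight and one bias by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction minus label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w: float, b: float, x: float) -> float:
    """Return the estimated probability of churn for x tickets."""
    return sigmoid(w * x + b)


if __name__ == "__main__":
    w, b = train_logistic(TICKETS, CHURNED)
    for t in (0, 3, 7):
        print(f"{t} tickets -> churn probability {predict(w, b, t):.2f}")
```

Nobody wrote an "if tickets > 3" rule here. The threshold falls out of the labeled examples, and it moves when the data does.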
PATTERN 03
Visual workflows
The bottleneck is integration breadth, not reasoning depth.
Node-based tools like n8n: a visual canvas, a thousand-plus connectors, LLM nodes built in. The graph is editable by anyone who can read it. When the work is moving things between systems, it's hard to beat for time-to-first-win.
- WHERE IT FITS
- Cross-system glue: lead routing, ticket dispatch, daily report assembly, file moves, notification chains, CRM hygiene. The litmus test — “take this from here, do something simple, notify someone.”
- WHERE IT BREAKS
- Long-running stateful work, deep agent loops, complex retries, anything that needs version-controlled tests and CI. Once a workflow passes a hundred nodes or has to ship as a product, code-first wins.
PATTERN 04
Code-first graphs
The agent loop has to ship as a product.
LangGraph and the like: stateful, multi-agent workflows expressed as a directed graph. Each node mutates shared state, interrupts pause the run for human review, and the whole graph compiles, persists, streams, and replays, with first-class tracing and evaluation.
- WHERE IT FITS
- Production agent systems that must be reliable, testable, and ownable by an engineering team: a research agent, a support triage that routes and resolves, a document generator with a human checkpoint. Anything needing unit tests, eval suites, and a deploy pipeline.
- WHERE IT BREAKS
- Costs developer hours up front. No canvas, no drag-and-drop. If your team won't write code, or the work is mostly moving data between SaaS apps, it is the wrong tool; visual workflows are faster.
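The core ideas (nodes that mutate shared state, and an interrupt that pauses for a human) fit in a few lines of plain Python. This is a sketch of the concept only and is not LangGraph's API; every name in it (`Interrupt`, `run`, the three nodes) is hypothetical.

```python
from typing import Callable, Optional

State = dict
# A node mutates the shared state and returns the next node's name, or None to stop.
Node = Callable[[State], Optional[str]]


class Interrupt(Exception):
    """Raised by a node to pause the run until a human responds."""

    def __init__(self, resume_at: str):
        super().__init__(resume_at)
        self.resume_at = resume_at


def draft(state: State) -> Optional[str]:
    state["draft"] = f"Summary of {state['topic']}"
    return "review"


def review(state: State) -> Optional[str]:
    if "approved" not in state:
        raise Interrupt(resume_at="review")  # human checkpoint
    if state["approved"]:
        return "publish"
    state.pop("approved")  # rejected: redraft, then ask the human again
    return "draft"


def publish(state: State) -> Optional[str]:
    state["published"] = True
    return None


NODES: dict[str, Node] = {"draft": draft, "review": review, "publish": publish}


def run(state: State, start: str = "draft") -> Optional[str]:
    """Run until the graph ends (returns None) or pauses (returns the resume node)."""
    current: Optional[str] = start
    while current is not None:
        try:
            current = NODES[current](state)
        except Interrupt as pause:
            return pause.resume_at
    return None


if __name__ == "__main__":
    state = {"topic": "Q3 churn"}
    paused_at = run(state)          # stops at the human checkpoint
    state["approved"] = True        # the human's decision
    run(state, start=paused_at)     # resumes and publishes
    print(state)
```

Because each node is an ordinary function over a dictionary, it can be unit-tested in isolation, which is most of what "code-first" buys over a canvas.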
PATTERN 05
Skills
You need to teach the model “how we do this here.”
A Skill is a folder with a playbook inside. The model loads it only when the task matches — same model, more capable on the things you teach it. Published as an open standard, so the same skill works across Claude Code, Codex, and other harnesses.
- WHERE IT FITS
- Encoding procedure: a brand voice, a document template, a regulatory checklist, a deployment runbook, a customer onboarding sequence. Skills compound under version control — the library gets better, the model gets better, no retraining required.
- WHERE IT BREAKS
- Skills are recipes, not the kitchen. If the task needs a model, a tool, or external data, the skill is only part of the system. A poorly written description means the skill never loads — the most common failure.
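The loading behavior can be sketched as follows. In a real harness the model itself decides whether a description matches the task; the word-overlap score here is a crude stand-in for that judgment, and the skill names, descriptions, and `min_overlap` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # the only text consulted when deciding whether to load
    body: str         # the playbook; read only after a match


def words(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,:;!?").lower() for w in text.split()}


def select_skill(task: str, skills: list[Skill], min_overlap: int = 2):
    """Return the skill whose description shares the most words with the task.

    Stand-in for the model's own matching. Returns None when no description
    overlaps enough, which is how a vague description ends up never loading.
    """
    task_words = words(task)
    best, best_score = None, 0
    for skill in skills:
        score = len(task_words & words(skill.description))
        if score > best_score:
            best, best_score = skill, score
    return best if best_score >= min_overlap else None


LIBRARY = [
    Skill("release-notes",
          "Write release notes for a software deployment using the team template",
          "1. List merged changes. 2. Group by area. 3. Apply the template."),
    Skill("misc", "Helpful stuff", "1. Follow the brand voice guide."),
]

if __name__ == "__main__":
    hit = select_skill("Write the release notes for today's deployment", LIBRARY)
    print(hit.name if hit else None)
    miss = select_skill("Draft a post in our brand voice", LIBRARY)
    print(miss.name if miss else None)
```

The second skill holds the brand voice playbook, but its description says nothing about brand voice, so a brand voice task never reaches it.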
PATTERN 06
Skill orchestration
One identity needs to run many procedures.
A workflow of skills, run under a stable identity, is an agent with a job. The identity is the role, the principles, the voice. The skills are the procedural library. The orchestration decides which skill, in what order, under what conditions.
- WHERE IT FITS
- Roles, not tasks: a “lead qualifier” that always sounds like your sales team and routes to a human at the right threshold; a “release manager” that follows the same checklist every time; a “client onboarder” that runs the same skills in the same order.
- WHERE IT BREAKS
- Lives or dies on the quality of the identity and the skill descriptions. A vague identity drifts; vague descriptions load the wrong skill. Treat the identity as a real artifact — write it, version it, review it.
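A sketch of the three parts (identity, skills, orchestration) for the lead qualifier example. The scoring rule, the threshold of 70, and the field names are assumptions made up for illustration; the skills are plain functions standing in for real playbooks.

```python
from dataclasses import dataclass, field

HANDOFF_THRESHOLD = 70  # hypothetical score at which a human takes over


def score_lead(ctx: dict) -> None:
    # Invented scoring rule, for illustration only.
    ctx["score"] = (50 if ctx["budget"] >= 10_000 else 10) + (30 if ctx["seats"] >= 50 else 0)


def hand_off_to_human(ctx: dict) -> None:
    ctx["route"] = "human"


def send_nurture_email(ctx: dict) -> None:
    ctx["route"] = "nurture"


@dataclass
class Agent:
    identity: str  # the role, principles, and voice, kept as a versioned artifact
    steps: list = field(default_factory=list)  # ordered (condition, skill) pairs

    def run(self, ctx: dict) -> dict:
        """Run each skill whose condition holds, in order, and log what ran."""
        ctx["log"] = []
        for condition, skill in self.steps:
            if condition(ctx):
                skill(ctx)
                ctx["log"].append(skill.__name__)
        return ctx


lead_qualifier = Agent(
    identity="Lead qualifier: direct, friendly, never promises pricing.",
    steps=[
        (lambda ctx: True, score_lead),
        (lambda ctx: ctx["score"] >= HANDOFF_THRESHOLD, hand_off_to_human),
        (lambda ctx: ctx["score"] < HANDOFF_THRESHOLD, send_nurture_email),
    ],
)

if __name__ == "__main__":
    print(lead_qualifier.run({"budget": 20_000, "seats": 80}))
    print(lead_qualifier.run({"budget": 500, "seats": 3}))
```

The orchestration is the `steps` list: which skill, in what order, under what condition. Keeping it as data makes it reviewable in the same pull request as the identity text.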
PATTERN 07
Agent harnesses
The model needs hands, not just a mouth.
A harness is everything around the model: filesystem access, code execution, a permission system, hooks, subagents with isolated context, tool integration, durable session state. Without one, an LLM is a chat box. With one, it's an operator that can read files, run tests, edit code, and hit APIs across hours of work.
- WHERE IT FITS
- Any work that requires real action — editing files, running tests, calling APIs across long sessions, coordinating subagents, enforcing policy on what the model may do. The harness is what lets a model finish a job rather than describe one.
- WHERE IT BREAKS
- Build versus adopt is the real decision. Off-the-shelf harnesses give you the ecosystem for free; building your own gives full control of the lifecycle. Most operators should adopt; some should fork; few should build from scratch.
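One slice of a harness, the permission system, can be sketched in a few lines. This is an illustration of the idea and not any particular product's design; the `Harness` class, the tool names, and the in-memory file store are hypothetical.

```python
from typing import Callable


class PermissionDenied(Exception):
    """Raised when the model requests a tool outside its policy."""


class Harness:
    def __init__(self, allowed: set[str]):
        self.allowed = allowed                        # policy: tools the model may call
        self.tools: dict[str, Callable[..., str]] = {}
        self.audit_log: list[tuple[str, str]] = []    # (tool name, decision)

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn

    def call(self, name: str, *args: str) -> str:
        """Check policy first, then run the tool; record both outcomes."""
        if name not in self.allowed:
            self.audit_log.append((name, "denied"))
            raise PermissionDenied(name)
        result = self.tools[name](*args)
        self.audit_log.append((name, "allowed"))
        return result


# An in-memory stand-in for a filesystem, so the sketch touches nothing real.
FILES = {"notes.txt": "ship on Friday"}


def read_file(path: str) -> str:
    return FILES[path]


def delete_file(path: str) -> str:
    del FILES[path]
    return "deleted"


if __name__ == "__main__":
    harness = Harness(allowed={"read_file"})
    harness.register("read_file", read_file)
    harness.register("delete_file", delete_file)
    print(harness.call("read_file", "notes.txt"))
    try:
        harness.call("delete_file", "notes.txt")
    except PermissionDenied as denied:
        print(f"blocked: {denied}")
    print(harness.audit_log)
```

The model proposes a tool call; the harness decides whether it happens. That separation is what makes policy enforceable rather than advisory.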
PATTERN 08
Retrieval
The answer lives in your data, not the model.
The model knows what it was trained on; your business knows what it wasn't. Retrieval is the bridge: saved knowledge made available on demand, in the form most useful to the question. The forms include RAG, grep, semantic search, hybrid search, structured queries, and knowledge graphs.
- WHERE IT FITS
- Choose retrieval shape by question shape: grep for an exact term or identifier, vector for a conceptual match, hybrid when both matter (the common production answer), a knowledge graph when the value is in relationships.
- WHERE IT BREAKS
- Bad chunking, stale indexes, missing citations, no evaluation. Retrieval is the part of an AI system most likely to silently give the wrong answer — the model will happily reason on top of irrelevant context. Treat retrieval quality as its own metric.
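The choice of retrieval shape can be sketched with three small functions. The documents are invented, and the similarity function is bag-of-words cosine, a lexical stand-in for the embedding-based vector search a production system would use.

```python
import math
from collections import Counter

# Hypothetical documents, for illustration only.
DOCS = {
    "runbook": "Restart the billing service after error INV-4012 appears in logs",
    "policy": "Refunds are issued within five days of a cancelled invoice",
    "faq": "Customers can change their billing address in account settings",
}


def grep(term: str) -> list[str]:
    """Exact, case-insensitive substring match: right for identifiers."""
    return [doc_id for doc_id, text in DOCS.items() if term.lower() in text.lower()]


def tokens(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar(query: str, k: int = 2) -> list[str]:
    """Rank documents by word-overlap cosine; drop zero-score documents."""
    q = tokens(query)
    scored = [(cosine(q, tokens(text)), doc_id) for doc_id, text in DOCS.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc_id for _, doc_id in ranked[:k]]


def hybrid(query: str, k: int = 2) -> list[str]:
    """Exact hits first, then similarity-ranked results, deduplicated in order."""
    merged = grep(query) + similar(query, k)
    return list(dict.fromkeys(merged))[:k]


if __name__ == "__main__":
    print(grep("INV-4012"))
    print(similar("billing address change"))
    print(hybrid("INV-4012"))
```

Dropping zero-score documents in `similar` is deliberate: returning nothing is better than handing the model irrelevant context to reason over.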