OpenAI shipped code to production with zero manually written code.

7 engineers · 5 months · 1M lines of AI-generated code · 10× faster

Harness Engineering

The discipline that made it possible

Yan Cheng · March 6, 2026

Every tech movie ever...

[Image: classic movie hacking scene with progress bar]

Scrolling logs. Progress bar. One person watching.

We're actually getting there.

Agents, humans, and the harness

Diagram: the harness (environment) is where agents live and what engineers maintain. Agents read + write and observe + act inside it.

The harness
Repository: single source of truth · AGENTS.md · Architecture · Conventions · Specs/Criteria
Enforce invariants, not implementations
Tooling for self-QA · DevTools MCP · Logs/Metrics · Linters · Traces
Agents validate their own work

Agents: generate everything
Product code · Tests · CI/CD + Tools · Documentation · Reviews + Linters
Open PRs → self-review → agent-to-agent review → update → merge
Garbage collection: periodic scans for drift, violations → auto-fix PRs
Iterate: agent self-fix loop
Agent struggles? → escalate

Humans: proactive
Design systems: linters, GC, structural tests
Define intent: specs, criteria → .md files
Prioritize work: decide what matters next
Build + update harness
Diagnose gaps: missing tool / guardrail / doc? → improve harness
Identify gap → agent implements the fix

Trial → error → better harness

1000-line AGENTS.md → ~100-line map · discover on demand
Manual cleanup Fridays → Automated GC · agents scan + fix drift
Context in Slack & Docs → Everything in the repo · single source of truth
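
One way to keep the map short is to check it mechanically. The sketch below is an illustration, not the OpenAI team's tooling: the function name `check_agents_map`, the 100-line budget, and the Markdown-link convention are all assumptions. It flags an AGENTS.md that has outgrown its budget or that links to files missing from the repo.

```python
import re
from pathlib import Path

MAX_LINES = 100  # assumed budget, echoing the "~100-line map" idea

# Matches Markdown links such as [Architecture](docs/architecture.md)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)[^)]*\)")


def check_agents_map(repo_root, max_lines=MAX_LINES):
    """Return a list of readable problems; an empty list means the map is healthy."""
    root = Path(repo_root)
    map_file = root / "AGENTS.md"
    if not map_file.is_file():
        return ["AGENTS.md is missing at the repo root"]

    lines = map_file.read_text(encoding="utf-8").splitlines()
    problems = []
    if len(lines) > max_lines:
        problems.append(
            f"AGENTS.md has {len(lines)} lines; budget is {max_lines}. "
            "Move detail into linked docs."
        )
    for lineno, line in enumerate(lines, start=1):
        for target in LINK_RE.findall(line):
            if "://" in target:
                continue  # external URLs are not checked here
            if not (root / target).exists():
                problems.append(
                    f"AGENTS.md:{lineno} links to missing file {target}"
                )
    return problems
```

Run in CI or in a periodic scan, a check like this turns "keep the map small and current" from a habit into an invariant.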

"When the agent struggled, they didn't try harder — they encoded what was missing."

Transforming to harness engineering?

New skills required of engineers

Environment Design
Structure repos + tooling for agent navigation
Prompt Decomposition
Break goals into agent-tractable units
Feedback Loops
Linters that teach fixes · Tests that enforce architecture
Gap Diagnosis
Agent fails? Find what's missing, then encode it
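
To make "linters that teach fixes" and "tests that enforce architecture" concrete, here is a minimal sketch of a structural check. The layer names (`ui`, `service`, `domain`), the `ALLOWED_IMPORTS` table, and the function `check_layering` are hypothetical; only the standard-library `ast` module is real. The point is the shape of the message: it states the rule and suggests a fix, so an agent can repair the violation without asking a human.

```python
import ast

# Hypothetical layering: each layer may import only from the layers listed for it.
ALLOWED_IMPORTS = {
    "ui": {"service"},
    "service": {"domain"},
    "domain": set(),
}


def check_layering(layer, source, filename="<memory>"):
    """Return teaching-style messages for imports that cross a forbidden boundary."""
    allowed = ALLOWED_IMPORTS[layer]
    messages = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        for module in modules:
            top = module.split(".")[0]
            # Only imports of other known layers are policed.
            if top not in ALLOWED_IMPORTS or top == layer or top in allowed:
                continue
            if allowed:
                fix = f"depend on {', '.join(sorted(allowed))} instead"
            else:
                fix = "move the shared logic into this layer or invert the dependency"
            messages.append(
                f"{filename}:{node.lineno}: layer '{layer}' must not import "
                f"'{top}'. Fix: {fix}."
            )
    return messages
```

For example, `check_layering("domain", "import ui.widgets\n")` returns one message naming the offending line and the suggested fix, while a `ui` module importing from `service` passes cleanly.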

How to get there

For engineers

Mindset shift — more architectural thinking
Start small — cleanup, tests, docs
Keep learning and share what works
Debugging skills transfer directly

What we need from leadership

Celebrate environment design wins
Tooling time = product time
Expect a slow start: it's an investment
Track how improvements reduce future failures

Let's be honest

Long-term coherence
5 months proven. Years? Unknown.
Human leverage points
Where does our judgment add the most value?
Model capability growth
Will today's constraints become unnecessary or stay critical?

Discipline hasn't disappeared — the hard work shifted from writing code to designing the systems that make agents reliable.

Where do we start?

Bring knowledge into the repo

It may live in Confluence, Teams, or someone's head

Make existing tools agent-readable

Monitoring, CI, linting: we already have them. Now make them accessible to agents.
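
As a rough illustration of "agent-readable", the sketch below converts plain-text lint output into structured records. The `path:line:col: CODE message` format, the function `to_agent_records`, and the `hints` table are assumptions chosen for the example, not the output contract of any specific linter.

```python
import json
import re

# Assumed line shape: "src/app.py:12:5: E501 line too long"
LINE_RE = re.compile(
    r"^(?P<path>[^:\s]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<code>\S+) (?P<message>.+)$"
)


def to_agent_records(raw_output, hints=None):
    """Parse lint output into dicts an agent can act on; skip summary lines."""
    hints = hints or {}
    records = []
    for text in raw_output.splitlines():
        match = LINE_RE.match(text.strip())
        if match is None:
            continue  # banners and summaries carry nothing actionable
        records.append(
            {
                "path": match["path"],
                "line": int(match["line"]),
                "col": int(match["col"]),
                "code": match["code"],
                "message": match["message"],
                # Optional team-written guidance on how to fix this rule.
                "hint": hints.get(match["code"]),
            }
        )
    return records


if __name__ == "__main__":
    raw = "src/app.py:12:5: E501 line too long\nFound 1 error.\n"
    hints = {"E501": "Wrap the expression or extract a helper."}
    print(json.dumps(to_agent_records(raw, hints), indent=2))
```

The same pattern applies to CI logs and monitoring alerts: keep the tool, add a thin layer that emits fields an agent can filter and act on.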

Pick a low-stakes pilot

Internal tooling, test generation, doc updates

Treat every agent failure as a system problem

Don't blame the model — fix the environment
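
Treating failures as system problems is easier with a record of them. The following is a minimal sketch, assuming a hypothetical `AgentFailure` record and the three gap kinds named in the diagram (tool, guardrail, doc). It counts the gaps that still lack a harness fix, so the most frequent one can be addressed first.

```python
from collections import Counter
from dataclasses import dataclass

GAP_KINDS = {"tool", "guardrail", "doc"}  # taken from "Missing tool / guardrail / doc?"


@dataclass(frozen=True)
class AgentFailure:
    task: str              # what the agent was asked to do
    gap_kind: str          # one of GAP_KINDS
    gap: str               # what was missing from the environment
    harness_fix: str = ""  # empty until the harness has been improved


def open_gaps(failures):
    """Count failures per unfixed gap, most frequent first."""
    for failure in failures:
        if failure.gap_kind not in GAP_KINDS:
            raise ValueError(f"unknown gap kind: {failure.gap_kind!r}")
    counts = Counter(
        (f.gap_kind, f.gap) for f in failures if not f.harness_fix
    )
    return counts.most_common()


if __name__ == "__main__":
    log = [
        AgentFailure("add endpoint", "doc", "no API naming convention"),
        AgentFailure("rename route", "doc", "no API naming convention"),
        AgentFailure("fix flaky test", "tool", "agent cannot read CI logs",
                     harness_fix="expose CI logs via script"),
    ]
    for (kind, gap), count in open_gaps(log):
        print(f"{count}x {kind}: {gap}")
```

A log like this also gives leadership something to track: as gaps get fixes, the open list should shrink and repeat failures should drop.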

Want AI to work reliably for us?
First, we need to work for AI
through better context, better scaffolding, better environments.
That's harness engineering.

"Our most difficult challenges now center on designing environments,
not expecting better models to solve everything."

— OpenAI Codex Team

My journey toward harness engineering

Building a context engine for data-workspace

DnA — making our repo agent-navigable with structured context

PoC for session recording

Work-related — exploring agent-assisted capture and replay

Production web app — zero manual code

Hobby project: Danish exam prep app, fully AI-generated

danskprep.vercel.app