Disclaimer: The information in this post reflects the state of coding agents today. They’re evolving fast — what’s hard now might be trivial next month.


The Cliff of Death

If you’ve spent any time experimenting with coding agents, you’ve probably felt it — that point where everything seems to work until it suddenly doesn’t. The agent writes code, deploys it for you, maybe even fixes a few things… and then progress falls off a cliff. The codebase starts to decay. Small shortcuts compound into big problems. And instead of helping, the agent begins to quietly break things.

That’s the “cliff of death.” It doesn’t matter which agent you use — Lovable, Cursor, Claude, or something custom — they all hit it eventually. They each have their own prompts, their own internal teamwork structures, and yet they all stumble over the same underlying issue: they don’t close the loop.

You can read more about the cliff of death from Mikko’s perspective in his post AI-Assisted Code Generation: What is the Cliff-Of-Death, why does it appear — and How We Avoid It.


From Half-Loop to Full-Loop

Most agents today operate in what you could call a half-loop model. They make small code changes, and then the human developer tests, validates, and fixes what’s broken. The agent never really owns the result — it contributes, but it doesn’t ensure quality.

We’re defining the next logical step in the evolution — one that shifts the paradigm completely.


What Is Full-Loop Agentic Development (FLAD)?

Full-Loop Agentic Development (FLAD) is a software development paradigm where coding agents take responsibility for the entire development cycle, not just individual edits.

In FLAD, the agent plans its work, makes broader coordinated changes, writes and updates tests, and then validates everything through a strict, production-grade quality process before handing control back to the human.

Instead of relying on CI/CD pipelines to catch errors after the fact, the agent now acts as the CI/CD itself — embedding validation directly into its behavior.

The target mode of operation looks like this:

  1. Planning: The agent creates a plan of changes (across multiple modules and files) based on user goals.
  2. Implementation: It makes the changes with a focus on production-quality engineering standards — DRY, KISS, SOLID, SoC, and Composition.
  3. Validation: It performs a full validation loop before finishing:
    • Linting and static analysis.
    • Compilation and/or type-checking.
    • Unit tests, integration tests, and 100% E2E test coverage.
    • Code quality tools like knip, ts-prune, semgrep, etc. to remove dead code, detect smells, and enforce security and clarity.
  4. Definition of Done: The agent runs its own validation steps, detects most of its mistakes, and fixes them before handing control back to the developer.
  5. Commit and Handoff: Once validation passes, the agent can even commit its changes (think cbeams-style commits) — small, atomic, validated updates ready for review or merge.

This turns the agent from an assistant into a self-validating engineer — a system that merges the roles of developer, tester, and CI/CD pipeline into one intelligent loop.
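To make that concrete, here is a minimal sketch of what the validation step could look like as a script the agent runs before declaring itself done. It assumes a hypothetical TypeScript repo with npm scripts named lint, typecheck, test, and e2e; the gate list and commands are placeholders, not a prescribed setup.

```ts
// validate.ts - a minimal sketch of an agent-side validation loop (npm script names are assumptions).
import { spawnSync } from "node:child_process";

// Each gate is a command the agent must pass before handing control back.
const gates: Array<{ name: string; cmd: string; args: string[] }> = [
  { name: "lint", cmd: "npm", args: ["run", "lint"] },
  { name: "typecheck", cmd: "npm", args: ["run", "typecheck"] },
  { name: "unit + integration tests", cmd: "npm", args: ["run", "test"] },
  { name: "e2e tests", cmd: "npm", args: ["run", "e2e"] },
  { name: "dead code check", cmd: "npx", args: ["knip"] },
];

for (const gate of gates) {
  const result = spawnSync(gate.cmd, gate.args, { stdio: "inherit" });
  if (result.status !== 0) {
    // A failed gate means the agent is not done: it has to fix the issue and re-run.
    console.error(`Gate failed: ${gate.name}`);
    process.exit(1);
  }
}

console.log("All gates passed. Changes are ready to commit.");
```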


Why This Matters

Traditional CI/CD is about continuous integration after code is written. FLAD moves the validation into the act of coding itself.

It gives the agent feedback loops to test and verify its own changes: instead of blindly making edits, it confirms that they actually work.

That means fewer broken builds, fewer regressions, and cleaner repositories over time. It also means agents can safely make larger, more meaningful changes because they’re equipped to test and verify them comprehensively.

It’s not about removing CI/CD — it’s about absorbing its responsibilities into the development process. Anything we consider a best practice for CI/CD — from linting to test gating to quality metrics — now becomes part of the agent’s “definition of done.”
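One way to make that absorption concrete is to treat the “definition of done” as data the agent reads and enforces. The shape below is purely illustrative; the field names are assumptions, not a standard.

```ts
// A hypothetical shape for an agent's "definition of done": CI/CD best practices as data.
interface DefinitionOfDone {
  lint: boolean;                                   // zero lint errors
  typecheck: boolean;                              // clean compile / type-check
  tests: Array<"unit" | "integration" | "e2e">;    // suites that must be green
  coverage: { lines: number; branches: number; e2eJourneys: number }; // percentages
  staticAnalysis: string[];                        // e.g. ["knip", "ts-prune", "semgrep"]
}

const done: DefinitionOfDone = {
  lint: true,
  typecheck: true,
  tests: ["unit", "integration", "e2e"],
  coverage: { lines: 90, branches: 90, e2eJourneys: 100 },
  staticAnalysis: ["knip", "ts-prune", "semgrep"],
};
```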


The Role of AGENTS.md

The cornerstone of FLAD is the AGENTS.md file — the agent’s operating manual.

Unlike a README.md, which explains things to humans, AGENTS.md defines the rules, constraints, and expectations for the agent itself. It’s not documentation — it’s configuration, policy, and governance.

It contains:

  • Quality gates the agent must pass.
  • The structure of the codebase.
  • Testing and coverage thresholds.
  • Static analysis, linting, and CI policies.
  • Coding principles (DRY, KISS, SOLID, SoC, Composition).
  • The “done” checklist for when a task is truly complete.

Every session, AGENTS.md is included in the agent’s prompt as part of its operational context. And every time the project structure changes, that file must be updated: compactly, precisely, and efficiently (no fluff, ideally under 300 lines).
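In practice, “included in the prompt” can be as simple as reading the file and prepending it to the agent’s system context on every run. A rough sketch, where runAgent stands in for whatever agent framework you happen to use:

```ts
// Sketch: inject AGENTS.md into the agent's operational context on every session.
import { readFileSync } from "node:fs";

// Placeholder for your agent framework's entry point (not a real API).
declare function runAgent(options: { systemPrompt: string; task: string }): Promise<string>;

export async function startSession(task: string) {
  const agentRules = readFileSync("AGENTS.md", "utf8");
  return runAgent({
    systemPrompt: `Follow these repository rules at all times:\n\n${agentRules}`,
    task,
  });
}
```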

You can read more about what the industry thinks about AGENTS.md over at agents.md.


Example AGENTS.md

### Prime Directives

- **Ship production-grade code.** Prefer clarity, performance, maintainability.
- **Never break main.** All changes must pass **Quality Gates**.
- **Make atomic commits.** Explain intent and risks clearly.
- **Follow the repo map.** Reuse common modules; never reinvent.
- **Update this file.** Keep repo map and thresholds accurate.
- **Read end-to-end before work.** When editing, update all relevant sections together.
- **Resolve placeholders immediately.** Any TBD/example/stack-agnostic choice (language, framework, runner, CI, tools) _must be replaced immediately_ once decided, reflected across the file, and included in the commit for that change.

### Coding Guidelines

- Apply **DRY · KISS · SOLID · SoC** rigorously.
- Prefer **composition**, **immutability**, and **pure functions** where practical.
- Use **modern language idioms**; concise, expressive syntax.
- Remove dead code and redundant abstractions.
- Name things precisely; keep module APIs small and orthogonal.

### Tooling (pick per stack, can run multiple)

    Stack-agnostic placeholders — *update immediately* when concrete tools are chosen. Keep this section consistent with Commands & CI.

- *Linters*: eslint/oxlint · flake8/ruff · golangci-lint · ktlint/detekt · rubocop · clang-tidy.
- *Static/Code Quality*: knip · ts-prune · semgrep · bandit · spotbugs · sonarlint.
- *Formatters*: prettier · black · gofmt · ktfmt · rubocop -a · clang-format.
- *Tests*: jest/vitest · pytest · go test · JUnit/TestNG · rspec · pytest + playwright/cypress/webdriver for E2E.
- *Coverage*: Istanbul/NYC · JaCoCo · Coverage.py · Go cover · kover · SimpleCov.
- *E2E*: Playwright preferred; otherwise Cypress/WebDriver. Use _data-test-id_ selectors.

### Minimum Thresholds (edit per repo)

    Thresholds are placeholders — *set concrete numbers* per repo and keep in sync with CI gates.

- *Unit*: ≥ 90% lines/branches on core logic.
- *Integration*: ≥ 80% covered for module seams.
- *E2E (journey)*: 100% coverage; all code is exercised the way a user would use the app.

### Testing Strategy

- *Unit*: every public fn/class/module; deterministic; fast.
- *Integration*: thin tests for boundaries, IO, data-access seams.
- *E2E*: mandatory user journeys with headless browser; stub external services deterministically; record/play fixtures; include visual regression.
- *Flake control*: no retry masking; quarantine only with issue+owner+SLA.
- *Artifacts*: retain screenshots, videos, traces on failure.

### Regression Policy

- Any discovered bug → write a _failing test first_ (E2E or integration), then fix; keep the test permanently.
- E2E failures in CI _block merges_ until resolved or reverted.

### Quality Gates

1. Typecheck clean (if typed).
2. Lint: zero errors.
3. Build/Compile succeeds.
4. Tests: Unit, Integration, and **E2E** all green.
5. Coverage ≥ thresholds.
6. Static analysis (knip, ts-prune, semgrep, etc.) clean.
7. Preview smoke E2E test passes.

### Repo Map

/ (root)
  apps/ — entry apps
  packages/ — shared logic
  core/ — business logic
  common/ — utilities (documented at method level)
  services/ — domain services
  data/ — persistence
  ui/ — components
  e2e/ — E2E tests & fixtures

### “Done” Checklist

- Dev server starts.
- **Lint/Typecheck/Semgrep/Knip** pass.
- Tests (unit/integration/E2E) all green, fully covering the changes. No errors or warnings, even unrelated ones.
- Coverage at or above thresholds; 100% for E2E journeys.
- `AGENTS.md` updated if relevant.
- Commit changes.

Relevant parts of the AGENTS.md file

Quality gates the agent must pass.

The quality gates make sure the agent ships production-grade code. They force it to test and verify every change so it doesn’t break things and doesn’t assume something works one way when it actually works another.

The structure of the codebase.

Having the structure of the codebase in AGENTS.md helps the agent understand the repository immediately. It saves time, because the agent doesn’t have to rediscover the structure on every run, and it keeps the agent from guessing when it makes changes. Agents too often miss an existing shared library or method and write duplicate functionality. Then communication breaks down: the agent keeps insisting it made the change, while we’re looking at the output of the duplicate implementation that was never touched.

Testing and coverage thresholds.

When the coverage thresholds are set ridiculously high, the agent can be more aggressive with its changes: it can change more at once without us worrying about breakage. And once you’ve worked on a codebase for a while, you notice that the places where things break are usually exactly the places the tests don’t cover.
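If the repo happens to use Jest, for instance, those thresholds can be enforced mechanically in the test runner’s config, so the validation step fails whenever coverage dips below them. The numbers below are just example values mirroring the AGENTS.md above.

```ts
// jest.config.ts - enforce coverage thresholds so the agent cannot "pass" with untested code.
import type { Config } from "jest";

const config: Config = {
  collectCoverage: true,
  coverageThreshold: {
    global: {
      lines: 90,        // core-logic line coverage
      branches: 90,
      functions: 85,    // example value, tune per repo
      statements: 90,
    },
  },
};

export default config;
```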

Coding principles (DRY, KISS, SOLID, SoC, Composition, etc.) and Static analysis, linting, and CI policies.

Coding principles, together with static analysis, linting, and CI policies, matter because they help the agent produce high-quality code. High-quality code is, almost by definition, code that helps humans produce code more effectively, and it shouldn’t be a surprise that it helps the agent do the same.
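As a tiny illustration of what composition and DRY buy both humans and agents, compare re-deriving a calculation at every call site with composing small pure functions once. The domain below is invented purely for the example.

```ts
// Hypothetical example: small pure functions composed into one reusable, testable pipeline.
type Order = { subtotal: number; discount: number; taxRate: number };

const applyDiscount = (amount: number, discount: number): number => amount - discount;
const applyTax = (amount: number, taxRate: number): number => amount * (1 + taxRate);

// Every call site reuses this instead of repeating the arithmetic (and drifting apart over time).
const orderTotal = (o: Order): number =>
  applyTax(applyDiscount(o.subtotal, o.discount), o.taxRate);

// orderTotal({ subtotal: 100, discount: 10, taxRate: 0.24 }) -> roughly 111.6
```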

The “done” checklist for when a task is truly complete.

This is the glue that holds everything together. It makes the agent actually do all of the above: it will not stop until every item on the checklist is complete.


Closing the Loop

The critical piece of FLAD is the validation step — the agent’s built-in “definition of done.” This is where it detects and fixes its own issues before finishing.

Once you have this in place, the agent no longer ships half-broken features. It learns to:

  • Fix its own compile or lint errors.
  • Keep test coverage intact.
  • Maintain architectural consistency.
  • Refactor safely without regressions.

That’s when the “cliff of death” vanishes — because the agent now operates as a self-contained feedback loop.
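That self-contained loop can be pictured as a simple retry cycle: run the gates, feed any failures back to the model, and try again until everything is green or an attempt budget runs out. The helpers below (runGates, askAgentToFix) are hypothetical stand-ins, not a specific framework’s API.

```ts
// Sketch of the closed loop: validate, hand diagnostics back to the agent, retry.
type GateResult = { passed: boolean; failures: string[] };

declare function runGates(): Promise<GateResult>;                   // lint, typecheck, tests, E2E, static analysis
declare function askAgentToFix(failures: string[]): Promise<void>;  // agent edits code based on the errors

export async function closeTheLoop(maxAttempts = 5): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await runGates();
    if (result.passed) return true;         // definition of done reached
    await askAgentToFix(result.failures);   // feed failures back into the loop
  }
  return false;                             // still red: escalate to a human
}
```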


The Bigger Shift

For decades, we’ve built pipelines that validate human-written code. Now, we’re teaching the code itself — through agents — to validate what it produces.

FLAD is not the end of CI/CD, but its next evolutionary form: agent-integrated validation. It’s faster, more iterative, and (when done right) higher quality.

Agents that can plan, implement, test, and validate autonomously unlock a new layer of engineering — one that’s more about governance and intent, less about manual enforcement of standards.


Still Early, Still Learning

We’re still refining this. Agents sometimes misread AGENTS.md, and they sometimes hallucinate files that don’t exist. But each iteration brings us closer to a working, self-correcting development loop.

If you’ve been experimenting with agents and hitting that same cliff — try this: write a compact AGENTS.md, define your “definition of done,” and make the agent enforce it.

You’ll be surprised how far that single file can carry you.


Conclusion

What experiences do you have with AGENTS.md? Have you hit the cliff of death or have you tried FLAD? Do you have any tips or tricks? Let us know in the comments below!