Opening
A concise route through fundamentals, harness design, and evaluation.
The Classic IDE
Any AI-native IDE that misses the basics is already disqualified.
| Capability | Why it matters |
|---|---|
| Speed | Response time must not break cognitive flow. |
| Navigation | Search, symbol jump, and file movement must stay effortless. |
| Execution | Tasks, launch configs, and direct runs must be first-class. |
| Debugger | Breakpoints, watches, and step-through fidelity must be reliable. |
Architecture Shift
The model reasons, but the harness controls context, tools, state, and memory.

| Layer | Role |
|---|---|
| IDE / Harness Engine | Shapes prompts, manages tools, memory, session state, and safety boundaries. |
| Agent | Runs the decision loop: read, ask, edit, execute, verify. |
| LLM | Provides reasoning and generation over the supplied context. |
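
The split between the three layers can be sketched in a few lines. This is a minimal illustration, not any particular IDE's implementation: `Harness`, `run_agent`, the `"done"` sentinel, and the `"tool argument"` action format are all hypothetical names chosen for the example, and the model is assumed to be any callable that maps a context string to the next action.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Harness:
    """Hypothetical harness: owns tools, session state, and the safety boundary."""
    tools: Dict[str, Callable[[str], str]]
    allowed: Set[str]
    history: List[str] = field(default_factory=list)

    def build_context(self, task: str) -> str:
        # The harness, not the model, decides what the model gets to see.
        return "\n".join([f"TASK: {task}", *self.history])

    def call_tool(self, name: str, arg: str) -> str:
        # Safety boundary: anything not explicitly allowed is refused.
        if name not in self.allowed or name not in self.tools:
            result = f"denied: {name}"
        else:
            result = self.tools[name](arg)
        # Session state: every call is recorded and fed into the next context.
        self.history.append(f"{name}({arg}) -> {result}")
        return result


def run_agent(harness: Harness, model: Callable[[str], str], task: str,
              max_steps: int = 5) -> List[str]:
    """Agent decision loop: ask the model for an action, execute it via the harness."""
    for _ in range(max_steps):
        action = model(harness.build_context(task))
        if action == "done":  # assumed stop signal for this sketch
            break
        name, _, arg = action.partition(" ")
        harness.call_tool(name, arg)
    return harness.history
```

Swapping the model callable leaves the harness untouched, which is the point of the layering: context, tools, and state stay under the harness's control regardless of which model does the reasoning.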
Evaluation Lens
Evaluate the system through four coherent dimensions, each framed as a question.
| Group | Dimension | Guiding question | Anchor |
|---|---|---|---|
| A | Session & Context | What does it know? | session-context |
| B | Control & Customization | Who is in charge? | control-customization |
| C | Safety & Observability | Can you trust it? | safety-observability |
| D | Extended Capabilities | What else can it do? | extended-capabilities |
Group A
If context quality collapses, the agent hallucinates, repeats mistakes, and loses continuity.

| Capability | Why it matters |
|---|---|
| Live Repo Context | Reads Git state, structure, README, and uncommitted changes. |
| Prompt Caching | Reuses stable prompt layers instead of rebuilding every turn. |
| Context Compaction | Compresses without dropping key decisions when windows fill up. |
| Session Resumption | Recovers working memory across pauses, crashes, and next-day restarts. |
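
Two of the rows above, prompt caching and context compaction, lend themselves to a small sketch. The names `Turn`, `cache_key`, and `compact` are hypothetical, the budget is counted in characters rather than tokens for simplicity, and dropping the oldest unpinned turns is a stand-in for the summarization a real harness would likely perform.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    text: str
    pinned: bool = False  # True for key decisions that must survive compaction


def cache_key(stable_layers: List[str]) -> str:
    """Hash the prompt layers that do not change between turns.

    An unchanged key means the cached prefix can be reused instead of rebuilt.
    """
    joined = "\x00".join(stable_layers)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compact(turns: List[Turn], budget: int) -> List[Turn]:
    """Drop the oldest unpinned turns until the total length fits the budget."""
    kept = list(turns)
    total = sum(len(t.text) for t in kept)
    i = 0
    while total > budget and i < len(kept):
        if kept[i].pinned:
            i += 1  # never drop a pinned decision; move on to the next turn
            continue
        total -= len(kept[i].text)
        del kept[i]
    return kept
```

If only pinned turns remain and the budget is still exceeded, this sketch returns them as-is rather than discarding decisions; a production harness would need an explicit policy for that case.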
Group B
Autonomy without runtime control is a risk. Customization turns a generic agent into a team-native one.
Group B · Runtime Sovereignty



Group B · Teaching the Agent


Group C
Agents write files, run commands, call APIs, and inspect secrets. Trust requires both enforcement and traces.
| Capability | Why it matters |
|---|---|
| Tool Controls | Per-tool permissions and approval gates for destructive or networked actions. |
| Trace Navigation | Lets you inspect hidden context and the exact point reasoning diverged. |
| Isolation | Runs in sandboxes, containers, or remote environments when needed. |
| Debug Logs | Captures tool calls, prompts, responses, and chronology for audits. |
| Prompt Injection Protection | Sanitizes hostile content from comments, tools, and external responses. |
| Behavioral Correction | Turns findings into persistent rules without resetting all progress. |
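
Tool controls and debug logs from the table above can be combined in one gate. This is a sketch under stated assumptions: `Policy`, `ToolGate`, the tool names, and the JSON log shape are all hypothetical, and the `approve` callback stands in for whatever approval prompt the IDE would show the user.

```python
import json
import time
from enum import Enum
from typing import Callable, Dict, List


class Policy(Enum):
    ALLOW = "allow"  # run without asking
    ASK = "ask"      # run only after explicit approval
    DENY = "deny"    # never run


class ToolGate:
    """Hypothetical per-tool permission gate that also keeps an audit trail."""

    def __init__(self, policies: Dict[str, Policy],
                 approve: Callable[[str, str], bool]):
        self.policies = policies
        self.approve = approve
        self.log: List[str] = []

    def check(self, tool: str, arg: str) -> bool:
        # Tools without an explicit policy are denied by default.
        policy = self.policies.get(tool, Policy.DENY)
        if policy is Policy.ALLOW:
            allowed = True
        elif policy is Policy.ASK:
            allowed = self.approve(tool, arg)
        else:
            allowed = False
        # One JSON line per decision, in call order, for later audits.
        self.log.append(json.dumps({
            "ts": time.time(),
            "tool": tool,
            "arg": arg,
            "policy": policy.value,
            "allowed": allowed,
        }))
        return allowed
```

Defaulting unknown tools to `DENY` is a deliberate choice in this sketch: enforcement fails closed, and the log records refusals as well as approvals, so the trace shows what the agent attempted and not only what it did.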
Group C · Safety




Group C · Observability



Group D
The strongest harnesses extend beyond the editor into workflow orchestration.
| Capability | Why it matters |
|---|---|
| Spec-Driven Development | Translate intent into requirements, design, tasks, and implementation. |
| Agent-First Manager UI | Track multiple parallel workflows from a mission-control view. |
| CLI + Cloud Agents | Invoke from scripts and run work that outlives the local session. |
| Browser / CI / Test Automation | Close loops outside the file editor and pull results back in. |
Group D · Extended Features




Evaluation Method
There is no substitute for X honest hours in your own codebase with your own constraints.
The tool is not the problem. Getting work done is the problem.
| Step | Action |
|---|---|
| 01 | Pick a representative multi-file task. |
| 02 | Use the same prompt and a fixed time box in every IDE. |
| 03 | Track context fidelity, correction rate, and control. |
| 04 | Score harness quality, not code generation metrics. |
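
One way to make steps 03 and 04 concrete is a small score sheet per trial. Everything here is an assumption for illustration: the `Trial` fields, the 0 to 1 rating scales, and the equal weighting of the three signals are not prescribed by the method and should be adjusted to your own constraints.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical record of one time-boxed run in one IDE."""
    ide: str
    agent_edits: int         # edits the agent proposed
    corrected_edits: int     # edits you had to fix or revert
    context_fidelity: float  # 0..1, your rating of how well it kept context
    control: float           # 0..1, your rating of how much you stayed in charge

    def correction_rate(self) -> float:
        # With no edits at all the rate is treated as 0.0 to avoid dividing by zero.
        if self.agent_edits == 0:
            return 0.0
        return self.corrected_edits / self.agent_edits

    def harness_score(self) -> float:
        # Equal weights are an assumption; a lower correction rate scores higher.
        parts = (self.context_fidelity, self.control, 1 - self.correction_rate())
        return round(sum(parts) / len(parts), 3)


def rank(trials):
    """Order trials from best to worst harness score."""
    return sorted(trials, key=lambda t: t.harness_score(), reverse=True)
```

Note that the score deliberately ignores lines generated or tokens per second, in keeping with step 04: it measures how the harness behaved, not how much code came out.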
Closing
Every major IDE can access strong models. Durable advantage now comes from harness quality.
Choose the IDE that gets out of your way fastest while keeping you firmly in control.
Thank You
Questions, feedback, or just want to talk shop?
Find me on GitHub: github.com/vanduc2514