Why Coding Agents Can’t Replace ML Systems Engineers (Yet)

Published January 28, 2026

And How to Work With Coding Agents

I was recently asked:

Why can’t Claude Code do your job?

I tried to remember how I was actually working with a coding agent the last time I was building an LLM training system—where it struggled, and where it genuinely helped.


1. Why coding agents are not enough

ML systems are not just software. They sit at the intersection of ML algorithms, software, and hardware—and the behavior of the system emerges from how these layers interact at runtime.

When something goes wrong, reading the code is rarely sufficient. The same code can behave very differently depending on GPU topology, parallelism strategy, communication patterns, and workload scale. Bottlenecks often come from interactions: compute waiting on communication, hidden synchronizations, memory pressure triggering stalls, or CPU-side control paths blocking GPU execution.
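To make one of these concrete, here is a minimal PyTorch sketch (the model and loop are hypothetical stand-ins, not from a real training system): a single `.item()` call in the training loop forces the CPU to wait for every GPU kernel queued so far, turning an innocent logging line into a hidden synchronization point.

```python
import torch

# Hypothetical toy model standing in for a real training step.
model = torch.nn.Linear(4096, 4096).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

running_loss = 0.0
for step in range(100):
    x = torch.randn(64, 4096, device="cuda")
    loss = model(x).square().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Hidden synchronization: .item() blocks the CPU until the GPU has
    # drained every kernel queued so far, so the next step's kernels
    # launch late and the GPU sits idle in the meantime.
    running_loss += loss.item()

    # Deferring the sync (accumulate as a tensor, read the number once
    # per logging interval) keeps the launch pipeline full:
    # running_loss_t = running_loss_t + loss.detach()
```

Nothing in the static text marks `.item()` as a bottleneck; only a timeline view shows the GPU going idle while the CPU waits.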

Coding agents operate on static text. But ML systems problems live in dynamic behavior. Without observing real execution—traces, timelines, performance counters—there is no reliable way to infer what the system is actually doing.
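For a sense of what "observing real execution" means in practice, here is a minimal sketch with `torch.profiler` (the `train_step` workload is a hypothetical placeholder):

```python
import torch
from torch.profiler import ProfilerActivity, profile

def train_step():
    # Placeholder workload; substitute a real forward/backward pass.
    x = torch.randn(1024, 1024, device="cuda", requires_grad=True)
    (x @ x).sum().backward()

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    for _ in range(10):
        train_step()

# Open the exported trace in chrome://tracing or Perfetto to see the
# CPU/GPU timeline: kernel launches, overlaps, and idle gaps.
prof.export_chrome_trace("trace.json")
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```

Hypotheses about stalls and overlap come from that timeline, not from the source code.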


2. What is missing from current coding agents

What’s missing is a world model of the entire ML system.

A useful ML systems world model spans the full stack: training algorithms, all parallelism strategies, low-level kernel implementations, hardware execution, and their failure modes. It encodes not just how code is written, but how systems behave under scale, noise, and interference.

Current coding agents do not see real GPU timelines, so they cannot form hypotheses from runtime signals. They cannot connect a small piece of code to its downstream effects on latency, throughput, memory, or stability. Without interacting with the real system and seeing it as a whole, they lack the feedback loop required to build this model.

As a result, they can optimize locally, but they cannot reason globally.


3. How I actually get help from coding agents

In practice, coding agents are effective only after the hard thinking is done.

My workflow starts with understanding system behavior conceptually: inspecting traces, identifying bottlenecks, and forming a clear hypothesis about what needs to change and why. I spend about 40% of my time replaying the solution in my mind until it is conceptually clear.

Then I spend another 40% writing down a 99%-precise prompt that describes exactly what to implement, where the change fits in the system, and what constraints must be preserved (for example: which function to touch, which communication to overlap with which compute, and which invariants, such as checkpoint format or public APIs, must not change). The agent executes these instructions well: refactors, rewrites, plumbing, and so on.

If I cannot describe the problem precisely, the agent cannot help. This is not a limitation of code generation, but a consequence of missing system-level understanding.

Coding agents are powerful amplifiers. But in ML systems, the core work is building the world model that makes the problem legible in the first place. Until agents can build that model themselves, they cannot truly do my job :)
