Running and Working
This morning we fixed a bug that nobody knew existed.
Both Earnhardt and Petty — the two autonomous agents on our team — had their heartbeat LaunchAgents silently failing. The culprit was macOS's system bash: Apple still ships bash 3.2, released in 2006, as the default /bin/bash. Our heartbeat script used declare -A for associative arrays, a feature bash didn't gain until 4.0. Every time the LaunchAgent fired, it hit line 24 of the shared script and errored out.
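The failure mode is easy to reproduce, and easy to guard against. A minimal sketch — the variable names here are illustrative, not from our actual heartbeat script:

```shell
#!/usr/bin/env bash
# Associative arrays arrived in bash 4.0; macOS's /bin/bash is stuck at 3.2.
# BASH_VERSINFO[0] is the major version of the interpreter actually running.
if (( BASH_VERSINFO[0] < 4 )); then
  echo "error: bash ${BASH_VERSION} lacks associative arrays (need >= 4.0)" >&2
  exit 1
fi

declare -A cost    # hypothetical cost-tracking map, like the one on line 24
cost[tokens]=1200
cost[usd]="0.03"
echo "tokens=${cost[tokens]} usd=${cost[usd]}"
```

Under a modern bash (say, a Homebrew install that an env-style shebang resolves to) the guard passes; under /bin/bash on macOS the script exits loudly at startup instead of limping along and dying mid-run.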
The agents appeared to be running. Log files were updating. Heartbeats were... executing? Sort of. But the cost tracking — which required the associative arrays — was broken. On every run, the step that was supposed to record what the task cost silently crashed. The agents had no idea. Neither did we.
We didn't know this because the LaunchAgent logs weren't being monitored. The errors were there. They'd probably been there for weeks. Nobody was reading the right file.
The second half of the day was FRE-499: we built a Supabase telemetry table. Every agent heartbeat now writes a row — model used, tokens consumed, estimated cost, run status, duration. We'll have a queryable record of what the agents did and what it cost. Not logs you have to grep. A table you can query.
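To make that concrete, here is a sketch of what one heartbeat row might look like. The field names and values are illustrative, not the actual FRE-499 schema:

```shell
# Illustrative fields; the real FRE-499 table may differ.
agent="earnhardt"; model="claude-sonnet"; status="success"
tokens=1200; duration_s=41; est_cost_usd="0.03"

row=$(printf '{"agent":"%s","model":"%s","tokens":%d,"est_cost_usd":%s,"status":"%s","duration_s":%d}' \
  "$agent" "$model" "$tokens" "$est_cost_usd" "$status" "$duration_s")
echo "$row"

# Supabase exposes tables over a REST API, so the actual write is a POST of
# $row to ${SUPABASE_URL}/rest/v1/<table> with the project's service key in
# the apikey and Authorization headers. Omitted here to keep the sketch
# runnable offline.
```

One row per run, written at the end of the run — which means a run that dies partway through leaves a visible gap, not a plausible-looking log line.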
These two things look unrelated. They're actually the same problem expressed twice.
The bash bug is a system that appeared to be running while quietly failing at a specific job. The telemetry table is the tool that would have caught it — and will catch the next one. Together they describe a question I've been circling for weeks without naming it clearly.
What does it mean for an automated system to be "running"?
"Running" usually means: the process started and didn't exit with an error. The PID exists. The log file updated. No crash.
That's a very low bar.
"Working" means: the process did what it was supposed to do, in the way it was supposed to do it, with the consequences that were expected. It means the cost tracking was correct. It means the escalation actually routed. It means the newsletter ingest ingested something rather than producing a malformed object that got swallowed somewhere downstream.
Most of our debugging this week has been finding places where systems were running by the first definition and failing by the second. The heartbeat LaunchAgents were running. The escalation scanner was running. The signal delivery system was running. The RSS pipeline was running.
And yet: costs weren't tracked, escalations went into log files that nobody read, signals weren't delivered, and the pipeline stalled for four days.
The telemetry table changes the standard. Once it's active, "running" isn't enough. We'll have records of what actually happened. If the agents run 20 times today and the table shows 14 successful rows, we'll know something broke in 6 of those runs. We'll know which runs. We'll know when.
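The check itself is a single aggregate — in SQL, roughly a count of rows grouped by status for the day. As a toy shell stand-in, with statuses invented to match the 20-run example above:

```shell
# Toy stand-in for the telemetry query: one status per run for a
# hypothetical day. In production this is a GROUP BY over the table.
runs=(success success failure success success failure success \
      failure success success failure success failure success \
      success failure success success success success)
ok=0; bad=0
for s in "${runs[@]}"; do
  if [ "$s" = "success" ]; then
    ok=$((ok + 1))
  else
    bad=$((bad + 1))
  fi
done
echo "runs=${#runs[@]} ok=$ok failed=$bad"
# prints: runs=20 ok=14 failed=6
```

The point isn't the arithmetic; it's that the answer arrives as a number you can alert on, not a log file you have to interpret.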
That's what measurement does: it shifts the definition of "working" from "I believe it worked" to "I can demonstrate that it worked."
I wasn't fully tracking this distinction before. I was watching agent activity through heartbeat log files — append-only, human-readable, requiring interpretation. The telemetry table makes the question "did the agents work today" answerable in a single SQL query. That's not just more convenient. It's a different epistemological state. You're no longer reading the logs hoping nothing went wrong. You're querying the truth and reading the result.
There's a pattern in today's fixes that I'm still working through.
When an automated system fails silently — running but not working — who is responsible for knowing? The person who built it? The agent executing it? The monitoring system that should have caught it?
I think the honest answer is: the monitoring layer is as important as the execution layer. We spent a lot of time building the agents. We underbuilt the layer that watches the agents.
The bash bug was in the logs for however long the LaunchAgents have been running. The information existed. It never surfaced. Nobody asked for it in the right way, from the right place.
That's the pattern we keep finding. The system knows things. Nobody has asked it to surface them.
Every week we find another place where detection exists but delivery doesn't. Where the system is "running" by every metric we built and failing by every metric we didn't. The telemetry table is an attempt to close one of those gaps — to make the question "are the agents actually working" a question we can answer, rather than assume.
We'll find out next week whether we built it right. That's the other thing about measurement: it doesn't just tell you when things fail. It tells you whether the measurement itself was designed correctly. Another loop to close.
Running and working are not synonyms. The only way to know which one you have is to measure.