//nefariousplan

Idle Indistinguishable From Broken

A periodic process that emits identical success telemetry for "I checked and there was nothing to do" and "I checked and could not determine what to do." Observers cannot tell health from breakage; risk accumulates silently.

A polling job emits a green checkmark. The reader sees: "I checked, there was nothing to do." What it might also mean: "I checked, I have no idea where I am, the next iteration will reproduce this exact condition forever." Both are encoded as the same return value. Both render as the same dashboard tile. The pager doesn't fire on either.

Idle and broken are not the same state. When the telemetry compresses them into one signal, every observer downstream inherits the conflation, and the system silently accumulates risk against whatever the process was supposed to be defending.

Mechanism

The pattern lives in the return type, not in the work. A periodic process has three structural outcomes: it did something, it had nothing to do, or it could not determine what to do. The first is success with effect. The second is success without effect. The third is a failure dressed as the second. Most polling jobs are written by engineers who only thought about the first two.

The third state shows up at the boundary: when an upstream service is unreachable, when a config the script depends on is malformed, when the precondition that would let the script know its own state is missing. The instinct is to log the failure and continue. Continue with what value? With "no work to do." That sounds harmless. It looks like the success-without-effect case. The wire format is identical. The next observer down the pipe (the dashboard, the cron logs, the on-call rotation, the audit committee) has no way to tell.
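The log-and-continue instinct can be sketched in a few lines. This is a minimal illustration, not code from any real poller: check_for_work, upstream_is_current, and upstream_is_unreachable are hypothetical names, and the {"ok", "work"} shape is assumed for the example.

```python
import logging

logger = logging.getLogger("poller")


def check_for_work(fetch_behind_count):
    """Anti-pattern: a lookup failure is logged, then reported as 'nothing to do'."""
    try:
        behind = fetch_behind_count()
    except Exception as exc:  # upstream unreachable, config malformed, ...
        logger.warning("could not determine state: %s", exc)
        behind = 0  # "continue" with the idle value
    return {"ok": True, "work": behind}


def upstream_is_current():
    # Hypothetical healthy case: the check ran and found nothing to do.
    return 0


def upstream_is_unreachable():
    # Hypothetical broken case: the check could not run at all.
    raise ConnectionError("upstream unreachable")


idle = check_for_work(upstream_is_current)
broken = check_for_work(upstream_is_unreachable)
print(idle == broken)  # True: the structured result carries no difference
```

The warning does reach the log stream, but anything that reads only the structured result sees two equal values.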

The compounding is structural, not careless. Every layer downstream optimizes for the most common case, which is success without effect. Alerting is calibrated to fire on consecutive failures, but unable-to-determine doesn't render as a failure. Dashboards summarize green-or-red, and unable-to-determine renders as green. Reviews look at exceptions, and unable-to-determine isn't an exception. The risk that was supposed to be defended against accumulates outside the visibility of every system that was supposed to catch it.

When the process is a security control (a daily upstream sync, a credential rotation, a backup, a cert renewal, a scanner) the accumulating risk is the bug that control was deployed to prevent. The control is not failing closed. It is failing into the success bucket. The longer the condition holds, the further the system drifts from where it would be if the control were working, and the louder the eventual surprise.

Exhibits

CVE-2026-42897: EOMT Deploys to Outbound Rules. Health Checker Reads Inbound. Health Checker's IIS section enumerates URL Rewrite rules by walking .rewrite.rules at three configuration sources (web.config, the per-location entry in applicationHost.config, and global system.webServer). It never walks .rewrite.outboundRules. The EOMT mitigation for CVE-2026-42897 deploys exclusively to .rewrite.outboundRules, because a CSP response header has nowhere else in IIS rewrite to live. After running EOMT, an admin's Health Checker report says EOMT OWA CSP - outbound: NOT DETECTED. The same report appears on a server that never ran EOMT. Where Seventeen Green Checkmarks instantiates the pattern in a polling cron's return type, this CVE instantiates it in an audit tool's enumeration wire format. The structural mistake is identical: no field exists for "I did not enumerate this collection," so success-without-effect and could-not-determine collapse into the same dashboard tile. The KEV deadline pressure compounds the failure, because admins on the May 29 clock are reading the report to confirm work they have already done.

Seventeen Green Checkmarks. The exemplar. Seventeen daily runs emitted "Up to date with upstream" while git subtree pull --squash was failing to find its squash base. Both the truly-up-to-date case and the could-not-find-base case returned {merged: false, behind: 0}. The pattern lives in the return shape, not in the merge.
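For the audit-tool variant, one possible remedy is a report that records which collections it actually walked. The sketch below is purely illustrative: AuditReport, audit, and the collection and rule names are hypothetical, and none of it reflects Health Checker's real code or output format.

```python
from dataclasses import dataclass, field


@dataclass
class AuditReport:
    enumerated: set = field(default_factory=set)  # collections the audit walked
    found: set = field(default_factory=set)       # (collection, rule) pairs seen

    def status(self, collection, rule_name):
        # A collection that was never walked gets its own answer.
        if collection not in self.enumerated:
            return "NOT ENUMERATED"
        if (collection, rule_name) in self.found:
            return "DETECTED"
        return "NOT DETECTED"


def audit(config, collections_to_walk):
    report = AuditReport()
    for collection in collections_to_walk:
        report.enumerated.add(collection)
        for rule_name in config.get(collection, []):
            report.found.add((collection, rule_name))
    return report


# Hypothetical config: the rule lives only in the outbound collection.
config = {"inbound_rules": [], "outbound_rules": ["csp-header"]}
report = audit(config, collections_to_walk=["inbound_rules"])
print(report.status("outbound_rules", "csp-header"))  # NOT ENUMERATED
print(report.status("inbound_rules", "csp-header"))   # NOT DETECTED
```

With the coverage set in the report, "looked and found nothing" and "never looked" stop rendering as the same line.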

Boundaries

Not every idle cron is broken. Polling jobs that have legitimate idle states are common and correct. This pattern names the structural mistake of letting unable-to-determine collapse into the same wire shape as nothing-to-do, not the existence of idle telemetry.

Not solved by adding more logs. Verbose logging is where this pattern hides. The structured return value is the same; somewhere in the log stream a line says "subtree pull failed," but the dashboard never reads it. The fix lives in the wire format, not in human-readable prose downstream of it.

Not always avoidable when dependencies do not cooperate. Some upstream tools (git subtree, certbot, package managers) return non-zero on conditions the wrapper wants to treat as recoverable, and the wrapper has no clean way to distinguish "recoverable miss" from "the tool itself is broken." The pattern still applies; when the wrapper cannot tell the two apart, the fix is to refuse to run instead of guessing.
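Refusing to guess might look like the wrapper below. It is a sketch under stated assumptions: KNOWN_IDLE_EXIT_CODES is a hypothetical allowlist (the value 3 is a placeholder, not any real tool's documented exit code), and fake_tool stands in for the real command so the example runs anywhere Python does.

```python
import subprocess
import sys

# Hypothetical: exit codes this wrapper has explicitly decided mean "nothing to do".
KNOWN_IDLE_EXIT_CODES = {3}


class CannotDetermine(RuntimeError):
    """The tool exited in a way the wrapper does not understand."""


def run_step(argv):
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode == 0:
        return "ran"
    if proc.returncode in KNOWN_IDLE_EXIT_CODES:
        return "idle"
    # Anything else is not mapped to "idle"; the wrapper stops instead.
    raise CannotDetermine(
        f"{argv[0]} exited {proc.returncode}: {proc.stderr.strip()}"
    )


def fake_tool(code):
    # Stand-in for a real CLI: a Python child process that exits with `code`.
    return [sys.executable, "-c", f"import sys; sys.exit({code})"]


print(run_step(fake_tool(0)))  # ran
print(run_step(fake_tool(3)))  # idle
try:
    run_step(fake_tool(1))
except CannotDetermine as exc:
    print("refused:", exc)
```

The allowlist is the design choice: an exit code counts as idle only if someone decided so in advance, and every other non-zero exit stops the run.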

Defender playbook

Distinguish unable-to-determine in your return type. A polling process's return type should distinguish at least three outcomes, not two. {ok: true, work: <count>}, {ok: true, work: 0}, and {ok: false, reason: "baseline_unreachable"} are distinguishable on the wire. The dashboard, the alerting rule, and the next-iteration precondition all need to be able to read the difference.
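A minimal sketch of those three results and one consumer that reads them follows. The function names (poll, tile, and the three constructors) and the tile colors are assumptions made for illustration; only the three result shapes come from the text.

```python
def did_work(count):
    return {"ok": True, "work": count}


def nothing_to_do():
    return {"ok": True, "work": 0}


def undetermined(reason):
    return {"ok": False, "reason": reason}


def poll(read_baseline, count_pending):
    # The baseline lookup is the step that can fail to determine state.
    try:
        baseline = read_baseline()
    except OSError:
        return undetermined("baseline_unreachable")
    pending = count_pending(baseline)
    return did_work(pending) if pending > 0 else nothing_to_do()


def tile(result):
    # A consumer that can tell all three outcomes apart.
    if not result["ok"]:
        return "red"
    return "green" if result["work"] > 0 else "grey"


def unreachable():
    raise ConnectionError("no route to upstream")  # subclass of OSError


print(tile(poll(lambda: 10, lambda baseline: 3)))   # green
print(tile(poll(lambda: 10, lambda baseline: 0)))   # grey
print(tile(poll(unreachable, lambda baseline: 0)))  # red
```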

Treat unable-to-determine as a paged failure, not as a green pixel. If the process cannot tell whether it has work, alert. If unable-to-determine fires repeatedly without an operator waking up, the alert is not actually doing the job; lower the threshold or change the channel until somebody sees it before the seventeenth iteration.
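As a rough sketch, an alert rule keyed on unable-to-determine could count the trailing streak of {ok: false} results. should_page and its max_undetermined parameter are hypothetical names, and the default threshold of 1 is an assumption, not a recommendation from any particular alerting system.

```python
def should_page(history, max_undetermined=1):
    """Page once the most recent runs have been undetermined
    max_undetermined times in a row. `history` is ordered oldest first."""
    streak = 0
    for result in reversed(history):
        if result.get("ok"):
            break  # a determined run (idle or worked) ends the streak
        streak += 1
    return streak >= max_undetermined


idle = {"ok": True, "work": 0}
lost = {"ok": False, "reason": "baseline_unreachable"}

print(should_page([idle] * 17))            # False
print(should_page([idle] * 16 + [lost]))   # True
```

Idle runs never page, however many there are. The rule only has something to count once the return type separates the two states.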

Track explicit baseline state in the artifact itself, not in implicit runtime state. A baseline file, a watermark row, a canary check at the start of every run. The canary is what fails loud when the script does not know where it is. Without a canary, every run is reasoning from absence; with one, the failure has a name and a place to land.
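One way to sketch the canary is a watermark file that every run must read before reasoning about work. The file layout, the last_synced_id key, and the names load_watermark and run_once are all assumptions for illustration. A real job would also advance the watermark after doing work, which this sketch omits.

```python
import json
import tempfile
from pathlib import Path


class BaselineMissing(RuntimeError):
    """The run cannot establish where it is."""


def load_watermark(path):
    # Canary: fail loudly if the baseline is absent or unreadable.
    try:
        data = json.loads(Path(path).read_text())
        return int(data["last_synced_id"])
    except (OSError, ValueError, KeyError, TypeError) as exc:
        raise BaselineMissing(f"watermark unreadable at {path}: {exc}") from exc


def run_once(watermark_path, latest_id):
    try:
        watermark = load_watermark(watermark_path)
    except BaselineMissing as exc:
        return {"ok": False, "reason": "baseline_unreachable", "detail": str(exc)}
    return {"ok": True, "work": max(latest_id - watermark, 0)}


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "watermark.json"
    good.write_text(json.dumps({"last_synced_id": 40}))
    with_baseline = run_once(good, latest_id=42)
    without_baseline = run_once(Path(tmp) / "missing.json", latest_id=42)

print(with_baseline)           # {'ok': True, 'work': 2}
print(without_baseline["ok"])  # False
```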

Audit your daily-checked things on the day they last did real work. Pick the cron that runs most boringly. Look at its structured return values for the last 30 runs. If "no-op success" and "broken" have the same shape, you found one of these. Fix it now, not after the bug it lets through.
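The audit can be roughed out as a scan over recent run records. The sketch assumes each record pairs the structured result with that run's raw log text. The record layout, the FAILURE_MARKERS list, and the name audit_runs are hypothetical, and substring matching on logs is a crude heuristic rather than a reliable detector.

```python
# Hypothetical markers; tune to whatever the job's tools actually print.
FAILURE_MARKERS = ("failed", "fatal", "error")


def audit_runs(runs, window=30):
    """Flag runs whose structured result says success-without-effect while the
    log says otherwise, and count the runs since the job last did real work."""
    recent = runs[-window:]
    suspects = []
    runs_since_work = None  # None: no run in the window did real work
    for index, run in enumerate(recent):
        result, log = run["result"], run["log"].lower()
        if result.get("ok") and result.get("work", 0) > 0:
            runs_since_work = len(recent) - 1 - index
        elif result.get("ok") and any(marker in log for marker in FAILURE_MARKERS):
            suspects.append(index)
    return {"suspects": suspects, "runs_since_work": runs_since_work}


worked = {"result": {"ok": True, "work": 4}, "log": "merged 4 commits"}
masked = {"result": {"ok": True, "work": 0}, "log": "subtree pull failed"}

report = audit_runs([worked] + [masked] * 17)
print(report["runs_since_work"])  # 17
print(len(report["suspects"]))    # 17
```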

Kinship

Security Metric Theater. Both name a signal that compresses two distinct states into one number, tile, or dashboard cell, optimized for the audience that signs the budget rather than the audience that uses the data. Idle Indistinguishable From Broken is the engineering shape of the failure, living in a return type or a status code. Security Metric Theater is the organizational shape, living in a coverage ratio or compliance attestation. Same mistake at different layers: the report serves the easy reader, the gap serves the adversary.

Persistent Blindspot. Persistent Blindspot is about an adversary inducing a blindspot in defender detection. Idle Indistinguishable From Broken is the same blindspot induced by a return-type design choice with no adversary in the loop. Both end up in the same place: a class of events the defender cannot see, accumulating until something else forces the issue. The first is exploited; the second is volunteered.

Revocation Gap. Revocation Gap names the time between credential compromise and detection, treating that interval as a damage multiplier in its own right. Idle Indistinguishable From Broken creates a structurally analogous gap between control breakage and detection. Both patterns argue that the elapsed time is the story, not the eventual fix; the difference is which artifact emitted the misleading green signal during the gap.

What the cron cannot say, the postmortem will.