salluru.dev

agentdelta

See what behavior you actually shipped — before your users do.

Behavioral version control for agents — a semantic regression diff for every change.

At a Glance
OTel-native trace ingest
Graph diff engine
GitHub Action PR review gate
The Problem

Change a model, prompt, tool schema, or memory policy and you have no principled way to know what behavioral change you just shipped. Surface metrics stay flat while the agent quietly starts refusing an edge case or degrading on a 5% slice. Provider models now update faster than your release cycle and shift behavior across whole task categories at once.

Key Insight

The unit of comparison is the behavioral trace graph, not the final answer. Normalize two stochastic runs into comparable graphs so genuine regression separates from sampling variance, then surface the first decision fork where they diverge — packaged as a human-readable diff a developer reviews like code.
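One way to picture separating genuine regression from sampling variance is a significance test over repeated runs. The sketch below is an illustration only, not a description of how agentdelta does it: it applies a one-sided Fisher's exact test to how often a behavior (say, a refusal) appears in baseline versus candidate runs. The function name, the counts, and the 0.05 threshold are all assumptions.

```python
from math import comb

def fisher_one_sided(base_hits, base_n, cand_hits, cand_n):
    """Probability of seeing at least `cand_hits` occurrences in the candidate
    runs if the behavior rate had not changed (hypergeometric upper tail)."""
    total_hits = base_hits + cand_hits
    total_n = base_n + cand_n
    upper = min(total_hits, cand_n)
    favorable = sum(
        comb(total_hits, k) * comb(total_n - total_hits, cand_n - k)
        for k in range(cand_hits, upper + 1)
    )
    return favorable / comb(total_n, cand_n)

# Hypothetical counts: the agent refused in 1 of 20 baseline runs.
p_shift = fisher_one_sided(1, 20, 8, 20)   # candidate refused 8 of 20 times
p_noise = fisher_one_sided(1, 20, 2, 20)   # candidate refused 2 of 20 times

ALPHA = 0.05  # assumed significance threshold
print("8/20 looks like a regression:", p_shift < ALPHA)
print("2/20 looks like a regression:", p_noise < ALPHA)
```

With these made-up counts, the jump from 1 to 8 refusals falls below the threshold while 1 versus 2 does not, which is the distinction between a real shift and ordinary run-to-run noise.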

How It Works

OTel-native capture

Ingests agent execution traces over OpenTelemetry — no bespoke instrumentation, drops into existing stacks.
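To make the capture step concrete, here is a minimal sketch of turning flat span records into a tree. OpenTelemetry spans carry a span ID, a parent span ID, and a name; the dict layout, the span names, and the `build_trace_tree` helper are assumptions for illustration, not agentdelta's ingest format.

```python
from collections import defaultdict

def build_trace_tree(spans):
    """Group flat span records into roots plus a parent -> children map."""
    by_id = {s["span_id"]: s for s in spans}
    children = defaultdict(list)
    roots = []
    # Sort by start time so siblings appear in execution order.
    for span in sorted(spans, key=lambda s: s["start_time"]):
        parent = span.get("parent_span_id")
        if parent in by_id:
            children[parent].append(span["span_id"])
        else:
            roots.append(span["span_id"])  # no known parent: treat as a root
    return roots, dict(children), by_id

# Hypothetical exported spans, deliberately out of order.
spans = [
    {"span_id": "b", "parent_span_id": "a", "name": "tool.search", "start_time": 2},
    {"span_id": "a", "parent_span_id": None, "name": "agent.run", "start_time": 1},
    {"span_id": "c", "parent_span_id": "a", "name": "llm.call", "start_time": 3},
]
roots, children, by_id = build_trace_tree(spans)
print(roots, children)
```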

Semantic graph diff

Normalizes two runs into structurally comparable trace graphs, then separates policy-relevant divergence from sampling noise.
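A rough sketch of the normalization idea follows, assuming each node has a `kind`, a `name`, and optional `args`. Run-specific details (IDs, argument values) are dropped so that two runs with the same behavior produce the same edge multiset. The labeling scheme and field names are hypothetical, and this covers only structural comparison, not deeper semantic equivalence.

```python
from collections import Counter

def label(node):
    # Behavioral label: kind, name, and argument *keys* (values are run-specific).
    arg_keys = ",".join(sorted(node.get("args", {})))
    return f'{node["kind"]}:{node["name"]}({arg_keys})'

def edge_multiset(nodes):
    """Count (parent label, child label) edges, ignoring node IDs."""
    by_id = {n["id"]: n for n in nodes}
    edges = Counter()
    for n in nodes:
        parent = by_id.get(n.get("parent"))
        edges[(label(parent) if parent else "ROOT", label(n))] += 1
    return edges

def graph_diff(baseline, candidate):
    b, c = edge_multiset(baseline), edge_multiset(candidate)
    # Counter subtraction keeps only positive counts.
    return {"removed": dict(b - c), "added": dict(c - b)}

baseline = [
    {"id": "1", "parent": None, "kind": "agent", "name": "run"},
    {"id": "2", "parent": "1", "kind": "tool", "name": "search", "args": {"q": "refund policy"}},
    {"id": "3", "parent": "1", "kind": "llm", "name": "answer"},
]
candidate = [
    {"id": "x9", "parent": None, "kind": "agent", "name": "run"},
    {"id": "x10", "parent": "x9", "kind": "llm", "name": "refuse"},
]
diff = graph_diff(baseline, candidate)
print(diff)
```

In this toy example the candidate drops the search and answer steps and adds a refusal, while a rerun of the baseline with fresh IDs would produce an empty diff.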

Fork detector

Flags the exact node where, given the same context, the two runs chose differently: the causal chain, not an aggregate number.
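In its simplest form, fork detection can be sketched as a lockstep walk over two normalized step sequences that stops at the first mismatch. The step labels and the `first_fork` helper are hypothetical, and a real detector would work on graphs rather than flat lists.

```python
from itertools import zip_longest

def first_fork(baseline, candidate):
    """Return the first position where two step sequences differ, or None."""
    for i, (b, c) in enumerate(zip_longest(baseline, candidate)):
        if b != c:
            return {
                "index": i,
                "shared_prefix": baseline[:i],  # identical context up to the fork
                "baseline": b,                  # None if the baseline run ended here
                "candidate": c,                 # None if the candidate run ended here
            }
    return None

baseline_steps = ["plan", "tool:search", "llm:answer"]
candidate_steps = ["plan", "llm:refuse"]
fork = first_fork(baseline_steps, candidate_steps)
print(fork)
```

Reporting the shared prefix alongside the two diverging steps is what lets a reviewer see the decision in context rather than as a bare count.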

PR review gate

A GitHub Action that posts the behavioral diff on the pull request and can block deploys on regression budgets.
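A regression budget can be thought of as a maximum tolerated regression rate per task category. The sketch below shows one possible check; the category names, budget values, and `gate` function are assumptions, not the Action's actual configuration.

```python
def gate(results, budgets):
    """results: category -> (regressed, total); budgets: category -> max allowed rate.
    Returns human-readable violations; an empty list means the gate passes."""
    violations = []
    for category, (regressed, total) in sorted(results.items()):
        rate = regressed / total if total else 0.0
        limit = budgets.get(category, budgets.get("default", 0.0))
        if rate > limit:
            violations.append(f"{category}: {rate:.1%} regressed (budget {limit:.1%})")
    return violations

# Hypothetical results from diffing baseline and candidate runs.
results = {"refunds": (3, 40), "search": (1, 100)}
budgets = {"default": 0.02, "refunds": 0.05}

violations = gate(results, budgets)
for line in violations:
    print(line)
# In a CI step, a non-empty list would map to a non-zero exit code,
# e.g. sys.exit(1 if violations else 0), which fails the job.
```

Here the refunds category exceeds its budget and would block the deploy, while search stays within the default budget.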

Verification, Observability, CI/CD