agentdelta
See what behavior you actually shipped — before your users do.
Behavioral version control for agents — a semantic regression diff for every change.
Change a model, prompt, tool schema, or memory policy and you have no principled way to know what behavioral change you just shipped. Surface metrics stay flat while the agent quietly starts refusing an edge case or degrading on a 5% slice. Provider models now update faster than your release cycle and shift behavior across whole task categories at once.
The unit of comparison is the behavioral trace graph, not the final answer. Normalize two stochastic runs into comparable graphs so genuine regression separates from sampling variance, then surface the first decision fork where they diverge — packaged as a human-readable diff a developer reviews like code.
OTel-native capture
Ingests agent execution traces over OpenTelemetry — no bespoke instrumentation, drops into existing stacks.
Semantic graph diff
Normalizes two runs into structurally comparable trace graphs and isolates policy-relevant divergence from chaos.
Fork detector
Flags the exact node where behavior changed given the same context — the causal chain, not an aggregate number.
PR review gate
A GitHub Action that posts the behavioral diff on the pull request and can block deploys on regression budgets.