SysEdge — ontological knowledge graph for multi-agent Claude Code teams

The graph that catches what code review misses.

A defect slipped through because the requirement was never written down, the test was never linked, and the architecture standard was never addressed. SysEdge makes all three visible in a queryable graph, before the code ships. Measured on a real external open-source codebase, in a run you can reproduce: 71% fewer orientation tokens. Twelve parallel Claude sessions, one shared ontology, zero duplicate implementations.

Get SysEdge → · How it works · SysEdge vs alternatives

Token savings

71% fewer orientation tokens

Measured on Formbricks — a real external TypeScript project, actual API tokens, cloned cold from GitHub. Reproducible.

Requirements traceability

Gaps in the spec, before the code

AI-powered AS-REQ analysis surfaces missing exception paths, untestable acceptance criteria, and PII handling gaps — before a session starts coding.

Test coverage

V-model shape, not a percentage

Four tiers mapped to specification artefacts. The graph shows whether the export feature has unit tests, API tests, a UI flow test, and an end-to-end journey — separately.

Architecture standards

Standards as nodes. Gaps as queries.

53 architecture standards across security, operations, development, and infrastructure. Standards with no addressing ADR are gaps, returned by a single CLI command.

Requirements quality

Bad stories caught before any code is written

Automated QUS checks (Lucassen et al. 2016) flag multi-goal stories, missing actors, and non-testable criteria. Structural checks and traceability enforcement run free, without an AI key; Cockburn UC guidance is added with --ai.

Requirements quality — before tests, before code

Bad requirements produce bad code reliably. SysEdge ships a quality-review command that evaluates every user story and use case against a named set of research-backed standards — automatically, from graph queries, without an AI API key. The structural checks (13 QUS criteria, 6 traceability rules) run in seconds. The AI guidance layer (Cockburn UC patterns, semantic story quality) runs when you want it.

$ python3 cli/sys_edge.py quality-review --entity US-CTM-010
quality-review: US-CTM-010 (SysUserStory)
  [PASS]    TRC-001    Every UserStory has at least one UseCase
  [PASS]    QUS-003    Minimal — no embedded AC or implementation detail
  [PASS]    QUS-004    Full sentence
  [FAIL]    QUS-001    Well-formed
            → Missing 'As a [role], I want [means]' structure
  [FAIL]    QUS-002    Atomic
            → Goal may combine multiple objectives (conjunction detected)
  [MANUAL]  QUS-007    Conceptually sound
  [MANUAL]  QUS-008    Problem-oriented
  [AI]      QUS-G-001  Guidance: Is this story conceptually sound? (run with --ai)
  [AI]      UC-G-003   Guidance: At least one exception flow documented? (run with --ai)
  PASS 3  FAIL 2  MANUAL 5

$ python3 cli/sys_edge.py quality-review --entity UC-DLG-001
quality-review: UC-DLG-001 (SysUseCase)
  [PASS]    TRC-002     Every UseCase has a parent UserStory
  [PASS]    TRC-003     Every UseCase REQUIRES at least one Feature
  [PASS]    TRC-UC-003  Every UseCase has at least one linked SysTest
  [AI]      UC-G-001    Guidance: Main flow as numbered actor steps? (run with --ai)
  [NOTE]    7 AS-REQ → coverage-review --uc UC-DLG-001
  [NOTE]    7 AS-TEST → audit-test --uc UC-DLG-001
  PASS 3  FAIL 0  MANUAL 0

Structural checks (TRC-*, QUS-001–006) run free with no AI key. Add --ai to invoke the AI guidance layer: Cockburn UC completeness (UC-G-001–006) and semantic story quality (QUS-G-001–007). Requirements quality in depth →
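To show why structural checks of this kind need no AI key, here is a minimal sketch of two rule-style checks in the spirit of QUS-001 (well-formed) and QUS-002 (atomic). It is not SysEdge's implementation: the function names, the regular expressions, and the conjunction list are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern for the "As a [role], I want [means]" structure.
WELL_FORMED = re.compile(
    r"^\s*as an? (?P<role>[^,]+),\s*i want (?:to )?(?P<means>.+)$",
    re.IGNORECASE,
)

# Hypothetical conjunction list used to hint at multi-goal stories.
CONJUNCTION = re.compile(r"\b(and|or|as well as)\b|[&+]", re.IGNORECASE)


def check_well_formed(story: str) -> bool:
    """True when the story follows the role/means template."""
    return WELL_FORMED.match(story) is not None


def check_atomic(story: str) -> bool:
    """True when no conjunction appears in the goal part of the story."""
    match = WELL_FORMED.match(story)
    means = match.group("means") if match else story
    # Ignore the optional "so that ..." rationale; only the goal is inspected.
    means = re.split(r"\bso that\b", means, flags=re.IGNORECASE)[0]
    return CONJUNCTION.search(means) is None


def review(story: str) -> dict:
    """Return a PASS/FAIL verdict per rule, keyed by the rule ID."""
    return {
        "QUS-001": "PASS" if check_well_formed(story) else "FAIL",
        "QUS-002": "PASS" if check_atomic(story) else "FAIL",
    }


good = "As a survey owner, I want to export responses to CSV, so that I can analyse them offline"
bad = "Export responses and email them to the team"

for story in (good, bad):
    print(review(story), "<-", story)
```

A regex can only say that a conjunction is present, not that the story really has two goals, which is presumably why the real output words the finding as "may combine multiple objectives".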

Why not a memory tool, a code-graph, or AI code review?

Memory tools remember your chats. Code-graph tools index your source. AI code reviewers analyse the diff. All are useful — none can flag what was never written. They have no way to surface a requirement that was never specified, a test that was never registered, or an architecture standard that was never addressed — because there is no diff for the thing that was never built. SysEdge is the layer that models what your system is supposed to do and whether it's actually verified.

Full comparison: SysEdge vs memory tools, code graphs, and AI code review →

Reproducible verification — Formbricks (external open-source TypeScript)

One defect. Four things the graph caught that code review didn't.

We cloned Formbricks from GitHub and ran SysEdge against it cold — no prior knowledge of the codebase. The defect: DEF-FBK-001 — survey response export includes PII when anonymisation is enabled. The token saving (71%) is real, but it's the smallest part of the story.

Token measurement (actual Anthropic API token counts, not estimates):
Without SysEdge: 14,512 input+output tokens (30 input + 14,482 output), plus 1,797,494 cache-read tokens to orient on the codebase.
With SysEdge: 4,141 input+output tokens (22 input + 4,119 output), plus 473,649 cache-read tokens — graph queries replaced source reading.
71% fewer input+output tokens, and ~74% fewer total tokens once cache reads are included (1,812,006 → 477,790).
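The percentages follow directly from the counts quoted above. A short sketch of the arithmetic, using only those figures (the helper name `reduction` is ours, not part of SysEdge):

```python
def reduction(before: int, after: int) -> float:
    """Percentage reduction going from `before` tokens to `after` tokens."""
    return (before - after) / before * 100


# Token counts quoted for the Formbricks run.
without_io, without_cache = 30 + 14_482, 1_797_494
with_io, with_cache = 22 + 4_119, 473_649

io_saving = reduction(without_io, with_io)
total_saving = reduction(without_io + without_cache, with_io + with_cache)

print(f"input+output tokens:   {io_saving:.1f}% fewer")
print(f"including cache reads: {total_saving:.1f}% fewer")
```

Running it reproduces the 71% figure for input+output tokens; the total including cache reads lands between 73% and 74%.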

But the graph also surfaced four deeper problems that explain why the defect existed in the first place:

1 — The requirement for anonymisation was never written down

Running coverage-review --uc UC-FBK-005 (Export responses to CSV) against the 7 AS-REQ dimensions returned 3 FAILs. The most direct cause of the defect:

AS-REQ-003 FAIL — Exception Coverage

The UC describes the happy path only. No exception flow for anonymisation: what happens when the survey has anonymisation enabled is completely unspecified. An unspecified failure state is an omission fault — the system exhibits undefined behaviour when it occurs.

AS-REQ-006 FAIL — Scope Definition

The out-of-scope section is missing. The AI explicitly listed "(6) data masking, PII redaction, or anonymization" as absent from the specification — confirming the feature was shipped without a decision on whether privacy filtering was in scope.

Also flagged: AS-REQ-007 FAIL (0 linked tests — no traceability from spec to test) and 4 WARN findings on auth boundaries and testability.
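The "0 linked tests" finding is a traceability question: does each link in the US → UC → Feature → Test chain exist? Below is a minimal sketch of such checks over plain dictionaries. The rule IDs are the ones shown in the quality-review output; the data layout, the function name, the story ID `US-FBK-005`, and the UC-to-feature link are assumptions for illustration, not SysEdge's schema.

```python
def traceability_findings(stories: dict, use_cases: dict) -> list:
    """Return (rule_id, entity_id) pairs for every broken link in the chain.

    stories:   {story_id: [use_case_id, ...]}
    use_cases: {use_case_id: {"story": id, "features": [...], "tests": [...]}}
    """
    findings = []
    for story_id, uc_ids in stories.items():
        if not uc_ids:  # TRC-001: every UserStory has at least one UseCase
            findings.append(("TRC-001", story_id))
    for uc_id, uc in use_cases.items():
        if uc.get("story") not in stories:  # TRC-002: parent UserStory exists
            findings.append(("TRC-002", uc_id))
        if not uc.get("features"):  # TRC-003: requires at least one Feature
            findings.append(("TRC-003", uc_id))
        if not uc.get("tests"):  # TRC-UC-003: at least one linked test
            findings.append(("TRC-UC-003", uc_id))
    return findings


# Hypothetical graph fragment: the export use case with no linked tests.
stories = {"US-FBK-005": ["UC-FBK-005"]}
use_cases = {
    "UC-FBK-005": {"story": "US-FBK-005", "features": ["F-ANLYS-003"], "tests": []},
}

findings = traceability_findings(stories, use_cases)
for rule, entity in findings:
    print(f"[FAIL] {rule} {entity}")
```

Because each rule is a lookup over links that either exist or do not, checks like these can run without reading any source code.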

Requirements traceability in depth →

2 — The export feature had zero tests at every V-model tier

test-gaps --instance surveys on the Formbricks graph:

══════════════════════════════════════════════════
TEST GAPS — surveys (2026-06-02)
══════════════════════════════════════════════════
✗ COMPONENT GAPS (11 of 11 features uncovered)
  MOD-analysis Response Analysis (4 gaps)
    ✗ F-ANLYS-003 Export responses
    ✗ F-ANLYS-001 Response dashboard
    ✗ F-ANLYS-002 Individual response view
    ✗ F-ANLYS-004 Share survey results link
✗ INTEGRATION GAPS (11 of 11 features uncovered)
  MOD-analysis Response Analysis (4 gaps)
    ✗ F-ANLYS-003 Export responses
✗ USECASE GAPS (11 of 11 features uncovered)
  MOD-analysis Response Analysis (4 gaps)
    ✗ F-ANLYS-003 Export responses

No unit test. No API contract test. No Playwright UI flow. The export feature had shipped and been deployed with zero test coverage at any tier — invisible to coverage percentages because the feature was never registered in the test graph.
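A per-tier gap report starts from the list of registered features rather than from the code that tests happen to execute. Here is a minimal sketch of that idea, assuming a flat list of (feature, tier) test registrations; the tier names, the function name, and the single registered test are illustrative assumptions, not SysEdge's data model.

```python
# Assumed tier names, loosely following the four V-model tiers described above.
TIERS = ("component", "integration", "usecase", "e2e")


def tier_gaps(features: dict, tests: list) -> dict:
    """Map each tier to the sorted feature IDs with no registered test at that tier.

    features: {feature_id: name}
    tests:    [(feature_id, tier), ...]
    """
    covered = {tier: set() for tier in TIERS}
    for feature_id, tier in tests:
        covered[tier].add(feature_id)
    return {
        tier: sorted(fid for fid in features if fid not in covered[tier])
        for tier in TIERS
    }


# Feature IDs and names taken from the report above.
features = {
    "F-ANLYS-001": "Response dashboard",
    "F-ANLYS-002": "Individual response view",
    "F-ANLYS-003": "Export responses",
    "F-ANLYS-004": "Share survey results link",
}
# Hypothetical: a single component test registered against the dashboard.
tests = [("F-ANLYS-001", "component")]

gaps = tier_gaps(features, tests)
for tier in TIERS:
    print(f"{tier}: {len(gaps[tier])} of {len(features)} features uncovered")
```

Because the loop runs over features, a feature with no tests at all still appears in every tier's gap list instead of silently dropping out of the denominator.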

V-model test coverage in depth →

3 — The closest existing test had 7 of 7 AS-TEST dimensions failing

Pointing audit-test --uc UC-FBK-005 at the response.spec.ts Playwright file — the nearest test to the export feature — produced 7 FAILs across the AS-TEST-UC dimensions:

AS-TEST-UC-002 FAIL — Role Visibility

No test asserts that users without export permission see a denied state. The permission boundary is untested in both directions.

AS-TEST-UC-004 FAIL — Error Paths

No boundary-value test for the date filter. No test for export API returning 500. No test for anonymised-survey export — the exact path containing the defect.

Additional FAILs: no end-to-end happy path, no equivalence partitions, no semantic correctness assertions, no specification derivation comments in any test.

AI test quality audit in depth →

4 — The architecture standard for PII export had no addressing decision

The graph contained SEC-PII-001: "When a survey has anonymisation enabled, all response export endpoints must strip PII fields before returning data." This standard existed — but had no ADR (Architecture Decision Record) confirming how the codebase addresses it. A single query surfaces this:

$ python3 cli/sys_edge.py analyse --orphans --instance surveys
Architecture standards with no addressing decision:
  SEC-PII-001   PII must be excluded from exports when anonymisation is enabled → no ADR
  SEC-PII-002   Export operations must be logged with actor identity and data scope → no ADR
  DATA-MIN-001  Data returned by API must be minimised to requesting actor's scope → no ADR

Three security standards. Zero decisions recorded. The defect was not a surprise — it was the predictable result of a specification gap, a test gap, and an architecture gap that all existed simultaneously and were all invisible without the graph.
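Once standards are nodes and ADRs are linked to the standards they address, "unaddressed" becomes a set difference. The sketch below illustrates that with an in-memory structure. The class, its fields, and the entries `SEC-AUTH-001` and `ADR-001` are hypothetical; SysEdge's actual storage and query layer are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class StandardsGraph:
    """Toy graph: standard nodes plus (adr_id, standard_id) 'addresses' edges."""

    standards: dict = field(default_factory=dict)  # standard_id -> description
    addresses: set = field(default_factory=set)    # {(adr_id, standard_id), ...}

    def orphan_standards(self) -> list:
        """Standards that no ADR addresses, sorted by ID."""
        addressed = {standard_id for _, standard_id in self.addresses}
        return sorted(s for s in self.standards if s not in addressed)


graph = StandardsGraph(
    standards={
        "SEC-PII-001": "PII must be excluded from exports when anonymisation is enabled",
        "SEC-PII-002": "Export operations must be logged with actor identity and data scope",
        "DATA-MIN-001": "Data returned by API must be minimised to requesting actor's scope",
        "SEC-AUTH-001": "Hypothetical standard that does have an ADR",
    },
    addresses={("ADR-001", "SEC-AUTH-001")},  # hypothetical ADR link
)

orphans = graph.orphan_standards()
for standard_id in orphans:
    print(f"{standard_id}  {graph.standards[standard_id]} -> no ADR")
```

The query needs no knowledge of the source code: a standard without an incoming ADR edge is reported whether or not any code touches it.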

Architecture standards alignment in depth →

SECOND VERIFICATION — DOCUMENSO (E-SIGNATURE PLATFORM)

We ran the same analysis on Documenso — an open-source DocuSign alternative. 15 minutes from cold clone to 5 findings. The Inngest job handlers that process envelope expiration have no tests. The expiration specification describes only the happy path. For a legally-binding e-signature platform, that's material.

Read the Documenso case study →

Where SysEdge fits

SysEdge is often compared to two adjacent kinds of tool: AI memory/RAG tools that remember your chat sessions, and code-graph tools that index your source for navigation. Both are useful — and different. They map what your code or your chats contain. SysEdge maps what your system is supposed to do and whether it's verified, enforcing requirements traceability, test coverage, and architecture standards as the source of truth.

SysEdge vs code-graph and memory tools →

Pricing

Free CLI. One-time bootstrap kit.

The CLI gives you the full graph from the terminal. The bootstrap kit adds the web visualiser, AI audit commands, Docker setup, architecture standards library, and session skill files — everything to be productive in an afternoon.

What you get	Free CLI	Bootstrap Kit · $149
CORE
Briefing, worklog, test-gaps — session start in 30 seconds	✓	✓
Requirements traceability: US → UC → Feature → Test chain	✓	✓
Enhancements, defects, design proposals — with parallel-session coordination	✓	✓
V-model 4-tier coverage (component / integration / UC / e2e)	✓	✓
Code scan: Go, TypeScript, Python, Java, C# — symbols + tests auto-linked	✓	✓
Backup, per-instance restore, audit staleness tracking	✓	✓
MIT + Commons Clause licence	✓	✓
BOOTSTRAP KIT ONLY
AI test quality audit — `audit-test` evaluates test files against 7 AS-TEST dimensions; `coverage-review` evaluates UC sets against 7 AS-REQ dimensions	—	✓
Architecture standards YAML — 53 standards across security, operations, development, infrastructure; ADR compliance tracking	—	✓
Web visualiser — drill-down graph, entity editor, coverage tiers, export panel	—	✓
Import skills — /import-stories, /import-use-cases, /import-requirements, /import-architecture	—	✓
Export commands — stories, use-cases, application-arch, technical-arch as structured Markdown	—	✓
/init-sysedge — auto-seed your project from directory scan	—	✓
Docker Compose + one-command setup	—	✓
All Claude Code skill files + session role templates	—	✓
12 months of updates + email support	—	✓

Get the free CLI on GitHub

Buy bootstrap kit — $149 →

Per repository · one-time · instant download · VAT included

AT 10 SESSIONS/DAY · SONNET 4.6 · ORIENTATION TOKENS ONLY

Without SysEdge

$31.73/month

With SysEdge

$1.25/month

Bootstrap kit pays back in about 5 months at 10 sessions/day ($149 against a saving of $30.48/month). Full measurement methodology →

Free CLI: MIT + Commons Clause — free for your own projects including commercial software.
Bootstrap kit: licensed per repository · 12 months updates · unlimited team members on that repo.
20+ parallel sessions or custom language support — contact us.
Full licence conditions → · Privacy policy →

Validated on Formbricks — a real open-source TypeScript project

The graph catches what the review missed.

SysEdge came out of running a 12-instance Claude Code system — twelve parallel sessions, one shared codebase. The findings from Formbricks and Documenso aren't unusual: unspecified exception paths, untested features, unaddressed standards, zero specification derivation in existing tests. They are the normal state of a codebase without a knowledge graph. With one, they become graph queries.

Get SysEdge → · How it works · vs alternatives

Verified on two production codebases, cloned cold from GitHub:
Documenso (e-signature) — 5 findings in 15 minutes → · Formbricks — 71% token reduction + 4 spec gaps →

71% fewer orientation tokens

reproducible · Formbricks · actual API tokens

Specification gaps before the code

AS-REQ-003, AS-REQ-006 caught the PII requirement omission

V-model coverage — four tiers

component · integration · UC flow · E2E — per feature

AI test quality audit

7 AS-TEST dimensions · PASS/WARN/FAIL per dimension

53 architecture standards

unaddressed standards surface as graph queries

Zero duplicate implementations

start-enhancement marks work in-progress across sessions