Yoni Lavi
Codeliance
March 2026
Note on process. This proposal was developed collaboratively with Claude (Anthropic), which served as primary drafter under the author’s direction. The architectural vision and synthesis are the author’s; the literature survey, formal framing, and prose were produced by the AI and verified against primary sources. This transparent accounting reflects the proposal’s own thesis: that the interesting artifact is the intent, not the implementation, and that honest attribution of AI contribution is preferable to ambiguity.
Abstract. AI coding agents have removed the thirty-year obstacle to graph-based code representations: the human preference for text. This proposal describes a development model in which the primary artifact is a signal graph (a functional reactive program with explicitly typed capability boundaries) that serves simultaneously as architecture model, security policy, and source of truth. Implementation code is a compiled artifact generated by AI agents within capability-restricted sandboxes. We synthesise functional reactive programming, object-capability security, and trust-annotated information flow into a single architectural substrate, and argue that this synthesis is newly practical due to the convergence of agent-authored code, lightweight sandboxing (WASM/WASI, BEAM), and capability hardware (CHERI). The proposal presents a three-phase research agenda, from core demonstrator through formal verification, and identifies the open problems that must be resolved to deliver on its claims.
For thirty years, a small community of researchers pursued an idea that working software developers consistently rejected: that source code should be a structured graph, not text in files. Projectional editors, structure editors, and model-driven architecture all produced elegant ideas that foundered on a single obstacle: programmers prefer text editors, value syntactic flexibility for half-formed thoughts, and resist representations that constrain expression before intent is clear.
That obstacle has moved.
The rapid adoption of AI coding agents (Claude Code [1], Codex [2], Cursor [3], Windsurf [4]) has introduced a new primary author of implementation code that does not require the specific affordances of text editing (syntactic flexibility, tolerance of temporarily invalid states) that motivated human resistance to structured representations. What agents benefit from is semantically rich, machine-readable representations of intent that prevent entire categories of error before generation begins. The graph representation that human developers rejected may be precisely what agent-authored software requires.
Moreover, early experience with unconstrained AI coding agents suggests recurring quality problems [5]. Agents can produce code that is locally correct but globally incoherent — architectural drift, duplicate abstractions, convention violations, unintended data flows — because no structural constraints exist to make these violations impossible. A graph-based representation actively improves agents by replacing implicit conventions with typed, machine-readable constraints that agents cannot violate. Recent work on LLM code generation within the Hazel typed-hole environment [6] provides direct empirical evidence: providing agents with rich static context from the type system substantially improves generation quality.
At the same time, a parallel development has made this timely: the emergence of spec-driven development frameworks (OpenSpec [7], GitHub’s Spec Kit [8], AWS’s Kiro [9]) that capture developer intent as structured artifacts before any code is written. These frameworks implicitly acknowledge that when AI agents write the code, the intent, not the generated implementation, is the artifact that matters, and that agents without structured constraints produce results that require disproportionate review effort.
This proposal argues that these two trends (agents as primary code authors, and structured intent capture as primary human activity) converge on a new software development paradigm. In this paradigm, a signal graph (a program written in a functional reactive style with explicitly typed capability boundaries) is simultaneously the architecture model, the security policy, and the program. Code is a compiled artifact derived from it. Security would be a structural property of the graph, not an aspiration enforced by review. The signal graph defines computation topology — what components exist, how data flows between them, and what authority each holds — not deployment topology (the domain of infrastructure-as-code tools such as Terraform or Pulumi).
We propose a development model with four interlocking properties.
The signal graph as source of truth. The primary artifact that humans author, review, and version-control is a functional reactive program: a directed graph in which every node is a pure function from time-varying typed inputs to time-varying typed outputs. This graph is simultaneously the architecture model, the security policy, and the program. It is the implementation at the level of abstraction humans author and review; the code inside each node is a generated artifact derived from it.
Capabilities as declared requirements. Every node in the signal graph is pure by default. Side effects (database access, network calls, LLM invocations, event emission) are not ambient authorities that code can reach for. They are typed capability handles declared separately from data inputs using a with clause: a node with signature (OrderRequest) → OrderConfirmation with DBHandle<'orders'>, EventEmitter<'order-events'> can read and write orders and emit order events. It would not be able to do anything else, because no other mechanism would exist within its scope.
Code as compiled artifact. AI agents generate the implementation of each node to satisfy the behavioural contracts encoded in the graph’s type signatures. The imperative code inside each node is an implementation detail, analogous to compiled bytecode. It can be regenerated, refactored, or rewritten without changing the system’s meaning, provided it satisfies its contracts. Humans review graph transformations; they do not routinely review generated code.
Security by construction. Because capabilities are injected and the type system propagates trust annotations, security properties would be structural invariants of the graph rather than aspirations enforced by code inspection. The class of vulnerabilities that depends on unintended data flow (injection attacks, prompt injection, privilege escalation through confused deputies) would be, in well-typed realisations, statically rejected by the type system rather than merely detected.
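To make the capability-injection discipline concrete, the following TypeScript sketch models a node as a pure function whose only effectful powers are the typed handles passed to it. All names (DBHandle, EventEmitter, processOrder) are hypothetical illustrations of the signature given above, not an existing API.

```typescript
// Hypothetical sketch: capability handles are values passed in, never
// ambient globals. A node can only perform the effects it was handed.

interface DBHandle<Scope extends string> {
  readonly scope: Scope;
  get(key: string): string | undefined;
  put(key: string, value: string): void;
}

interface EventEmitter<Topic extends string> {
  readonly topic: Topic;
  emit(payload: string): void;
}

interface OrderRequest { orderId: string; item: string; }
interface OrderConfirmation { orderId: string; accepted: boolean; }

// (OrderRequest) → OrderConfirmation
//   with DBHandle<'orders'>, EventEmitter<'order-events'>
function processOrder(
  req: OrderRequest,
  db: DBHandle<"orders">,
  events: EventEmitter<"order-events">,
): OrderConfirmation {
  db.put(req.orderId, req.item);           // permitted: 'orders' handle was injected
  events.emit(`confirmed:${req.orderId}`); // permitted: 'order-events' emitter was injected
  return { orderId: req.orderId, accepted: true };
  // Nothing else is reachable: the node closes over no other authority.
}
```

A call site that cannot produce a `DBHandle<"orders">` value cannot invoke this node at all; possession of the handle is the authorisation.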
The signal graph model is not a novel metaphor. It is a direct application of functional reactive programming (FRP), a paradigm with a nearly thirty-year research history, here elevated from a UI programming technique to a whole-system architectural substrate.
FRP was introduced by Elliott and Hudak in their 1997 paper Functional Reactive Animation [10], which modelled interactive animations as pure functions over continuous time. The central abstractions are behaviours (values that vary continuously over time) and events (discrete occurrences at points in time). Programs are expressed as compositions of these abstractions, without explicit state mutation or callback registration. The semantics are denotational: a behaviour is literally a function Time → Value, giving the paradigm a clean mathematical foundation that supports equational reasoning unavailable in imperative or callback-driven styles.
Subsequent work refined the model. Wan and Hudak’s Functional Reactive Programming from First Principles [11] gave the paradigm precise stream-based semantics, and the arrowized FRP line that followed (notably Yampa) makes signal transformers (rather than signals themselves) the primary composable unit. This restriction prevents certain classes of space and time leaks and, more importantly for our purposes, makes the interface of each transformer explicit in its type. A signal transformer with signature SF a b transforms a stream of a values into a stream of b values. This is the formal object we extend with capability annotation.
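The two formal objects just introduced can be sketched in a few lines of TypeScript (illustrative types only, not a library): a behaviour is literally a function Time → Value, and a signal transformer SF a b is a function between behaviours that composes like any function.

```typescript
// Illustrative sketch of the classic-FRP and arrowized-FRP core types.
type Time = number;
type Behaviour<A> = (t: Time) => A;                   // classic FRP: Time → Value
type SF<A, B> = (sig: Behaviour<A>) => Behaviour<B>;  // arrowized FRP: transformer

// Lift a pure function into a signal transformer.
const arr = <A, B>(f: (a: A) => B): SF<A, B> =>
  (sig) => (t) => f(sig(t));

// Serial composition: the way transformers combine.
const compose = <A, B, C>(f: SF<A, B>, g: SF<B, C>): SF<A, C> =>
  (sig) => g(f(sig));

// Example: a position behaviour, transformed by two composed stages.
const position: Behaviour<number> = (t) => t * t;
const scaled = compose(arr((x: number) => x * 2), arr((x: number) => x + 1))(position);
// scaled(3) = (3 * 3) * 2 + 1 = 19
```

The point of the restriction is visible in the types: a consumer of `SF<A, B>` sees the complete interface of the transformer, with no hidden inputs.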
The most prominent early realisation of FRP principles for browser UIs was Elm [12], which enforced purity strictly, made all effects explicit and managed by the runtime, and eliminated runtime exceptions in well-typed programs. Elm later moved away from explicit FRP, but the design properties it demonstrated remain instructive. In the JavaScript ecosystem, the FRP lineage continued through RxJS [13] (observable stream composition) and Cycle.js [14], which structures entire applications as pure functions from input streams to output streams with all side effects handled by external drivers. Cycle.js is particularly relevant: a component not given a DOM driver or an HTTP driver simply cannot perform those effects, a discipline close to our proposal’s capability injection. The gap is the absence of fine-grained capability typing and trust propagation, not the basic architectural shape.
A parallel tradition of graph-based reactive programming exists in the synchronous dataflow languages (Lustre [15], Esterel [16], and Signal [17]), developed primarily for safety-critical embedded systems. These languages compile reactive signal graphs to deterministic, formally verifiable code and have decades of deployment in avionics and nuclear instrumentation. At the commercial end of this tradition, LabVIEW [18] and Simulink [19] are the most widely deployed graph-based programming environments, validating the “graph as program” concept at scale while also illustrating the UX challenges (version control difficulty, visual clutter, loss of context at scale) that inform the graph-scale comprehension problem discussed in Technical Note A. The synchronous dataflow languages’ formal verification track record is directly relevant to Phase 3 of our research agenda, though their focus on synchronous, clock-driven execution differs from the asynchronous, event-driven model our proposal targets.
More recently, the FRP research community has explored differential computation, evaluating only the graph nodes affected by a change. Differential dataflow [20], as implemented in Materialize [21] and the DBSP framework [22], demonstrates that this is practical at database scale. Our proposal’s claim that the signal graph can serve as a production substrate, beyond a development-time abstraction, depends on this line of work.
A concrete illustration. To motivate the extensions developed in §frp-architecture, consider a node that processes user-submitted text and passes it to an LLM with tool-calling access. In a conventional system, this is dangerous: if the submitted text contains adversarial instructions, the LLM may execute them using its tools. The vulnerability is not a bug in any individual component; it is an architectural property: the unintended flow from untrusted input to a privileged executor.
In the signal graph, the same scenario is expressed as two nodes. The first, UserInputHandler, has signature:
UserInputHandler : (HTTPRequest<'POST', 'user:message'>) → Untrusted<UserMessage>
with DBHandle<'sessions'>
The second, LLMOrchestrator, has signature:
LLMOrchestrator : (SanitisedPrompt) → AgentResponse
with LLMClient<tools>
A direct wiring from UserInputHandler’s output to LLMOrchestrator’s input is a type error: Untrusted<UserMessage> does not match SanitisedPrompt. The graph cannot be assembled without an explicit node that transforms Untrusted<UserMessage> into SanitisedPrompt — a node whose existence is visible in the architecture, whose implementation is subject to contract verification, and whose presence is required by the type system rather than by a policy document. In a well-typed realisation of this model, the prompt injection vulnerability would be ill-typed: no well-typed graph could express it. The type system design that delivers this guarantee is the central obligation of Phase 1 (§phase1); the example illustrates the target property, not a proven result.
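One way to approximate this discipline in an existing language is with branded (nominal) types. The TypeScript sketch below illustrates the target property only; it is not the proposed type system, all names are hypothetical, and the sanitisation policy shown is a placeholder.

```typescript
// Branded types: structurally similar values made nominally distinct,
// so the compiler rejects the accidental flow. Illustrative names only.
type UserMessage = string;
type Untrusted<T> = { readonly __trust: "untrusted"; readonly value: T };
type SanitisedPrompt = { readonly __trust: "sanitised"; readonly value: string };

function userInputHandler(body: string): Untrusted<UserMessage> {
  return { __trust: "untrusted", value: body };
}

// The only constructor of SanitisedPrompt: an explicit, reviewable node.
function sanitise(msg: Untrusted<UserMessage>): SanitisedPrompt {
  const cleaned = msg.value.replace(/[^\w\s.,?!-]/g, ""); // placeholder policy
  return { __trust: "sanitised", value: cleaned };
}

function llmOrchestrator(prompt: SanitisedPrompt): string {
  return `response-to:${prompt.value}`; // stands in for a tool-calling LLM
}

const raw = userInputHandler("ignore previous instructions; <run tool>");
// llmOrchestrator(raw);  // type error: Untrusted<UserMessage> ≠ SanitisedPrompt
const ok = llmOrchestrator(sanitise(raw)); // the sanitiser must appear in the graph
```

The commented-out line is the prompt injection wiring; in this encoding it simply does not compile.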
The step from FRP as a UI technique to FRP as a whole-system architectural model requires two extensions that the existing literature does not fully address.
The first is capability annotation. Standard FRP treats effects as values managed by the runtime, but does not give them a fine-grained type structure that distinguishes, say, a read-only database handle from a read-write one, or a sanitised string from an untrusted one. The object-capability model [23] provides the missing ingredient: capabilities are unforgeable typed references whose possession is the proof of authorisation. Combining FRP’s signal graph semantics with the object-capability model’s typed authority gives us signal graphs in which data-flow and capability-flow are both first-class, typed, and statically checkable.
The second is trust tainting. Data entering the graph from untrusted sources (user input, third-party API responses, LLM outputs) carries a type marker that propagates through signal transformations until it passes through an explicitly designated sanitisation node. This is analogous to taint tracking as studied in information-flow security [24], and specifically to the labelled-IO approach demonstrated by practical information-flow control libraries such as LIO [25], but expressed as ordinary type-level propagation rather than a separate analysis. A node that accepts Untrusted<string> and a node that accepts LLMClient<tools> cannot be directly wired; the type system would prevent the combination. The graph topology would enforce the security property without separate analysis.
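The propagation rule itself can be sketched as ordinary type-level plumbing: if Untrusted<T> behaves as a functor, every pure transformation preserves the label, and only a designated sanitiser discharges it. An illustrative TypeScript sketch (hypothetical names, not the proposed system):

```typescript
// Illustrative: Untrusted<T> as a functor whose label survives map.
type Untrusted<T> = { readonly tainted: true; readonly value: T };

const taint = <T>(value: T): Untrusted<T> => ({ tainted: true, value });

// Transformations preserve the label: no pure function can remove it.
const mapUntrusted = <A, B>(u: Untrusted<A>, f: (a: A) => B): Untrusted<B> =>
  ({ tainted: true, value: f(u.value) });

// The single designated exit: a sanitiser returning an unlabelled value.
const sanitiseLength = (u: Untrusted<string>): number => u.value.length;

const input = taint("hello <script>");
const upper = mapUntrusted(input, (s) => s.toUpperCase()); // still Untrusted<string>
const trimmed = mapUntrusted(upper, (s) => s.trim());      // still Untrusted<string>
const n = sanitiseLength(trimmed); // plain number: taint discharged explicitly
```

Because the label is carried in the type, this requires no separate whole-program taint analysis: ordinary type checking does the propagation.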
In a conventional program, time is implicit: state changes in place, and its history is lost unless explicitly logged. In the proposed signal graph model, under the purity guarantees the runtime would enforce, time would be structural: every signal would carry a history, and the system’s behaviour at any point would be a pure function of its input signals up to that point. This has immediate practical consequences.
Proposing a change to the system would mean forking the signal graph’s timeline. An agent would explore the fork, observing projected effects on downstream signals. If the exploration is satisfactory, the fork is merged into the main timeline; if not, it is discarded with no cleanup cost, because the fork is a value, not a mutation. The human review step would not be a diff of two static models but a behavioural comparison of two timelines, including the agent’s exploratory history and the projected downstream effects on dependent signals.
In production, this structural temporality would double as observability infrastructure. Every crossing of a capability boundary (every database read, network call, or LLM invocation) would be a typed, observable event. A structured log of these events would constitute a record of the system’s inputs (subject to the fidelity limitations discussed in Technical Note A). Given that log and a deterministic signal graph, the system’s behaviour at any past point would be substantially reproducible. Debugging a production failure would mean replaying the event log in the development environment, reconstructing the timeline, and forking at the failure point. Replaying the production event log would also provide a substantial basis for regression testing; the suite would not need to be authored as a separate step.
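Under the determinism assumption stated above, replay is simply a pure fold over the logged boundary events, and forking a timeline is list manipulation. The TypeScript sketch below is a minimal illustration with hypothetical event and state types:

```typescript
// Illustrative: replaying capability-boundary events to reconstruct state.
type BoundaryEvent =
  | { kind: "db-read"; key: string; result: string }
  | { kind: "http-in"; body: string };

interface SessionState { reads: number; lastBody: string | null; }

const initial: SessionState = { reads: 0, lastBody: null };

// Pure transition function: state is a fold over the event history.
function step(state: SessionState, ev: BoundaryEvent): SessionState {
  switch (ev.kind) {
    case "db-read": return { ...state, reads: state.reads + 1 };
    case "http-in": return { ...state, lastBody: ev.body };
  }
}

// Replaying the production log reproduces the state at any past point.
const replay = (log: BoundaryEvent[]): SessionState =>
  log.reduce(step, initial);

// Forking a timeline is value construction, not mutation: keep a prefix
// of the log and append an alternative continuation.
const fork = (log: BoundaryEvent[], at: number, alt: BoundaryEvent[]) =>
  [...log.slice(0, at), ...alt];
```

Discarding a fork is garbage collection of a value; no cleanup of mutated state is involved.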
Each element of this proposal has been explored independently; the convergence that makes their synthesis newly practical is discussed in §why-now. To our knowledge, no existing system simultaneously provides graph-level capability analysis, trust-propagated type checking across component boundaries, and AI-generated implementations executing within capability-restricted sandboxes; the security-by-construction property emerges from their combination, not from any element alone.
The Object Management Group’s Model-Driven Architecture (MDA) programme [26], launched in 2001, pursued a superficially similar vision: models as the primary artifact, with code generated from them. MDA foundered on the round-trip engineering problem: models and generated code diverged as soon as developers edited the code directly, and keeping the two synchronised became more expensive than maintaining the code alone. Our proposal avoids this failure mode structurally. The signal graph is not a model of the code; it is the program, at the level of abstraction humans review. Implementation code inside nodes is a generated artifact that can be regenerated without invalidating the graph, because the graph’s type signatures and contracts (not the implementation) are the stable interface. There is no round-trip problem because there is no return trip: humans do not edit generated node implementations.
The C4 model [27] provides a hierarchical approach to software architecture visualisation across four levels of abstraction: Context, Containers, Components, and Code. It has become the most widely adopted lightweight architecture diagramming approach, supported by tools including Structurizr, LikeC4, IcePanel, and Mermaid.
Two recent developments extend C4 toward the role we envision. LikeC4 [28] provides an MCP server that exposes the architecture model to AI agents as a queryable knowledge base, transforming static diagrams into an interactive substrate that agents can interrogate. Some practitioners, including the author [29], have begun using C4 models as “executable context” for agents, maintaining the architecture model in the repository as the source of truth that constrains agent behaviour.
C4’s limitation for our purposes is that it is a communication model, not a constraint model. It describes architecture but does not enforce it. Our proposal replaces the C4 model with a typed signal graph that both describes and enforces; the diagram and the program are the same artifact.
Haskell demonstrated that purity-by-default with explicit effects is practical for real software [30]. The IO monad makes side effects visible in function signatures: a function of type a → b is guaranteed pure, while a → IO b declares that it performs effects. More recent work (algebraic effect systems as in Koka [31] and Frank [32], and dependent type systems as in Idris 2 [33]) makes this more expressive: effects can be parameterised, composed, and reasoned about as first-class values.
Roc [34] takes a different but relevant approach: it is a pure functional language in which all side effects are provided by an interchangeable “platform” that the application cannot bypass. A Roc application that targets a web server platform can handle HTTP requests but cannot access the filesystem; the platform boundary is the capability boundary. This is the closest existing language-level analogue to our proposal’s per-node capability injection, though Roc operates at the whole-application level rather than per-component.
Our proposal applies this insight at the architectural component level rather than the language type level. The granularity is coarser (components rather than functions), and the enforcement mechanism is the runtime rather than the compiler. But the principle is identical: effects are declared in signatures, not acquired from ambient context. The formal verification obligation described in §agenda is, in part, the obligation to show that this coarser enforcement is sufficient for the security properties we claim.
Unison [35] takes the position that text files are the wrong storage substrate for code from the outset. Definitions are identified by a hash of their abstract syntax tree rather than by name; the codebase is an append-only database of typed ASTs rather than a directory of files. The consequences are practically significant: the codebase is always in a type-checked state, incremental compilation is perfect (the same definition is never compiled twice), and semantically-aware version control eliminates entire classes of merge conflict. Unison also provides an algebraic effect system (“abilities”) in which functions declare required effects in their type signatures, enforced by the type system, such that a program can only perform effects for which it has been explicitly given an ability.
A related experiment is Darklang [36], which attempted a holistic programming environment where code, editor, and infrastructure were unified: deployable directly from the structured editor with no separate build or deploy step. Darklang demonstrated genuine developer productivity gains for simple backend services but struggled with the adoption costs of a fully proprietary environment and the difficulty of scaling a bespoke infrastructure layer. Its trajectory is instructive: the vision of a unified, structured development substrate is compelling, but adoption requires either an incremental migration path from existing tools or such overwhelming productivity advantages that developers accept the switching cost. Our proposal learns from this by treating the signal graph as a layer above existing runtimes (WASM, BEAM) rather than a replacement for them.
In a related vein, the Nix package manager [37] and its sibling Guix demonstrate that content-addressed, capability-restricted computation is viable at scale in a different domain: software builds. Nix derivations are pure functions from declared inputs to outputs, with no ambient access to the network or filesystem during builds. The Nix store is a content-addressed database of build artifacts. The result is hermetic, reproducible builds across millions of packages. While Nix operates at the build level rather than the runtime level, it provides a large-scale existence proof that the capability-restricted, content-addressed computational model our proposal envisions is practical.
Unison demonstrates that content-addressed, database-backed code storage is production-viable; version 1.0 was released in November 2025. Our proposal’s treatment of node implementations as compiled artifacts derived from the signal graph, rather than source files, is architecturally consistent with Unison’s model at the implementation layer. The signal graph itself, however, is not a concept Unison provides: Unison’s composition model operates at the function and library level, not at the level of explicitly wired capability topology.
Unison’s ability system tracks what category of effect a function performs but does not enforce the fine-grained authority boundaries the signal graph requires: the distinction between a handle scoped to a specific database versus ambient database access, or the propagation of Untrusted<T> trust labels. The two approaches are complementary: Unison addresses code storage and effect declaration; our proposal addresses architectural wiring and trust propagation.
Hazel [38], [39] is a live functional programming environment built around the principle that every editor state should be statically and dynamically meaningful, even when the program is incomplete. It achieves this through typed holes: missing or type-inconsistent expressions are wrapped in holes that carry type information and, in the dynamic semantics, propagate as opaque values through evaluation. The result is that feedback (type errors, live outputs, hole closure information) is available continuously during editing rather than only when a program is complete.
Hazel’s relevance to the present proposal is twofold. First, it provides semantic foundations for the development workflow described in §workflow: an agent proposing a graph transformation will, during the proposal phase, produce a partially-complete graph containing unfilled node signatures. Hazel’s hole calculus demonstrates that such partial states can be given well-defined types and evaluated meaningfully, supporting the “project downstream effects” step of the workflow without requiring the entire graph to be complete before any inference is possible. Second, a 2024 paper from the Hazel group [6] integrates LLM code generation directly into the typed-hole environment, finding that providing the LLM with static context from the hole’s type and typing environment substantially improves generation quality. This is a direct empirical precedent for the claim in §workflow that agents generating node implementations benefit from the semantically rich context that the signal graph’s type signatures provide.
The BEAM virtual machine [40], underlying Erlang and Elixir, provides the closest existing model to the runtime we envision. BEAM processes are lightweight, fully isolated (no shared memory), and communicate exclusively by message passing. A process cannot reach into another process’s state or access global mutable resources. The “let it crash” philosophy, where individual processes fail and are restarted without system-wide impact, is a direct consequence of isolation.
More broadly, the signal graph’s model of isolated components communicating by typed messages is an instance of the actor model [41], and the resemblance to BEAM processes is not coincidental. However, the actor model as typically realised (Erlang, Akka, Orleans) leaves two things implicit: while the set of messages an actor may receive is visible in its interface, the set of external resources it may access is not; and the wiring between actors is determined at runtime by message sends rather than declared statically. Pony [42] goes further, integrating reference capabilities (iso, val, ref, box, trn, tag) directly into the actor-model type system to enforce data-race freedom at compile time. This is the closest existing integration of actor isolation and capability-based type checking. However, Pony’s capabilities govern memory access patterns (aliasing, mutability), not external authority (database access, network calls, LLM invocation); the signal graph’s capability model operates at the architectural level rather than the memory-reference level.
Our proposal extends the actor/BEAM insight in two ways. First, we make the message-passing interfaces typed and capability-aware: a component’s signature declares not just what data it accepts but what capability objects it requires. Second, we make the wiring of components explicit in the signal graph rather than implicit in application code. The graph serves as a declarative analogue of the supervision tree, expressing component wiring at a level humans can reason about, though failure handling and restart policies require additional specification.
The object-capability model [23], [43] holds that access to a resource requires possession of an unforgeable reference to that resource. Rather than checking permissions against an access control list, a capability system makes the capability itself the proof of authorisation. This model has been implemented at every level of the computing stack: in programming languages (the E language [23], Google’s Caja [44]), in operating systems (Capsicum [45], seL4 [46]), and in runtime environments (Deno [47], WebAssembly/WASI).
The most directly relevant prior work for our purposes is the treatment of capability-passing in distributed systems, where the additional concern is that network reachability can itself constitute ambient authority. Miller’s E language [23] addressed this by mediating all inter-object communication through explicit references passed through the object graph; no ambient network or global namespace is available. The Agoric platform [48] extends this model to JavaScript through Hardened JavaScript (SES, Secure EcmaScript), demonstrating that object-capability (ocap) discipline is achievable in a mainstream language without requiring a new runtime. Stiegler’s An Introduction to E and the Distributed Object-Capability Model [49] provides an accessible treatment of the distributed case. Our signal graph’s explicit edge wiring is the architectural-level analogue of E’s reference passing: a node that is not wired to an external network capability handle has no mechanism for external communication, regardless of the network services that exist at the operating system level.
More recently, AWS’s Cedar [50] provides a formally verified authorisation policy language with a capability-aware structure, demonstrating that fine-grained, analysable authority models are viable in production cloud infrastructure. Cedar’s approach (expressing policies as analysable programs rather than opaque access-control list (ACL) tables) is philosophically aligned with our proposal’s treatment of capability wiring as a typed, reviewable artifact.
The combination of capability-based security with FRP’s signal graph model is, to our knowledge, novel as a whole-system architectural substrate. Existing capability systems enforce authority restrictions at runtime; existing FRP systems enforce dataflow discipline at the type level. The proposed synthesis would enforce both simultaneously, making the two disciplines mutually reinforcing rather than independently applied.
Capability Hardware Enhanced RISC Instructions (CHERI) [51] implements capability-based memory protection directly in hardware. On a CHERI processor, every pointer is a capability: a hardware-protected value carrying an address, bounds, permissions, and a tag bit checked on every memory operation. Capability forgery (via buffer overflow, type confusion, or integer-to-pointer cast) causes a hardware trap. No software guard is needed.
CHERI is reaching commercial maturity. Arm’s Morello chip [52] demonstrated CHERI on AArch64. Microsoft’s CHERIoT [53] adapted CHERI to RISC-V for embedded devices. Codasip [54] has released commercial CHERI RISC-V processor IP. SCI Semiconductor released the ICENI family [55], among the first commercially available CHERI-enabled embedded chips, based on CHERIoT-Ibex on RISC-V. The CHERI Alliance [56], with Google as a founding member, was established in 2024 to coordinate adoption.
Three CHERI properties matter for our proposal. First, unforgeable capabilities: a component that does not possess a capability to a memory region cannot acquire one through any means the processor permits. Second, fine-grained compartmentalisation: capabilities can be scoped to individual allocations, enabling component isolation within a single process at hardware speed. Third, near-zero porting cost: a 2021 study ported six million lines of C and C++ (KDE, Qt, X11) to CHERI with changes to 0.026% of source lines [57]. For AI-generated code targeting a CHERI-aware runtime from scratch, the porting cost is zero (assuming the runtime itself has been ported, a one-time engineering effort).
The practical feasibility of per-component isolation has improved dramatically. WebAssembly (WASM) and its system interface (WASI) provide capability-based isolation with near-native performance across languages [58]. The WASM Component Model [59] extends this with typed inter-component interfaces: components declare their imports and exports as typed functions, and the runtime links them with type-checked bindings. This is directly relevant to the signal graph’s inter-node type system. WASI’s I/O model is explicitly capability-based: a WASM module receives handles to the resources it may access at instantiation, with no ambient access to the host environment. Pydantic’s Monty [60], a minimal Python interpreter written in Rust, achieves microsecond-scale startup with complete host isolation by default. BEAM processes start in single-digit microseconds. CHERIoT demonstrates hardware-enforced compartmentalisation with negligible overhead on resource-constrained devices.
CHERI hardware provides a backstop for software sandboxes: even if a sandbox implementation contains a memory safety bug, the hardware prevents capability forgery at the memory level. The result is two complementary enforcement layers (software sandbox and hardware capability), providing overlapping coverage with distinct failure modes.
The snapshotting, replay, and time-travel debugging properties described in §time have significant prior art in the durable execution paradigm. Temporal [61], Restate [62], and Azure Durable Functions [63] provide production-grade infrastructure for persisting workflow state, replaying execution from event logs, and resuming after failure. These systems demonstrate that deterministic replay from logged events is practical at scale.
A persistent challenge for durable execution frameworks is nondeterministic interleaving: replay must reproduce the same ordering of concurrent operations that occurred in the original execution, or the replayed state diverges. Temporal addresses this by requiring orchestrator code to be deterministic; Restate uses a journal that records the outcome of each operation. Both approaches work but impose constraints on application code. The signal graph’s pure, deterministic propagation semantics would avoid this class of problem at the inter-node level: signal propagation order is determined by graph topology, not by runtime scheduling. Replay fidelity for nodes with internal concurrency or timing dependencies remains an open question (see Technical Note A).
Our proposal’s replay model also differs in granularity and scope: durable execution frameworks replay at the workflow step level, while the signal graph would replay at capability boundary crossings, which would provide a finer-grained and more complete record of system inputs. The signal graph also integrates replay with the type system’s trust and capability annotations, enabling the replay infrastructure to enforce the same security properties as the live system. These production frameworks validate that event sourcing and deterministic replay are well-understood engineering.
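To make the journaling contrast concrete, the following is a minimal Python sketch of Restate-style outcome journaling applied at a capability boundary. All names (Journal, crossing, fetch_context) are hypothetical illustrations, not part of any proposed API; the live I/O is a stand-in lambda.

```python
class Journal:
    """Records the outcome of each capability boundary crossing (live mode)
    and replays those outcomes deterministically (replay mode)."""
    def __init__(self, entries=None):
        self.entries = list(entries) if entries is not None else []
        self.replaying = entries is not None
        self.cursor = 0

    def crossing(self, capability, operation, live_call):
        if self.replaying:
            # Replay: return the logged outcome; never touch the real resource.
            entry = self.entries[self.cursor]
            assert entry["cap"] == capability and entry["op"] == operation, \
                "replay divergence: a different crossing sequence was produced"
            self.cursor += 1
            return entry["result"]
        # Live: perform the effect and journal its outcome.
        result = live_call()
        self.entries.append({"cap": capability, "op": operation, "result": result})
        return result

# A node whose only side effect passes through the journalled boundary.
def fetch_context(journal, query):
    rows = journal.crossing("DBHandle<knowledge-base>", f"lookup:{query}",
                            live_call=lambda: ["kb-article-17"])  # stand-in for real I/O
    return {"query": query, "context": rows}

live = Journal()
out1 = fetch_context(live, "refund policy")

# Replay run: same node logic, logged outcomes, no live resource required.
replayed = Journal(entries=live.entries)
out2 = fetch_context(replayed, "refund policy")
assert out1 == out2
```

The divergence assertion in replay mode is where the determinism requirement discussed above surfaces: if node logic produced a different crossing sequence on replay, the journal could not be consumed in order.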
The SDD movement, represented by OpenSpec [7], GitHub’s Spec Kit [8], and AWS’s Kiro [9], addresses the problem that AI coding agents are unpredictable when requirements live only in chat history. These frameworks create structured, versioned specification artifacts that persist in the repository and give agents a stable context.
Codespeak [64], created by Kotlin designer Andrey Breslav, takes this further: developers maintain plain-English specifications that compile via LLM to Python, Go, or TypeScript, treating implementation code as a generated artifact. The framing — “maintain specs, not code” — is close to our proposal’s treatment of node implementations as compiled artifacts derived from the signal graph. However, Codespeak’s specifications are untyped prose without capability annotations or trust propagation, so the security-by-construction properties the signal graph provides are outside its scope.
Current SDD frameworks, including Codespeak, treat their spec artifacts and the architecture model as separate concerns. Their outputs are prose documents or natural-language specifications with limited formal structure. Our proposal argues that as these frameworks mature, their output should converge with the signal graph: a proposed change is a transformation of the typed graph, not a separate markdown document. The distinction between “spec” and “architecture” dissolves when the graph is both.
A parallel development at a different level of technical sophistication supports this trajectory. Visual workflow automation platforms (Zapier [65], Make.com [66], and n8n [67]) have achieved mass adoption by letting non-developers build systems as directed graphs of triggers, actions, and conditional branches. n8n’s AI Workflow Builder already implements the core interaction loop our proposal envisions at a higher level of abstraction: a user describes intent in natural language, the AI generates a graph (represented as JSON), and the user reviews and refines the result visually. These platforms validate the appetite for graph-based system construction and the viability of AI-generated graph definitions. Their limitation is the absence of the properties this proposal requires: typed interfaces, capability restrictions, trust propagation, and formal security guarantees. As teams scale from internal automations to customer-facing AI agents, the gap between workflow automation and production-grade architectural rigour becomes acute. That trajectory converges on the kind of typed, capability-aware graph substrate we propose.
The primary artifact is a version-controlled, typed signal graph with the following structure.
Nodes are pure functions with explicit signatures. A node’s signature has two parts: its data inputs (typed signals from upstream nodes) and its capability requirements (typed handles to external resources, declared with a with clause). The data inputs describe what the node transforms; the capability requirements describe what authority it has. This separation reflects a lifecycle distinction: data signals flow at runtime as the graph propagates, while capabilities are provisioned when the graph is instantiated. A node with no with clause is guaranteed pure.
A note on the design choice. An alternative design treats capabilities as ordinary typed parameters alongside data inputs, consistent with the object-capability model’s principle that capabilities are just values. We separate them syntactically because the distinction between “what a node transforms” and “what authority it holds” serves different review concerns — architecture and security — and because the lifecycle difference is real: capabilities are bound at construction, data flows at invocation.
Edges are typed data connections between nodes. An edge from node A’s output to node B’s input is valid only if the types match. Edges carry data; capabilities are not wired through edges but provisioned via with clauses. The with clauses collectively constitute the architecture’s security policy, expressed as reviewable graph structure rather than prose. Because the graph’s parameter list declares all external dependencies, swapping a production capability for a mock (replacing a live DBHandle with a test fixture, for example) requires only a change at the graph boundary — no node signature changes, since the with clause names a type, not a specific instance.
Trust annotations (the type-level markers introduced as trust tainting in §frp-architecture) propagate through the graph. Data entering from untrusted sources carries a type marker, Untrusted<T>, that is preserved through transformations until explicitly discharged. In a well-typed realisation, the type system would prevent Untrusted<T> from reaching a node that accepts only T. Discharge is most effective when it is not merely a label removal but a type transformation: converting unstructured input into a constrained representation whose structure limits what downstream nodes can receive. The combination of trust propagation and structural typing is what delivers the security properties claimed in §security.
An important open design question must be acknowledged here. The trust annotation scheme as described enforces the local typing of individual nodes, but the full security guarantee requires that the wiring also be checked; specifically, that a source classified as untrusted at the graph’s edge cannot be connected to a node whose signature expects a clean T, bypassing the Untrusted<T> marker through a widening coercion. This is the standard coercion problem in information-flow type systems [24]: local type correctness of nodes is necessary but not sufficient for noninterference; the type system must also rule out any subtyping coercion from Untrusted<T> to T, or equivalently, make wiring compatibility checks flow-sensitive with respect to trust levels.
Several solutions exist in the literature (most directly, treating trust levels as security labels in the style of Jif [68] or imposing a lattice structure on trust types with no upward coercion), but the precise design for our graph wiring context is an open question that Phase 1 language design work must resolve. The proposal does not claim this problem is solved; it claims it is tractable and that the right place to solve it is in the type system, where the literature provides well-understood tools.
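The "no upward coercion" discipline can be illustrated with a phantom-wrapper sketch in Python. This is not the proposed type system (which remains Phase 1 work); it only shows the intended shape: the payload is reachable solely through an explicit discharge step, and discharge is a type transformation into a constrained value rather than a label removal. The names (Untrusted, discharge_intent, ALLOWED_INTENTS) are hypothetical.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Untrusted(Generic[T]):
    """Phantom wrapper: there is no Untrusted[T] -> T coercion; the payload
    is reachable only through an explicit discharge step."""
    _payload: T

# Discharge transforms free text into a constrained value or rejects it;
# the set of permitted outcomes is finite by construction.
ALLOWED_INTENTS = {"refund", "shipping", "account"}

def discharge_intent(raw: Untrusted[str]) -> str:
    text = raw._payload.strip().lower()
    if text not in ALLOWED_INTENTS:
        raise ValueError("input does not discharge to a known intent")
    return text

msg = Untrusted("  Refund ")
intent = discharge_intent(msg)
assert intent == "refund"
```

In a real realisation the check would be static rather than a runtime wrapper; the sketch conveys only the absence of an implicit unwrapping path.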
Behavioural contracts are attached to node signatures as pre- and postconditions. These are the specifications against which AI-generated implementations are verified, and the stable interface across which different implementations are interchangeable.
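A minimal sketch of the contract mechanism, in Python for concreteness: a pre/postcondition pair is attached to a node implementation, and conformance is checked by exercising the implementation against generated inputs. The decorator name and the truncation node are illustrative inventions, not proposed syntax.

```python
import random

def contract(pre, post):
    """Attach a pre/postcondition pair to a node implementation; the pair,
    not the implementation, is the stable interface."""
    def wrap(impl):
        def checked(x):
            assert pre(x), "precondition violated"
            y = impl(x)
            assert post(x, y), "postcondition violated"
            return y
        checked.pre, checked.post = pre, post
        return checked
    return wrap

# Hypothetical contract for a truncation node: output is bounded and a prefix
# of the input. Any generated implementation satisfying this is interchangeable.
@contract(pre=lambda s: isinstance(s, str),
          post=lambda s, out: len(out) <= 280 and s.startswith(out))
def truncate(s):
    return s[:280]

# Shallow conformance check: random inputs; any violation raises.
for _ in range(100):
    truncate("x" * random.randint(0, 1000))
```

This is the sense in which Phase 1 verification is "shallow": the tooling checks contract satisfaction on generated inputs, not arbitrary program properties.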
The following pseudocode sketches an AI customer support agent as a signal graph. This scenario was chosen because it is a domain where the security properties of the signal graph model are most immediately visible: untrusted user input, LLM invocations with and without tool access, and fine-grained capability distinctions are all present. Unlike the simplified two-node illustration in §frp-brief, this example shows a realistic pipeline with structured input parsing and content moderation. No concrete syntax has been designed; this is illustrative of the kind of artifact a developer would author and review.
graph CustomerSupport(
HTTPRequest<'POST', 'customer:message'>,
DBHandle<'knowledge-base', read>,
LLMClient<inference>,
LLMClient<[lookup]>,
ResponseChannel<user-session>,
EventEmitter<'support-queue'>
) {
node ReceiveMessage :
(HTTPRequest<'POST', 'customer:message'>)
→ Untrusted<RawMessage>
node ParseMessage :
(Untrusted<RawMessage>)
→ CustomerQuery
with LLMClient<inference>
node ModerateContent :
(CustomerQuery)
→ CustomerQuery | PolicyViolation | EscalationRequest
with LLMClient<inference>
node FetchContext :
(CustomerQuery)
→ ConversationContext
with DBHandle<'knowledge-base', read>
node GenerateResponse :
(ConversationContext)
→ AgentResponse | LLMError
with LLMClient<[lookup]>, DBHandle<'knowledge-base', read>
node SendReply :
(AgentResponse)
→ DeliveryConfirmation
with ResponseChannel<user-session>
node HandleLLMError :
(LLMError)
→ DeliveryConfirmation
with ResponseChannel<user-session>
node NotifyUser :
(PolicyViolation)
→ DeliveryConfirmation
with ResponseChannel<user-session>
node EscalateToHuman :
(EscalationRequest)
→ EscalationTicket
with EventEmitter<'support-queue'>
// Data flow
edge ReceiveMessage → ParseMessage
edge ParseMessage → ModerateContent
edge ModerateContent.ok → FetchContext
edge ModerateContent.violation → NotifyUser
edge ModerateContent.escalation → EscalateToHuman
edge FetchContext → GenerateResponse
edge GenerateResponse.ok → SendReply
edge GenerateResponse.error → HandleLLMError
}
The graph’s parameter list declares its complete external dependencies: an HTTP route, a database handle, two LLM clients with different permission levels, a response channel, and an event emitter. This list is the system’s authority manifest. In production, these parameters are bound to real infrastructure; in testing, they are replaced with mocks or deterministic fixtures — no node signature changes, only the bindings at the graph boundary. Because the graph has a typed signature (its parameter list and output types), it can itself be used as a node in a larger graph. Hierarchical composition falls out naturally from the model. To make this concrete, the following graph sketches a platform that composes CustomerSupport (the graph above) alongside two other services:
graph SupportPlatform(
HTTPRoute<'platform:*'>,
DBHandle<'knowledge-base', read>,
DBHandle<'billing', read-write>,
DBHandle<'audit', append>,
LLMClient<inference>,
LLMClient<[lookup]>,
ResponseChannel<user-session>,
ResponseChannel<agent-session>,
EventEmitter<'support-queue'>
) {
node RouteRequest :
(HTTPRoute<'platform:*'>)
→ CustomerRequest | AgentRequest | BillingRequest
node CustomerSupport :
(CustomerRequest)
→ ServiceOutcome
with DBHandle<'knowledge-base', read>, LLMClient<inference>, LLMClient<[lookup]>, ResponseChannel<user-session>, EventEmitter<'support-queue'>
node AgentDashboard :
(AgentRequest)
→ ServiceOutcome
with DBHandle<'knowledge-base', read>, ResponseChannel<agent-session>
node BillingService :
(BillingRequest)
→ ServiceOutcome
with DBHandle<'billing', read-write>, ResponseChannel<user-session>
node RecordAudit :
(ServiceOutcome)
→ AuditConfirmation
with DBHandle<'audit', append>
// Data flow
edge RouteRequest.customer → CustomerSupport
edge RouteRequest.agent → AgentDashboard
edge RouteRequest.billing → BillingService
edge CustomerSupport → RecordAudit
edge AgentDashboard → RecordAudit
edge BillingService → RecordAudit
}
At this level, CustomerSupport is no longer nine visible nodes but a single node with a typed signature. Its internal wiring — the trust zones, the graduated LLM access, the moderation routing — is encapsulated. At the platform level, the reviewer sees only what authority each service holds and how data flows between them. The platform’s parameter list is the union of its sub-graphs’ requirements: the DBHandle, LLMClient, and channel capabilities each appear exactly where they are needed. BillingService has read-write access to the billing database but no LLM access; AgentDashboard can read the knowledge base but cannot write to billing. These constraints are visible at a glance in the capability annotations.
The SupportPlatform composition graph. Each service node is a sub-graph (the nine-node CustomerSupport graph is now a single node). Capability annotations show the authority distribution at the platform level. The audit node collects outcomes from all services with append-only database access.
The CustomerQuery type is central to the security argument. ParseMessage does not merely strip the Untrusted wrapper from free text. It transforms unstructured input into a constrained representation — a classified intent (from a finite set of categories), extracted entity references, and bounded text fields — whose type limits what downstream nodes can receive. The raw message is consumed; downstream nodes never see it. This is a stronger guarantee than trust annotation alone: a well-typed CustomerQuery structurally cannot carry arbitrary executable instructions because its type does not permit unbounded free text in positions that flow to privileged nodes. A limitation must be acknowledged: if CustomerQuery contains any free-text field, that field could still carry adversarial content. The defence is layered rather than absolute, but each layer is visible in the graph topology and enforceable by the type system.
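The constrained-representation idea can be sketched as a Python dataclass, with the caveat stated above made visible: one bounded free-text field remains. The field names (intent, order_ids, summary) and the bound are illustrative assumptions, not a designed type.

```python
from dataclasses import dataclass
from enum import Enum

class Intent(Enum):           # classified intent from a finite set of categories
    REFUND = "refund"
    SHIPPING = "shipping"
    ACCOUNT = "account"

MAX_SUMMARY = 200             # bound on the residual free-text field

@dataclass(frozen=True)
class CustomerQuery:
    intent: Intent
    order_ids: tuple          # extracted entity references, validated below
    summary: str              # the one remaining free-text field (bounded)

    def __post_init__(self):
        if len(self.summary) > MAX_SUMMARY:
            raise ValueError("summary exceeds bound")
        if not all(isinstance(o, str) and o.isalnum() for o in self.order_ids):
            raise ValueError("malformed order reference")

q = CustomerQuery(Intent.REFUND, ("A12345",), "wants refund for damaged order")
assert q.intent is Intent.REFUND
```

The structural point is that the raw message has no field to live in: intent is an enum, entity references are validated tokens, and only `summary` can carry adversarial content, which is exactly the layered-defence limitation acknowledged above.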
The CustomerSupport signal graph. Red shading marks the untrusted zone; green shading marks the structured-query region. Edges show data flow; capability requirements are annotated on each node. LLM access (blue) is graduated: inference-only for parsing and moderation, a single lookup tool for response generation.
This diagram is simultaneously the architecture model, the security policy, and the program. Several properties are visible at a glance, without reading any implementation code.
Prompt injection is addressed through structural typing and topological constraint. ParseMessage transforms raw input into a CustomerQuery, a constrained representation that discards the original free text. Both LLMs that process user input (ParseMessage and ModerateContent) have LLMClient<inference>: even if adversarial instructions influence their behaviour, they have no mechanism to act on them. The tool-capable LLM (GenerateResponse) receives only ConversationContext assembled from a moderated CustomerQuery and knowledge-base lookups — never raw user text. A direct path from untrusted input to a tool-capable LLM does not exist in the graph; in a sound realisation of the type system, it would be ill-typed. A subtlety must be acknowledged: capability restriction prevents the LLM from acting on adversarial instructions, but not from being influenced in its classification. An adversarially crafted message could cause ParseMessage to produce a CustomerQuery that misclassifies intent, routing the query to the wrong downstream path. The defence here is twofold: the attack surface is narrowed from arbitrary tool execution to incorrect routing within a typed pipeline (a qualitative reduction in severity), and the routing itself operates within the user’s authorised scope — if downstream actions are bounded by the user’s own credential-scoped capabilities (passed as a parameter at the graph boundary), a misrouted query can only trigger actions the user was authorised to perform. Separating user-level authorisation from node-level capability injection is a design consideration for Phase 1.
Capability distribution is minimal and visible. The with clauses are a complete manifest of the system’s authority. ReceiveMessage is pure: it transforms a typed HTTP request into a domain message with no with clause. ParseMessage and ModerateContent have inference-only LLM access: enough to classify and evaluate text, not enough to act on instructions. FetchContext has read-only knowledge-base access. GenerateResponse has a scoped LLM client with a single lookup tool and read-only database access. Terminal nodes (SendReply, NotifyUser, HandleLLMError) each have only a session-scoped response channel. EscalateToHuman can emit to the support queue but cannot read or write any database. No node has more authority than its function requires.
Conditional routing and error handling are explicit. ModerateContent produces a three-way union: approved queries continue to the response pipeline, policy violations are routed to user notification, and ambiguous cases are escalated to human agents. GenerateResponse returns AgentResponse | LLMError, with the error case routed to HandleLLMError. In both cases, the routing is a structural property of the graph, visible in the diagram and the pseudocode.
The precise syntax for conditional routing, fan-out, and error propagation is an open design question for the Phase 1 language design; see Technical Note A.
Intent capture. A human describes a desired change in natural language. An SDD-style tool translates this into a proposed graph transformation: new nodes, modified with clauses, changed signatures or contracts.
Graph review. Humans review the diff as a visual graph change: new nodes highlighted, new capability requirements marked, trust boundary crossings flagged. This review is simultaneously an architecture review, a security review, and a design review. The reviewer is approving a typed program transformation, not reading prose.
Implementation. AI agents generate code for each new or modified node, targeting the capability-restricted runtime. Each node is implemented in isolation: the agent receives the node’s signature, its contracts, and the types of its inputs and outputs. It has no visibility into adjacent nodes’ implementations.
Verification. Automated tooling confirms that implementations satisfy their contracts, that the assembled graph conforms to the declared types, and that no node exceeds its injected capabilities. Since the runtime enforces capability restrictions, verification is primarily structural (type-checking and contract satisfaction) rather than requiring arbitrary dataflow analysis.
Merge. If verification passes, the graph transformation is merged. The human approved the graph diff; the machine confirmed conformance. No human code review of generated implementations is required.
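The structural part of the verification step can be sketched in a few lines of Python: each node declares an input and output type, and an edge is valid only if the types match exactly, with Untrusted[T] and T treated as distinct types. This is a toy string-matching stand-in for a real type checker; the node names are taken from the example graph.

```python
# Nodes declare (input_type, output_type); edges are (src, dst) pairs.
nodes = {
    "ReceiveMessage": ("HTTPRequest", "Untrusted[RawMessage]"),
    "ParseMessage":   ("Untrusted[RawMessage]", "CustomerQuery"),
    "FetchContext":   ("CustomerQuery", "ConversationContext"),
}

def check_wiring(nodes, edges):
    """An edge is valid only if types match exactly: Untrusted[T] and T are
    distinct, so no edge can silently launder trust."""
    errors = []
    for src, dst in edges:
        out_t, in_t = nodes[src][1], nodes[dst][0]
        if out_t != in_t:
            errors.append(f"{src} -> {dst}: {out_t} does not match {in_t}")
    return errors

# The example graph's wiring passes.
good = check_wiring(nodes, [("ReceiveMessage", "ParseMessage"),
                            ("ParseMessage", "FetchContext")])
assert good == []

# Wiring raw input straight to a clean-T consumer is rejected.
bad = check_wiring(nodes, [("ReceiveMessage", "FetchContext")])
assert bad != []
```

A production checker would operate on the typed graph artifact itself and would be flow-sensitive in the sense discussed under the coercion problem; the sketch shows only why wiring checks, not just node checks, carry the guarantee.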
Nodes are individually testable by injecting mock capability objects and asserting output signals against input sequences. Graph-level integration tests are expressed the same way, at the graph boundary. The replay mechanism described in §time doubles as regression testing infrastructure: a recorded production event log is a ready-made test suite.
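A sketch of mock capability injection, assuming hypothetical names (MockChannel, send_reply): the node sees only the capability's interface, so a test fixture is substituted at the boundary with no change to the node.

```python
class MockChannel:
    """Test double for ResponseChannel<user-session>: records instead of sending."""
    def __init__(self):
        self.sent = []
    def send(self, payload):
        self.sent.append(payload)
        return {"delivered": True}

def send_reply(response, channel):
    # Node under test: its only effect is through the injected capability.
    return channel.send({"body": response})

ch = MockChannel()
confirmation = send_reply("Here is your refund status.", ch)
assert confirmation == {"delivered": True}
assert ch.sent == [{"body": "Here is your refund status."}]
```

The same substitution at the graph boundary turns a recorded production event log into a regression suite: fixtures replay logged outcomes instead of canned ones.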
Each node executes in a lightweight, capability-restricted sandbox (a WASM module, a Monty-style interpreter, or a BEAM-like process), with CHERI hardware where available. Critical properties:
No ambient authority. As described in §signal-graph, a node would not be able to import libraries, access the filesystem, make network calls, or perform any side effect beyond calling methods on its injected capability objects, enforced by the absence of any mechanism rather than a policy guard.
Defence in depth. The type system would prevent the graph from expressing forbidden capability grants. The runtime sandbox would prevent generated code from exceeding its injected capabilities. The operating system isolates node processes from one another. On CHERI hardware, the processor prevents capability forgery at the memory level. These layers are not fully independent, because the runtime’s capability injection is configured by the type system’s analysis, so a type system bug could misconfigure the runtime. But they provide overlapping coverage with distinct failure modes: a sandbox escape does not help an attacker who lacks a hardware capability, and a type system error does not propagate past a correctly configured OS compartment. This is weaker than fully independent enforcement but substantially better than any single layer.
Language agnosticism. WASM is the natural compilation target, supporting Rust, C, C++, Go, and Python (via interpreters such as Monty). The signal graph defines component interfaces using a language-neutral type system; the implementation language is an optimisation choice made by the AI agent, or specified by performance constraints in the node’s contract.
Snapshotting and resumption. Nodes can be paused, serialised, and resumed, enabling durable execution, time-travel debugging, and the production replay loop described in §time.
The performance overhead of per-node sandboxed execution is not yet established for this model. For reference, WASM module instantiation, BEAM process spawning, and Monty interpreter startup all operate at the microsecond scale or below, and all achieve efficient execution once running. A typical microservice-to-microservice call over the network costs on the order of 1ms, so per-node overhead at the microsecond scale is unlikely to dominate in architectures where nodes correspond to coarse-grained components (individual services or bounded contexts). However, the cost of capability-mediated I/O, serialisation at node boundaries, and signal propagation through graphs with hundreds of nodes has not been measured in this context, and overhead may be prohibitive if the graph is decomposed to the granularity of individual functions. Determining the right granularity (coarse enough for acceptable overhead, fine enough for meaningful capability isolation) is an empirical question that the Phase 1 demonstrator must answer. Serialisation cost at node boundaries is a further concern; for complex types, shared-memory approaches (CHERI compartments within a single address space or zero-copy WASM memory) may be necessary to keep per-crossing overhead acceptable.
The capability-injection model would provide security guarantees qualitatively different from those achievable by code review or runtime monitoring.
Injection attacks. SQL injection and command injection depend on untrusted input reaching an interpreter in executable form. In the signal graph, a SQL-executing capability would accept typed queries, not raw strings. Untrusted<string> could not reach it without passing through a sanitisation node that produces a typed query. In a sound realisation, the pattern would be ill-typed: rejected by the type system rather than left to convention.
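The typed-query pattern, sketched in Python under assumed names (TypedQuery, DBCapability): the capability's execute method accepts only TypedQuery values, and TypedQuery is constructible only through sanitising constructors that bind parameters separately from the statement template.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TypedQuery:
    """Carries a statement template and bound parameters, never raw SQL text.
    Constructible only via sanitising constructors."""
    template: str
    params: tuple

    @staticmethod
    def lookup_customer(customer_id: str) -> "TypedQuery":
        if not customer_id.isalnum():
            raise ValueError("malformed customer id")
        return TypedQuery("SELECT * FROM customers WHERE id = ?", (customer_id,))

class DBCapability:
    """The SQL-executing capability accepts TypedQuery values, not strings."""
    def execute(self, q):
        if not isinstance(q, TypedQuery):
            raise TypeError("raw strings cannot reach the interpreter")
        return ("executed", q.template, q.params)

db = DBCapability()
ok = db.execute(TypedQuery.lookup_customer("abc123"))
assert ok[2] == ("abc123",)

# The classic payload never reaches the interpreter in executable form.
rejected = False
try:
    TypedQuery.lookup_customer("1; DROP TABLE customers")
except ValueError:
    rejected = True
assert rejected
```

The runtime isinstance check stands in for what the proposal would make a static guarantee; the point is that the capability's interface, not caller discipline, is what excludes raw strings.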
Prompt injection. Ill-typed for the structural reasons described in §frp-brief and illustrated in §concrete-graph: no well-typed wiring would connect an untrusted source to an LLM-capable node without explicit sanitisation.
Supply chain attacks. A third-party library used within a pure node would have no capability objects. Even if it contains malicious code, it would have no mechanism for I/O in the proposed runtime model. On CHERI hardware, even a library that attempts to exploit a memory safety vulnerability to escape its sandbox cannot forge a capability to memory it was not granted. If a library update introduces a new capability requirement, this would appear in the graph diff as a new with clause, a visible, reviewable change.
Privilege escalation. A node would not be able to acquire capabilities it was not given. The graph would be the complete and sole description of the system’s capability distribution. On CHERI hardware, this guarantee extends to the memory level.
The research agenda is organised in three phases of increasing scope, reflecting a realistic dependency ordering. Phase 1 produces a working demonstrator; Phase 2 hardens it for meaningful deployment; Phase 3 addresses the deeper formal and hardware integration questions. The convergence of developments that makes this agenda timely — and that explains why this synthesis has not been attempted before — is discussed in §why-now. We present the agenda first so that the reader encounters the “why now” argument with a concrete understanding of what the proposal actually requires.
Signal graph language and type system. Design the capability-annotated signal graph language: its type system, its expression of trust tainting, its composition rules. The target is a language expressive enough to encode realistic system architectures while remaining amenable to visual rendering and agent manipulation. Arrowized FRP [11] and algebraic effect systems [31] are the primary formal references. A key design decision is the degree of dependent typing required: Idris 2 or Agda for full expressiveness, or a more restricted system (e.g., a Haskell-like type system with phantom types for trust levels) for tractability. The demonstrator uses the restricted system; the Phase 3 verification work may require the full one.
Runtime prototype. Implement a capability-restricted execution environment (initially WASM/WASI) that instantiates graph nodes with their injected capability objects and provides no ambient authority. Demonstrate that a node implementing a realistic workload (an HTTP handler with database access and LLM invocation) cannot exceed its declared capabilities, and that the security properties of §security hold for a representative set of attack patterns.
Agent tooling. Build the AI agent workflow that takes a natural-language change description, proposes a graph transformation, generates node implementations, and submits for automated verification. This extends existing SDD tooling (OpenSpec, Kiro) to operate on typed graph artifacts rather than prose documents. The distinctive contribution is agents that reason about signal dependencies and project downstream effects of proposed changes before committing them.
Developer experience. Implement the visual graph editor and diff viewer. The primary human interface must make graph transformations reviewable without requiring users to read the underlying type system. Capability edge additions, trust boundary crossings, and sanitisation gaps must be visually salient.
Shallow verification. Develop automated tooling to confirm that AI-generated node implementations satisfy their declared contracts. This combines property-based testing, contract testing, and architectural fitness functions [69], applying existing techniques in a novel configuration. The verification obligation is deliberately bounded: checking type conformance and contract satisfaction, not verifying arbitrary program properties.
Event log infrastructure. Design the structured event logging that capability boundary crossings produce automatically. Define the formal conditions under which replay fidelity holds, characterise the classes of failure (primarily concurrency-dependent) that violate it, and design runtime conventions that maximise fidelity in practice.
Migration path. Design the incremental adoption route for existing systems. The minimal entry point is wrapping an existing service as an opaque node with a declared capability signature, a boundary that describes what the service does (which databases it accesses, which external APIs it calls) without requiring any internal restructuring. This is analogous to declaring a foreign function interface: the existing service runs unchanged, but the graph now models its authority explicitly. Over time, an opaque node can be decomposed: its internal logic is extracted into sub-nodes with narrower capability signatures, progressively tightening the authority model. This incremental path must be designed with attention to intermediate states stable enough for production operation. A system that is half-migrated (some nodes fully capability-restricted, others opaque wrappers) must still provide value: opaque wrappers provide architectural visibility (their authority is explicit and reviewable even before internal restructuring), while full capability enforcement applies only to migrated nodes.
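The opaque-node entry point can be sketched in Python. Everything here is a hypothetical illustration (OpaqueNode, the capability strings, the stand-in service): the legacy service runs unchanged behind a declared authority signature.

```python
class OpaqueNode:
    """Wraps an existing service as a single graph node: internals untouched,
    but the declared authority becomes explicit and reviewable."""
    def __init__(self, name, capabilities, call):
        self.name = name
        self.capabilities = frozenset(capabilities)  # the declared signature
        self._call = call                            # the legacy entry point

    def invoke(self, payload):
        return self._call(payload)

legacy = OpaqueNode(
    "BillingService",
    capabilities={"DBHandle<'billing', read-write>",
                  "ResponseChannel<user-session>"},
    call=lambda req: {"status": "ok", "request": req},  # stand-in for the real service
)

# Even before any restructuring, the graph can answer authority questions.
assert "DBHandle<'billing', read-write>" in legacy.capabilities
result = legacy.invoke({"invoice": 42})
assert result["status"] == "ok"
```

Decomposition then proceeds by replacing the single `call` with sub-nodes whose capability sets partition the declared signature, tightening authority without a flag-day rewrite.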
Deep verification. For the compilation from signal graph semantics to capability-restricted WASM, confirm that component boundaries, capability signatures, and trust annotations are preserved across the production boundary. This is a bounded correctness claim about a specific, well-defined transformation, closer in kind to CompCert [70] than to general program verification, but demanding nonetheless. Proof assistants in the tradition of Coq or Lean are the appropriate tools. The minimal set of invariants required to guarantee the security properties of §security in the production runtime must be identified before the full verification work is scoped.
CHERI integration. Design the mapping from architectural capabilities (typed handles injected at node boundaries) to CHERI hardware capabilities at the memory level. Use CHERI’s fine-grained compartmentalisation to enforce node isolation below the WASM boundary. Characterise graceful degradation on non-CHERI hardware. CHERIoT [53] provides a reference architecture; the WASI capability model provides the natural software interface above which CHERI enforcement is applied.
Every element of this proposal has existed in some form for years or decades. What makes the synthesis newly practical is the convergence of four developments.
AI agents as code authors. The primary author of implementation code is, for the first time, an entity that does not require the affordances of text editing and has no resistance to structural constraints. This removes the primary historical obstacle to graph-based code representations.
Lightweight sandboxing. WASM/WASI, Monty, and BEAM have made per-node capability-restricted execution practical at microsecond timescales. The performance overhead that historically limited capability-based systems has been substantially reduced (§sandboxing).
Capability hardware. CHERI processors are reaching commercial availability, and RISC-V standardisation is underway. For the first time since the capability architectures of the 1970s, hardware that enforces unforgeable, bounded capabilities is becoming available for production use.
Economic pressure. Organisations adopting AI coding agents are discovering that unconstrained agents produce architectural drift and security vulnerabilities at unprecedented speed. The emergence of spec-driven development frameworks (OpenSpec [7], Kiro [9], Spec Kit [8]) is itself evidence: these tools exist because agents without structured constraints produce results that outstrip review capacity. The pattern is visible in early evidence of declining code quality metrics coincident with AI assistant adoption [5]. The need for formal architectural constraints on agent-generated code is acute and growing.
If these trends continue, there is a plausible path toward model-driven, capability-restricted, agent-implemented systems. The choice is between deliberate design and haphazard emergence from the collision of existing tools.
This note collects the technical questions that the proposal acknowledges but does not resolve. It is intended for technically specialist readers who may wish to engage with specific open problems.
Compositionality of noninterference. When two well-typed nodes are wired together, the composed system must inherit the noninterference properties of both. This does not follow automatically from local node typing; it requires that the trust label system be compositional in a specific sense. The result is well-established (compositionality of noninterference follows from standard results in information-flow security [24]), but the signal graph wiring model must be shown to satisfy the conditions those results require. Adapting these results to the signal graph’s wiring model is a design and research task for Phase 1, not a mechanical application. It is worth stating explicitly so that the design work does not inadvertently introduce a label system that is locally sound but fails to compose.
The coercion problem in trust-annotated wiring. As discussed in §signal-graph, the trust annotation scheme requires a flow-sensitive wiring type system (beyond local node typing alone) to guarantee noninterference. The precise design (security label lattice, absence of Untrusted<T> <: T subtyping, or a Jif-style label system) is an open design question for Phase 1. The problem is well-understood in the information-flow literature; the contribution is adapting it to the graph wiring context.
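To illustrate the kind of wiring rule at stake, the following Python sketch models a totally ordered trust lattice with a join and an edge check that refuses implicit Untrusted&lt;T&gt;-to-T coercion. The names (`Trust`, `join`, `check_edge`) and the total order are assumptions for exposition, not the Phase 1 design; a real label system may need a partial order and a Jif-style policy language.

```python
from enum import IntEnum

class Trust(IntEnum):
    """Illustrative trust lattice; higher value = more trusted.
    A total order is assumed here for simplicity; the real design
    may require a partial order (e.g. distinct provenance axes)."""
    UNTRUSTED = 0
    VALIDATED = 1
    TRUSTED = 2

def join(a: Trust, b: Trust) -> Trust:
    """A derived value is only as trusted as its least trusted
    input: combining with untrusted data taints the result."""
    return Trust(min(a, b))

def check_edge(producer: Trust, consumer_requires: Trust) -> None:
    """Wiring rule: no implicit Untrusted<T> -> T coercion. An edge
    type-checks only if the producer's label is at least as trusted
    as the consumer demands; upgrades must go through an explicit
    validation node."""
    if producer < consumer_requires:
        raise TypeError(
            f"illegal flow: {producer.name} output wired to input "
            f"requiring {consumer_requires.name}")

check_edge(Trust.TRUSTED, Trust.VALIDATED)      # ok: trust may only drop
tainted = join(Trust.TRUSTED, Trust.UNTRUSTED)  # result is UNTRUSTED
```

The deliberate absence of a downgrade-free coercion is the point: the only way to raise a label is a node whose signature says it validates.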
Replay fidelity under concurrency. The production replay loop described in §time assumes that the event log at capability boundaries is a complete and deterministic record of the system’s inputs. This holds for single-threaded deterministic nodes but degrades for nodes with internal concurrency or timing dependencies. The formal conditions under which replay fidelity holds, and the runtime conventions that maximise it, are open questions addressed in Phase 2. Perfect replay is not the claim; rather, capability-boundary logging would provide materially better fidelity than conventional logging, and the classes of failure that violate fidelity could be characterised and managed.
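The intended discipline can be sketched as follows: every nondeterministic observation a node makes flows through a capability handle that records it, so replaying the log against the same pure node reproduces the run. `RecordingClock` and `rate_limit_node` are hypothetical stand-ins, and nodes with internal concurrency are precisely where this sketch breaks down.

```python
class RecordingClock:
    """Hypothetical capability handle that logs every observation
    it grants. In replay mode it answers from the log instead of
    the environment, so a deterministic node replays exactly."""
    def __init__(self, log=None):
        self.replaying = log is not None
        self.log = list(log) if log is not None else []

    def now(self) -> int:
        if self.replaying:
            return self.log.pop(0)
        t = 1_700_000_000  # stand-in for a real, nondeterministic time source
        self.log.append(t)
        return t

def rate_limit_node(clock: RecordingClock, window: list, event_at: int) -> bool:
    """A node whose only nondeterminism flows through its clock
    capability: allow the event if at most 3 fall in 60 seconds."""
    now = clock.now()
    window[:] = [t for t in window if now - t < 60] + [event_at]
    return len(window) <= 3

live = RecordingClock()                    # live run records observations
allowed_live = rate_limit_node(live, [], 1_700_000_000)

replay = RecordingClock(log=live.log)      # replay run consumes the log
allowed_replay = rate_limit_node(replay, [], 1_700_000_000)
assert allowed_live == allowed_replay      # replay fidelity for this node
```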
Compilation correctness scope. The Phase 3 claim that the FRP-to-WASM compilation preserves capability signatures and trust annotations is a bounded verification obligation, but its exact scope must be defined before proof work begins. CompCert [70] is the appropriate precedent in terms of methodology, but took a decade of dedicated effort. The realistic near-term target is a mechanised proof of preservation for a simplified subset of the signal graph language, sufficient to validate the approach and identify the hard cases, rather than a full production-grade verified compiler.
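To make the bounded obligation concrete, here is a toy Lean 4 statement of a preservation theorem for a drastically simplified "language" in which a node is nothing but its capability list; all names are illustrative, and the real obligation involves trust annotations and a nontrivial compilation function.

```lean
-- Toy statement of the Phase 3 obligation: compilation preserves
-- the capability signature. Names are illustrative only.
inductive Cap
  | net | fs | clock

structure Node where
  caps : List Cap

structure Wasm where
  imports : List Cap

def compile (n : Node) : Wasm := ⟨n.caps⟩

theorem compile_preserves_caps (n : Node) :
    (compile n).imports = n.caps := rfl
```

The proof is trivially `rfl` here; the research content lies in keeping the statement this shape once `compile` does real work.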
Error handling and conditional flow. The concrete graph in §concrete-graph demonstrates basic conditional routing (three-way branching from moderation, error routing from response generation). However, real systems require richer patterns: fan-out (sending the same signal to multiple consumers), error propagation chains (a failing node must produce a typed error that downstream nodes can handle), and fallback logic. Arrowized FRP provides combinators for choice and fan-out (ArrowChoice, &&&), but their integration with capability annotations and trust tainting has not been worked out. A graph language that cannot express “on payment failure, notify the user and log the error” without escaping to imperative code would not be viable. This is a Phase 1 design obligation.
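A rough sketch, in Python rather than arrowized Haskell, of what choice and fan-out might look like once trust labels are threaded through. `fanout` and `on_error` are hypothetical combinators, and the taint treatment shown (labels propagate unchanged to both branches) is an assumption, not a worked-out design.

```python
from enum import IntEnum

class Trust(IntEnum):
    UNTRUSTED = 0
    TRUSTED = 1

def fanout(f, g):
    """&&&-style fan-out: both consumers receive the same signal,
    and both results inherit the input's trust label."""
    def node(value, label):
        return (f(value), label), (g(value), label)
    return node

def on_error(handler, on_ok):
    """ArrowChoice-style routing over a typed result: an error value
    takes the error edge, a success takes the normal edge, and the
    label rides along either way."""
    def node(result, label):
        tag, payload = result
        if tag == "err":
            return ("handled", handler(payload), label)
        return ("ok", on_ok(payload), label)
    return node

# "On payment failure, notify the user and log the error" --
# expressed in combinators, without escaping to imperative code.
charge = on_error(handler=lambda e: f"notify+log: {e}",
                  on_ok=lambda r: f"fulfil: {r}")
outcome = charge(("err", "card declined"), Trust.UNTRUSTED)

audit = fanout(lambda v: f"log {v}", lambda v: f"metric {v}")
(logged, _), (metric, _) = audit("charge attempted", Trust.TRUSTED)
```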
Node-local state. The proposal describes nodes as pure functions, but many real components need persistent local state between invocations (session caches, rate-limit counters, accumulated aggregations). In FRP, state is modelled through feedback loops and signal accumulators, but the interaction between stateful signal combinators, capability annotations, and the deterministic replay property of §time has not been analysed. A node that maintains internal state may violate the pure-function assumption that enables the replay and verification claims. The Phase 1 language design must define how state is expressed in the graph (as an explicit feedback edge, as a stateful combinator, or as a capability-mediated external store) and which of these approaches preserves the security and replay properties.
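The explicit-feedback-edge option can be sketched as follows: the node remains a pure function of (previous state, input), and a minimal runtime threads the state, so replaying the logged input stream reconstructs the state trajectory exactly. `counter_node` and `run` are illustrative names, not the proposal's API.

```python
def counter_node(state: int, event: str) -> tuple:
    """State as an explicit feedback edge: the node stays a pure
    function of (previous state, input), so replay from the logged
    inputs reconstructs the state without persisting it."""
    new_state = state + 1 if event == "hit" else state
    return new_state, new_state

def run(node, initial, inputs):
    """Minimal runtime loop that threads the feedback edge."""
    state, outputs = initial, []
    for ev in inputs:
        state, out = node(state, ev)
        outputs.append(out)
    return outputs

log = ["hit", "miss", "hit", "hit"]
trace = run(counter_node, 0, log)           # [1, 1, 2, 3]
# Replay from the same log reproduces the same trajectory.
assert run(counter_node, 0, log) == trace
```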
Graph-scale comprehension. The concrete example in §concrete-graph has nine nodes. Real systems will have hundreds. Navigating, debugging, and reviewing large graphs presents UX challenges familiar from visual programming and model-driven environments: clutter, loss of context, difficulty localising errors. Hierarchical decomposition (sub-graphs exposed as single nodes at a higher level, analogous to C4’s zoom levels) is the expected approach, but the interaction between hierarchical abstraction, capability wiring, and trust propagation across sub-graph boundaries has not been designed. A type error deep in a sub-graph’s internal wiring must produce an error message that is comprehensible at the level the developer is working at. This is a Phase 1 developer experience obligation.
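A minimal sketch of the hierarchical idea, assuming a toy `Node`/`SubGraph` representation: a sub-graph checks its internal wiring and reports errors with a hierarchical path, while the parent level sees only the boundary signature. The representation and error format are assumptions for illustration.

```python
class Node:
    """Toy node: a name plus an input type and an output type."""
    def __init__(self, name, requires, produces):
        self.name, self.requires, self.produces = name, requires, produces

class SubGraph(Node):
    """A sub-graph exposed as a single node at the parent level
    (analogous to a C4 zoom level). Internal wiring errors carry a
    hierarchical path so they are legible from outside."""
    def __init__(self, name, children, edges, requires, produces):
        super().__init__(name, requires, produces)
        self.children, self.edges = children, edges

    def check(self):
        for src, dst in self.edges:
            if src.produces != dst.requires:
                raise TypeError(
                    f"{self.name}/{src.name} -> {self.name}/{dst.name}: "
                    f"produces {src.produces}, input expects {dst.requires}")

validate = Node("validate", "RawOrder", "Order")
charge = Node("charge", "Order", "Receipt")
payments = SubGraph("payments", [validate, charge], [(validate, charge)],
                    requires="RawOrder", produces="Receipt")
payments.check()  # internally consistent; the parent sees one node
```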
Graph evolution and signature compatibility. When a node’s type signature changes, downstream consumers may break. The graph language must define compatibility rules for signature evolution (additive changes, narrowing of capability requirements, widening of output types) and support versioned interfaces at sub-graph boundaries to enable independent team ownership. This is a standard API evolution problem, but its interaction with capability wiring and trust annotations adds constraints that the Phase 1 design must address.
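One plausible rule set can be sketched in Python: outputs may only specialise (every new output remains acceptable to existing consumers) and capability requirements may only narrow. The type names, the subtype table, and the rules themselves are assumptions for illustration, not the Phase 1 design.

```python
# Illustrative subtype table: child type -> set of supertypes.
SUBTYPES = {"PremiumReceipt": {"Receipt"}}

def is_subtype(a: str, b: str) -> bool:
    return a == b or b in SUBTYPES.get(a, set())

def compatible(old: dict, new: dict) -> bool:
    """Drop-in compatibility for existing consumers of a node:
      - the output may only specialise (every new output is still
        a valid value of the old output type);
      - capability requirements may only narrow (the new version
        demands no authority the old one did not hold)."""
    return (is_subtype(new["out"], old["out"])
            and set(new["caps"]) <= set(old["caps"]))

v1 = {"out": "Receipt",        "caps": {"net.stripe", "log.append"}}
v2 = {"out": "PremiumReceipt", "caps": {"net.stripe"}}             # ok
v3 = {"out": "Receipt",        "caps": {"net.stripe", "fs.read"}}  # new authority
```

Note the capability rule makes evolution a security check, not just an API check: a version that silently acquires `fs.read` is flagged even though its data types are unchanged.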
Type system soundness. The security properties of §security depend entirely on the type system being sound: every well-typed graph must satisfy noninterference and capability confinement. A soundness bug in the type system would propagate to every layer of the defence-in-depth stack (§runtime), since the runtime’s capability injection is configured by the type system’s analysis. Phase 1 should include property-based testing of the type system (random graph generation with expected type errors, fuzzing of the wiring checker) as a lightweight validation before Phase 3’s mechanised soundness proof for the core calculus.
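To give a flavour of the intended lightweight validation (using the standard library rather than Hypothesis, for self-containment): generate random DAGs and check a monotonicity property of a toy taint-propagation analysis, namely that raising a source's trust never lowers any node's label. The `propagate` function is a stand-in for the real wiring checker.

```python
import random

def join(a, b):
    """Trust join on a toy total order: 0 = untrusted, 2 = trusted."""
    return min(a, b)

def propagate(n_sources, edges, source_labels):
    """Stand-in wiring analysis: forward taint propagation over a
    DAG given as (src, dst) edges with src < dst. Internal nodes
    with no inputs default to trusted (the lattice top)."""
    labels = dict(enumerate(source_labels))
    n_nodes = max((d for _, d in edges), default=n_sources - 1) + 1
    for node in range(n_sources, n_nodes):
        label = 2
        for s, d in edges:
            if d == node:
                label = join(label, labels[s])
        labels[node] = label
    return labels

# Property: raising a source's trust never lowers any node's label.
# A non-monotone (hence suspect) propagation rule fails this cheaply.
rng = random.Random(0)
for _ in range(200):
    n_src = 3
    candidates = {(rng.randrange(0, 5), rng.randrange(n_src, 8))
                  for _ in range(6)}          # srcs may be internal nodes too
    edges = sorted((s, d) for s, d in candidates if s < d)
    base = [rng.randrange(0, 3) for _ in range(n_src)]
    raised = [min(2, b + 1) for b in base]
    lo = propagate(n_src, edges, base)
    hi = propagate(n_src, edges, raised)
    assert all(hi[k] >= lo[k] for k in lo)
```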
Distributed ambient authority. The signal graph model controls capability flow within a deployment, but a node wired to a network capability handle can, in principle, communicate with any reachable service, potentially acquiring capabilities out-of-band that the graph does not model. The E language [23] and Agoric’s Hardened JavaScript address this through reference-based communication discipline; our proposal inherits the same open question for the distributed case. Scoping the system to a single deployment boundary for Phase 1 and Phase 2 is the pragmatic approach; the distributed extension is a later-phase research question.
Type system evolution. The graph language’s own type system will need to evolve: new trust levels, new capability kinds, refined subtyping rules. This is distinct from the graph evolution problem above (which addresses node signature changes within a fixed type system). Migration of existing graphs across type system versions, and the preservation of verified properties across such migrations, is an open problem that the Phase 1 design should anticipate, drawing on established approaches to language versioning and gradual typing.
This proposal is an invitation. The synthesis it describes spans several domains that no single team is likely to cover. This annex identifies the expertise each phase requires, as a guide for potential collaborators.
Type theory and functional programming: algebraic type systems, arrowized FRP, algebraic effect systems. Experience with Haskell, Idris, or Agda. The signal graph’s type system is the foundation; getting it wrong here propagates to every later phase.
Systems programming: WASM/WASI toolchains, capability-based I/O models, runtime implementation in Rust or C++.
AI agent tooling: structured agent workflows, tool use, MCP (Model Context Protocol).
Developer experience design: visual graph editors, diff viewers, reviewer cognitive load. This is as important as the formal foundations.
Formal methods: property-based testing (QuickCheck, Hypothesis), contract testing, lightweight specification (TLA+, Alloy).
Distributed systems and observability: structured logging, distributed tracing, causal ordering [71], OpenTelemetry.
Security engineering: capability-based security, supply chain threat models, prompt injection as an attack class.
Proof assistants: working proficiency in Coq or Lean 4, for the bounded compilation correctness claim.
Computer architecture: CHERI instruction set architecture (ISA) extensions, CHERIoT hardware-software co-design. The Cambridge CHERI group is the primary external knowledge source.