# Architecture

## Overview

FluxGraph is a deterministic signal-processing library designed for real-time simulation. This document explains the key architectural decisions, the execution model, and the design principles behind them.
## Design Goals

- Determinism - Same inputs always produce the same outputs
- Performance - Low overhead for real-time execution (<10 ms ticks)
- Modularity - Easy to add transforms and models
- Zero dependencies - Pure C++17, no external libraries
- Type safety - Strong typing and compile-time checks where possible
## Core Architecture

### Signal-Centric Design

All data flows through signals identified by integer handles (SignalId):

```
[External Input] -> SignalStore -> [Graph Processing] -> SignalStore -> [External Output]
```

Benefits:

- Fast lookups (array indexing, not map searches)
- Simple synchronization (snapshot-based)
- Clear data ownership (SignalStore is the single source of truth)
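For a feel of the flow, a minimal sketch. The SignalNamespace/SignalStore calls follow the APIs described later in this document, but the exact signatures, the direct store.read accessor, and engine construction are assumptions:

```cpp
// Sketch only: a path is interned once to an integer handle, then all
// reads and writes go through that handle.
SignalNamespace ns;
SignalStore store;

SignalId raw_id  = ns.intern("device.sensor");    // path -> SignalId
SignalId filt_id = ns.intern("device.filtered");

store.write(raw_id, 42.0);         // external input into the store
engine.tick(0.01, store);          // graph processing (engine built elsewhere)
double out = store.read(filt_id);  // external output from the store
```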
### Three-Layer Model

```
+------------------+
|   Application    |  User code sets inputs, reads outputs
+------------------+
         |
+------------------+
|      Engine      |  Orchestrates execution
+------------------+
         |
+------------------+
|   Graph Layer    |  Transforms, Models, Rules
+------------------+
         |
+------------------+
|   SignalStore    |  Central data repository
+------------------+
```

Layer responsibilities (the application sits above the library's three layers):

- Application: Drives ticks, manages time
- Engine: Executes the five-stage pipeline, manages execution state
- Graph: Implements data-flow logic (transforms, physics)
- SignalStore: Holds the current simulation state
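At the application layer, the driving loop might look like this (a sketch only; `read_sensor`, `actuate`, and the signal IDs are hypothetical):

```cpp
// Hypothetical application loop: the application owns time and
// drives the engine at a fixed rate.
const double dt = 0.01;  // 10 ms tick budget
while (running) {
    store.write(input_id, read_sensor());  // Application: set inputs
    engine.tick(dt, store);                // Engine: run the five stages
    actuate(store.read(output_id));        // Application: read outputs
}
```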
## Five-Stage Tick Execution

Each `engine.tick(dt, store)` call executes five sequential stages.

### Stage 1: Pre-Tick Snapshot

```cpp
snapshot = store.snapshot();
```

Purpose: Create a consistent read view of all signals.
Why? Without a snapshot, transforms and models would see partially updated values during execution, breaking determinism. Example problem:

```
Transform A: output1 = input1 * 2
Transform B: output2 = output1 + 1
```

Without a snapshot:
- Transform A runs, writes output1
- Transform B runs, reads the NEW output1 (result depends on edge ordering!)

With a snapshot:
- Both read from the snapshot (old values)
- Both write to the store (new values)
- No read-after-write hazards; results are independent of execution order
### Stage 2: Model Tick

```cpp
for (auto& model : models) {
    model->tick(dt, store);
}
```

Purpose: Update physics models.

Why first? Models represent the "ground truth" physics that the other stages observe. Temperature, position, pressure, etc. are physics-driven, not data-flow-driven.

Models write directly to the SignalStore with the physics_driven=true flag. These signals are owned by models, not computed by transforms.
### Stage 3: Edge Execution

```cpp
for (auto& edge : sorted_edges) {
    double input = snapshot.read(edge.source_id);
    double output = edge.transform->apply(input, dt);
    store.write(edge.target_id, output, edge.unit);
}
```

Purpose: Execute transform chains in topological order.

Topological sort ensures:

- Dependencies execute before dependents
- No signal is read before it's written
- Example: if C depends on B, which depends on A, the order is A -> B -> C

Why after models? Transforms process model outputs (e.g., filtering sensor data).
### Stage 4: Rule Evaluation

```cpp
for (auto& rule : rules) {
    if (rule.evaluate(store)) {
        commands.push(rule.emit_command());
    }
}
```

Purpose: Evaluate conditions and emit commands.

Why after edges? Rules operate on processed signals (post-filtering, post-transform).

Example: an emergency stop triggers on the filtered temperature, not the raw sensor value.
### Stage 5: Post-Tick Write

(Implicit - all writes have already been committed to the store.)

Purpose: The store now contains the new state for the next tick.

No explicit stage is needed - writes happen during stages 2-4.
## Topological Sort Algorithm

Goal: Order edges so that dependencies execute before dependents.

### Kahn's Algorithm

1. Compute the in-degree of each node (count of incoming edges)
2. Start with all nodes that have in-degree 0 (no dependencies)
3. Process the queue:
   a. Pop a node from the queue
   b. Add it to the sorted list
   c. For each outgoing edge:
      - Decrement the target node's in-degree
      - If it reaches zero, add the target to the queue
4. If the sorted list has fewer nodes than the graph, the graph has a cycle
Example:

```
Input graph:
  A -> B -> C
  A -> C

In-degrees: A=0, B=1, C=2

Step 1: Queue=[A], Sorted=[]
Step 2: Process A, Queue=[B], Sorted=[A]
Step 3: Process B, Queue=[C], Sorted=[A,B]
Step 4: Process C, Queue=[],  Sorted=[A,B,C]

Execution order: A, B, C
```

Complexity: O(V + E), where V = signals and E = edges.
Implementation: See compiler.cpp:109-153
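For reference, a self-contained sketch of the algorithm (not the actual compiler.cpp code; it operates on a plain adjacency list, and the final size check is the cycle detection described below):

```cpp
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <vector>

// Kahn's algorithm over an adjacency list: adj[u] lists the nodes that
// depend on u. Returns nodes in dependency order; throws on a cycle.
std::vector<int> topo_sort(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> in_degree(n, 0);
    for (const auto& targets : adj)
        for (int v : targets) ++in_degree[v];

    std::queue<int> ready;  // in-degree 0: no pending dependencies
    for (int u = 0; u < n; ++u)
        if (in_degree[u] == 0) ready.push(u);

    std::vector<int> sorted;
    while (!ready.empty()) {
        int u = ready.front();
        ready.pop();
        sorted.push_back(u);
        for (int v : adj[u])  // "remove" u's outgoing edges
            if (--in_degree[v] == 0) ready.push(v);
    }
    if (sorted.size() != static_cast<std::size_t>(n))
        throw std::runtime_error("Cycle detected in signal dependencies");
    return sorted;
}
```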
### Cycle Detection

Problem: Cyclic graphs have no valid topological order.

```
A -> B -> C -> A (cycle)
```

Detection: If the topological sort produces fewer nodes than the graph contains, a cycle exists.

Why prohibit cycles?

- No well-defined execution order
- Bootstrap problem (who computes the first value?)
- Algebraic loops require iterative solvers (out of scope)

User-facing error:

```
Error: Cycle detected in signal dependencies
Signals involved: device.sensor1 -> device.filter -> device.sensor1
```

Future work: cycles could be supported with explicit fixpoint iteration, but this adds complexity.
## Numerical Stability

### Problem

Physics models use numerical integration. If the time step is too large, the integration diverges.

Example: a thermal mass with forward Euler integration:

```
T_{n+1} = T_n + dt * (P - h*(T_n - T_amb)) / C
```

Stability condition: dt < 2*C/h. If violated, the temperature oscillates with growing amplitude instead of settling.
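A quick worked example: with C = 100 J/K and h = 5 W/K, the limit is dt < 2 * 100 / 5 = 40 s. At dt = 50 s, each Euler step multiplies the deviation from equilibrium by (1 - dt*h/C) = -1.5, so the error flips sign and grows by 50% on every step.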
### Solution

Models implement compute_stability_limit():

```cpp
double ThermalMassModel::compute_stability_limit() const {
    return 2.0 * m_thermal_mass / m_heat_transfer_coeff;
}
```

The GraphCompiler validates the time step during compilation:

```cpp
if (dt > model.compute_stability_limit()) {
    throw std::runtime_error("Time step too large for model stability");
}
```
User guidance: if compilation fails, either:

- Reduce dt (better accuracy anyway)
- Use implicit integration (a future enhancement; see below)
## Design Decisions

### Why a POD GraphSpec?

Choice: GraphSpec is Plain Old Data (vectors of structs).

Alternatives considered:

- Builder pattern (e.g., graph.addEdge(...))
- Fluent API (e.g., graph.edge("A", "B").withTransform(...))

Rationale:

- POD is serializable (JSON, YAML, Protobuf)
- Clear separation: spec = data, compiler = logic
- Easier to generate programmatically
- Can be validated before compilation
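A sketch of what such a spec might look like (the field names are illustrative, not the actual GraphSpec layout):

```cpp
#include <string>
#include <vector>

// Illustrative POD shapes only: plain structs with no behavior, so a
// spec can be built by hand, generated, or deserialized, then handed
// to the compiler for validation.
struct EdgeSpec {
    std::string source;          // e.g., "device.sensor"
    std::string target;          // e.g., "device.filtered"
    std::string transform_type;  // e.g., "first_order_lag"
    // ... transform parameters ...
};

struct GraphSpec {
    std::vector<EdgeSpec> edges;
    // ... model and rule specs ...
};
```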
### Why a Single-Writer SignalStore?

Choice: No mutexes; a single-writer model.

Alternative: A thread-safe store with locking.

Rationale:

- Real-time systems avoid unpredictable lock contention
- A single tick thread is a simpler mental model
- Readers can still access the store safely (read-only)
- Performance: no lock overhead

Trade-off: Edge execution cannot be parallelized. For now this is acceptable (the <10 ms tick target is easily met). Future: the SignalStore could be sharded by signal ranges.
### Why Integer Signal IDs?

Choice: SignalId is a uint32_t, not a string.

Alternative: Use std::string paths everywhere.

Rationale:

- O(1) array lookup vs. O(log n) map lookup
- Cache-friendly (contiguous IDs -> contiguous storage)
- Validation at graph-compilation time (namespace creation)
- A SignalId fits in a CPU register

Trade-off: Two-step access (intern, then use). Mitigated by caching IDs, as shown below.
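The usual pattern is to intern paths once at setup and keep the handles (a sketch; the member APIs follow this document's descriptions, and the write() signature is an assumption):

```cpp
// Intern once at setup, reuse the integer handle on the hot path.
class SensorReader {
public:
    explicit SensorReader(SignalNamespace& ns)
        : m_id(ns.intern("device.sensor")) {}  // one-time string lookup

    void publish(SignalStore& store, double value) {
        store.write(m_id, value);  // O(1) array write, no string handling
    }

private:
    SignalId m_id;  // cached handle
};
```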
### Why Virtual Functions in Transforms?

Despite the common advice to avoid virtual dispatch on hot paths, FluxGraph does use it: ITransform is an interface with virtual methods.

The decision considered:

- Function pointers (C-style)
- std::function (type erasure)
- Virtual functions (C++ inheritance)

Virtual functions were chosen because:

- Clean interface (ITransform)
- Easy to extend (inherit and implement)
- The compiler can devirtualize in many cases
- Slightly slower than raw function pointers, but cleaner code

Performance: a virtual call adds ~2-5 ns of overhead; the transform math (exp, sqrt, etc.) dominates.
### Why Snapshot-Based Execution?

Choice: Copy all signal values before processing.

Alternatives:

- Double-buffering (two SignalStores, swapped each tick)
- Dependency tracking (snapshot only what's needed)
- Immediate propagation (write-through)

Rationale:

- Simplest to reason about (no read-after-write hazards)
- Deterministic (execution order doesn't affect output)
- Easy to debug (the snapshot is a frozen state)

Trade-off: an O(n) memory copy per tick. For 1000 signals, that's an ~8 KB copy (negligible).

Future optimization: a lazy (copy-on-write) snapshot - premature optimization for now.
## Memory Layout

### SignalStore Internals

```cpp
class SignalStore {
private:
    std::vector<Signal> m_signals;  // Contiguous array
};
```

Access pattern:

```
SignalId=5 -> m_signals[5]  (O(1) array index)
```

Cache implications:

- Sequential iteration is cache-friendly
- Random access is still fast (array, not map)

Memory: approximately 24 bytes per signal (value + unit + flags).
### Namespace Internals

Two data structures give O(1) lookup in both directions:

```cpp
class SignalNamespace {
private:
    std::unordered_map<std::string, SignalId> m_path_to_id;
    std::unordered_map<SignalId, std::string> m_id_to_path;
};
```

- intern(): O(1) average (hash-map insert)
- resolve(): O(1) average (hash-map lookup)
- lookup(): O(1) average (reverse map)

Memory: ~32 bytes per signal (two map entries).
## Extension Points

### Custom Transforms

Implement ITransform:

```cpp
class MyTransform : public ITransform {
    double apply(double input, double dt) override { ... }
    void reset() override { ... }
    std::unique_ptr<ITransform> clone() const override { ... }
};
```

Register it in the factory:

```cpp
transform_factory.register_type("my_transform",
    [](const ParamMap& p) { return std::make_unique<MyTransform>(p); });
```
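For a concrete (if trivial) example, a gain transform written against the interface above:

```cpp
// Illustrative transform: multiplies its input by a constant gain.
// Stateless, so reset() has nothing to clear.
class GainTransform : public ITransform {
public:
    explicit GainTransform(double gain) : m_gain(gain) {}

    double apply(double input, double /*dt*/) override {
        return m_gain * input;
    }
    void reset() override {}  // no internal state
    std::unique_ptr<ITransform> clone() const override {
        return std::make_unique<GainTransform>(m_gain);
    }

private:
    double m_gain;
};
```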
See EMBEDDING.md for details.
### Custom Models

Implement IModel:

```cpp
class MyModel : public IModel {
    void tick(double dt, SignalStore& store) override { ... }
    void reset() override { ... }
    double compute_stability_limit() const override { ... }
    std::string describe() const override { ... }
    std::vector<SignalId> output_signal_ids() const override { ... }
};
```

Registration works the same way as for transforms.
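As an illustration, a minimal model that integrates the thermal-mass ODE from the Numerical Stability section (signal wiring and construction are elided; the write() signature is an assumption):

```cpp
// Illustrative model: forward Euler over dT/dt = (P - h*(T - T_amb)) / C.
class HeaterModel : public IModel {
public:
    void tick(double dt, SignalStore& store) override {
        m_temp += dt * (m_power - m_h * (m_temp - m_ambient)) / m_c;
        store.write(m_temp_id, m_temp);  // physics-driven signal
    }
    void reset() override { m_temp = m_ambient; }
    double compute_stability_limit() const override { return 2.0 * m_c / m_h; }
    std::string describe() const override { return "HeaterModel"; }
    std::vector<SignalId> output_signal_ids() const override { return {m_temp_id}; }

private:
    SignalId m_temp_id{};                    // assigned during setup (elided)
    double m_temp = 20.0, m_ambient = 20.0;  // degrees C
    double m_power = 50.0;                   // W
    double m_h = 5.0, m_c = 100.0;           // W/K, J/K
};
```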
### Custom Rules

(Future feature.) Implement the IRule interface (not yet designed).
## Testing Strategy

### Unit Tests

- Test each component in isolation
- Mock dependencies (e.g., a FakeSignalStore)
- Focus: correctness of individual transforms/models

Location: tests/unit/

### Analytical Tests

- Validate numerical accuracy against analytical solutions
- Example: FirstOrderLag vs. exp(-t/tau)
- Focus: scientific correctness

Location: tests/analytical/

### Integration Tests

- Test full engine execution
- Multi-model interactions
- Determinism validation (see the sketch below)
- Focus: system behavior

Location: tests/integration/
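A sketch of such a determinism check (GoogleTest; build_test_engine and kOutputId are hypothetical helpers):

```cpp
#include <gtest/gtest.h>

// Two identical runs must agree bit-for-bit, not just approximately:
// determinism means exact reproducibility, so EXPECT_EQ, not EXPECT_NEAR.
TEST(Determinism, RepeatedRunsMatch) {
    auto run_once = [] {
        SignalStore store;
        Engine engine = build_test_engine(store);  // hypothetical setup helper
        for (int i = 0; i < 100; ++i) {
            engine.tick(0.01, store);
        }
        return store.read(kOutputId);              // hypothetical output signal
    };
    EXPECT_EQ(run_once(), run_once());
}
```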
### Benchmarks

- Measure performance (throughput, latency)
- Detect regressions
- Focus: real-time viability

Location: tests/benchmarks/

Note: run benchmarks in Release mode (-O3). Debug builds are 10-100x slower.
## Performance Characteristics

Target: a 1000-signal graph in <10 ms per tick.

### Breakdown (rough estimates, Release build)

| Operation       | Time      | Notes                       |
|-----------------|-----------|-----------------------------|
| Snapshot copy   | 10 us     | 1000 signals * 10 ns        |
| Model tick      | 1-5 ms    | Depends on model complexity |
| Edge execution  | 50-500 us | 1000 edges * 50-500 ns      |
| Rule evaluation | 10-100 us | Simple conditionals         |
| Total           | 1-6 ms    | Well under the 10 ms budget |

Bottlenecks:

- Model physics (exp, sqrt, etc.)
- Transform math (less expensive)
- Memory access (cache misses)

Not bottlenecks:

- Virtual function calls (~2-5 ns overhead)
- Namespace lookups (done at graph-compilation time)
- Command queue operations (rare)
## Future Enhancements

### Parallel Edge Execution

Independent edges could execute concurrently, level by level:

```
Level 0: [A, B, C]  (no dependencies, run in parallel)
Level 1: [D, E]     (depend on level 0)
Level 2: [F]        (depends on D, E)
```

Benefit: 2-4x speedup for wide graphs.
Challenge: thread pool overhead, synchronization.
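A sketch of the idea using C++17 parallel algorithms (it assumes the compiler has already grouped edges into dependency levels; Edge, Snapshot, and execute_edge are stand-ins for Stage 3's per-edge work):

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Edges within a level are independent, so they may run in parallel;
// the levels themselves must still run in order.
void run_levels(const std::vector<std::vector<Edge>>& levels,
                const Snapshot& snapshot, SignalStore& store, double dt) {
    for (const auto& level : levels) {
        std::for_each(std::execution::par, level.begin(), level.end(),
                      [&](const Edge& edge) {
                          execute_edge(edge, snapshot, store, dt);  // hypothetical
                      });
        // Implicit barrier: for_each returns only when the level is done.
    }
}
```

Note that this conflicts with the single-writer SignalStore decision above; per-level sharding or per-edge output slots would be needed to keep concurrent writes race-free.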
### Implicit Integration

Replace forward Euler with backward Euler (implicit) or a higher-order method such as RK4.

Benefit: larger stable time steps.
Challenge: implicit methods generally require iterative solvers (Newton-Raphson).
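For the linear thermal-mass model above, the backward Euler step can even be solved in closed form (derived by hand from the implicit update; no solver is needed in this special case):

```
T_{n+1} = T_n + dt * (P - h*(T_{n+1} - T_amb)) / C

=> T_{n+1} = (T_n + dt * (P + h*T_amb) / C) / (1 + dt*h/C)
```

The division by (1 + dt*h/C) damps rather than amplifies, so this update is stable for any dt > 0; it is nonlinear models that force the iterative solvers.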
### YAML/JSON Parsing

Load a GraphSpec from file:

```yaml
edges:
  - source: device.sensor
    target: device.filtered
    transform:
      type: first_order_lag
      params:
        tau_s: 1.0
```

Benefit: no recompilation for graph changes.
Challenge: adds a yaml-cpp or nlohmann-json dependency.
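With nlohmann-json, a loader might look roughly like this (the GraphSpec/EdgeSpec fields match the illustrative POD sketch earlier, not the actual types):

```cpp
#include <fstream>
#include <nlohmann/json.hpp>

// Hypothetical loader: deserialize straight into the POD spec, then
// hand it to the compiler for validation as usual.
GraphSpec load_graph_spec(const std::string& path) {
    std::ifstream file(path);
    nlohmann::json j = nlohmann::json::parse(file);

    GraphSpec spec;
    for (const auto& e : j.at("edges")) {
        EdgeSpec edge;
        edge.source         = e.at("source").get<std::string>();
        edge.target         = e.at("target").get<std::string>();
        edge.transform_type = e.at("transform").at("type").get<std::string>();
        spec.edges.push_back(std::move(edge));
    }
    return spec;
}
```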
### GPU Acceleration

Offload transform chains to the GPU.

Benefit: 10-100x speedup for massively parallel graphs.
Challenge: GPU synchronization, API complexity.
## Comparison to Other Frameworks

### vs. Simulink

- Simulink: visual block diagrams, MATLAB integration, expensive
- FluxGraph: code-first, C++17, free, embeddable

### vs. OpenModelica

- OpenModelica: equation-based modeling, DAE solvers, heavyweight
- FluxGraph: signal-flow, explicit integration, lightweight

### vs. ROS

- ROS: distributed pub/sub, process-based, higher latency
- FluxGraph: in-process, deterministic, low latency

### vs. Custom C Code

- Custom: fully flexible, hard to maintain, error-prone
- FluxGraph: structured, testable, modular

FluxGraph's niche: embedded real-time simulation where determinism and performance matter more than visual design tools.
## Code Organization

```
fluxgraph/
├── include/fluxgraph/
│   ├── core/          SignalStore, Namespace, Command
│   ├── transform/     8 built-in transforms + interface
│   ├── model/         ThermalMass + interface
│   ├── graph/         GraphSpec, Compiler
│   └── engine.hpp     Five-stage execution engine
├── src/               Implementation files
├── tests/
│   ├── unit/          Component tests
│   ├── analytical/    Scientific validation
│   ├── integration/   System tests
│   └── benchmarks/    Performance tests
├── examples/          Usage examples
└── docs/              Documentation (you are here)
```

Header-only vs. library:

- Most code lives in headers for template flexibility
- Heavy logic (compiler, models) lives in .cpp files to keep compile times down
## Build System

CMake 3.20+, C++17 standard:

```cmake
add_subdirectory(fluxgraph)
target_link_libraries(your_app PRIVATE fluxgraph)
```

Zero external dependencies (except GoogleTest, for tests only).

Platform support:

- Windows (MSVC 2019+)
- Linux (GCC 9+, Clang 10+)
- macOS (Xcode 12+)
## Summary

FluxGraph achieves high-performance deterministic simulation through:

- Signal-centric architecture - integer IDs, array storage
- Five-stage tick model - snapshot isolation, ordered execution
- Topological sort - dependency-aware scheduling
- Modular design - easy-to-extend transforms and models
- Scientific validation - analytical tests for correctness

Result: sub-10 ms ticks for 1000-signal graphs with zero dependencies.

Next: see TRANSFORMS.md for transform details and EMBEDDING.md for the integration guide.