Providers
What is a Provider?
A provider is a small executable that:
- Speaks to specific hardware (sensors, actuators, PLCs, etc.)
- Exposes devices via ADPP (Anolis Device Provider Protocol)
- Runs as an isolated process (stdio communication)
Provider Protocol (ADPP)
Communication via stdin/stdout with uint32 length-prefixed protobuf messages.
Message Flow
Runtime ────┐
│ Request (protobuf, length-prefixed)
▼
Provider
│
▼ Response (protobuf, length-prefixed)
Runtime ◄───┘
ADPP Operations
Required for baseline runtime integration:
- Hello: Provider identifies itself
- ListDevices: Returns device IDs
- DescribeDevice: Returns capabilities (signals, functions)
- ReadSignals: Returns current signal values
- Call: Executes function (e.g., set_relay, move_motor)
Optional but recommended for production providers:
- GetHealth: Reports provider and per-device health
- WaitReady: Signals provider initialization readiness
See external/anolis-protocol/spec/device-provider/protocol.proto for schema.
Example: anolis-provider-sim
Simulated provider with:
- tempctl0: Temperature controller (2 relays, temp/humidity sensors)
- motorctl0: Motor controller (speed, position, status)
Source: https://github.com/FEASTorg/anolis-provider-sim
Key files:
src/main.cpp: stdio framing + message dispatchsrc/handlers.cpp: ADPP operation handlerssrc/sim_devices.cpp: Device state simulation
Creating a Provider
Minimal Steps
- Pick a language: C++, Rust, Python - anything that can do stdio + protobuf
- Implement framing: Read/write uint32_le length prefix
- Implement ADPP handlers used by runtime: Hello, ListDevices, DescribeDevice, ReadSignals, Call (plus WaitReady/GetHealth when supported)
- Handle hardware: Your code, your protocol (Modbus, SPI, etc.)
Provider Template (pseudocode)
while (true) {
Request req = read_framed_stdin();
Response resp;
switch (req.type) {
case HELLO:
resp = handle_hello();
break;
case LIST_DEVICES:
resp = handle_list_devices();
break;
case DESCRIBE_DEVICE:
resp = handle_describe(req.device_id);
break;
case READ_SIGNALS:
resp = handle_read(req.device_id, req.signal_ids);
break;
case CALL:
resp = handle_call(req.device_id, req.function_id, req.args);
break;
}
write_framed_stdout(resp);
}
Rules
- Stateless preferred: Runtime caches state, you just read hardware
- Blocking OK: Runtime handles concurrency
- Crash = unavailable: Runtime marks devices offline (supervision may restart)
- No stdin spam: Only respond to requests
- Quality matters: Report STALE/FAULT when hardware fails
Provider Supervision
The runtime can automatically monitor and restart crashed providers:
providers:
- id: hardware
command: ./my-provider
restart_policy:
enabled: true
max_attempts: 3
backoff_ms: [200, 500, 1000]
timeout_ms: 30000
Crash Detection
The supervisor detects provider crashes when:
- Process exits unexpectedly
- ADPP operations timeout repeatedly
- Provider becomes unresponsive
Restart Flow
- Crash Detected: Supervisor logs crash with attempt counter
- Backoff Wait: Delays restart according to
backoff_ms[attempt - 1] - Device Cleanup: Clears all devices from registry before restart
- Process Restart: Spawns new provider process
- Device Rediscovery: Runs Hello → ListDevices → DescribeDevice for each device
- Recovery Tracking: Resets crash counter on successful restart
Circuit Breaker
After max_attempts consecutive crashes, the circuit breaker opens:
- No further automatic restarts
- Devices remain unavailable
- Manual intervention required (restart runtime or fix provider)
The circuit breaker resets when the provider successfully recovers.
Supervision Observability
The runtime exposes real-time supervision state for every provider via GET /v0/providers/health.
GET /v0/runtime/status intentionally stays coarse (AVAILABLE/UNAVAILABLE) for compatibility;
use /v0/providers/health for restart/backoff detail.
Key fields per provider:
| Field | Description |
|---|---|
lifecycle_state |
Additive lifecycle signal: RUNNING, RECOVERING, RESTARTING, CIRCUIT_OPEN, DOWN. |
last_seen_ago_ms |
Milliseconds since the last healthy poll. Counts up while UNAVAILABLE. null before first poll. |
uptime_seconds |
Seconds since the first healthy poll of the current process instance. 0 when UNAVAILABLE. |
supervision |
Supervision block — always an object, never null (even when policy is disabled). |
Supervision state at each lifecycle stage:
| Stage | state |
lifecycle_state |
attempt_count |
circuit_open |
next_restart_in_ms |
|---|---|---|---|---|---|
| Running normally | AVAILABLE |
RUNNING |
0 |
false |
null |
| Available but stabilizing | AVAILABLE |
RECOVERING |
> 0 |
false |
0 or null |
| Crashed, in backoff | UNAVAILABLE |
RESTARTING |
>= 1 |
false |
positive integer (countdown) |
| Restart eligible now | UNAVAILABLE |
RESTARTING |
>= 1 |
false |
0 |
| Circuit open (max exceeded) | UNAVAILABLE |
CIRCUIT_OPEN |
> max_attempts |
true |
null |
| Down (no restart metadata) | UNAVAILABLE |
DOWN |
0 |
false |
null |
Distinguishing the two null cases for next_restart_in_ms:
next_restart_in_ms is null in two situations: healthy (no crash) and circuit-open (no more restarts).
Always read circuit_open to tell them apart.
Example — polling supervision from a script:
# Wait until provider is available
until curl -sf http://127.0.0.1:8080/v0/providers/health | \
jq -e '.providers[] | select(.provider_id=="sim0") | .state == "AVAILABLE"' > /dev/null; do
sleep 0.5
done
# Check if circuit is open
curl -s http://127.0.0.1:8080/v0/providers/health | \
jq '.providers[] | select(.provider_id=="sim0") | .supervision'
See HTTP API Reference - GET /v0/providers/health
for the full field reference including next_restart_in_ms disambiguation.
Backoff Strategy
The backoff_ms array defines delays before each restart attempt:
# Conservative: Long delays for stable hardware
backoff_ms: [1000, 3000, 5000]
# Aggressive: Quick recovery for transient issues
backoff_ms: [100, 200, 500]
# Production: Balanced approach
backoff_ms: [200, 500, 1000]
Best Practices
- Enable for production providers: Hardware can fail, supervision ensures resilience
- Disable for development: Crashes during development should stop execution for debugging
- Tune backoff delays: Match your hardware’s restart characteristics
- Set reasonable max_attempts: Avoid infinite restart loops for permanently failed hardware
- Monitor circuit breaker: Alert when circuit opens (indicates persistent provider failure)
Provider Internal State (Important)
Providers may maintain ephemeral protocol state required by hardware:
- Multi-step read sequences (e.g., CRUMBS staged reads: select → fetch)
- Communication buffers
- Hardware-specific state machines
Critical boundary:
- ✅ Provider internal state: Protocol implementation details
- ✅ Core single source of truth: Machine state visible to rest of system (StateCache)
- ❌ Never: Expose provider state directly to UIs/automation
Example: CRUMBS provider may buffer staged reads internally, but Anolis core remains authoritative for what the “current temperature” is.
Provider Examples (Planned)
anolis-provider-modbus: Modbus RTU/TCP devicesanolis-provider-arduino: Arduino via serialanolis-provider-canbus: CAN bus devicesanolis-provider-crumbs: FEAST CRUMBS integrationanolis-provider-ni: National Instruments DAQ
Testing Your Provider
Use anolis-runtime:
# anolis-runtime.yaml
providers:
- id: my_provider
command: /path/to/my-provider
args: ["--port", "/dev/ttyUSB0"]
Run and check logs for discovery, polling, and control operations.
Safe Initialization Contract
Critical Safety Principle: Providers MUST initialize devices in a safe, inactive state on startup. This ensures physical safety during runtime startup, configuration changes, and recovery scenarios.
Provider Responsibilities
Providers must guarantee that when the process starts:
- No Actuation: All actuators start in their safe, inactive position
- No Heating/Cooling: Thermal controls start disabled
- No Motion: Motors, stages, and moving parts start stationary
- No State Assumptions: Don’t assume hardware is already in a safe state
- Hardware Verification: Query current hardware state and command it to safe defaults if needed
Safe Default Definition
A safe default is the state a device should be in when:
- System is powered but not operating
- No automation is running
- No operator is actively controlling equipment
- Recovery from an error condition is in progress
Example Safe States by Device Type
| Device Type | Safe State | Implementation Notes |
|---|---|---|
| Relay/Switch | Open (de-energized) | Fail-safe: loss of power = safe |
| Motor Controller | Duty cycle = 0, disabled | Both PWM and enable signals off |
| Temperature Control | Open-loop mode, heaters off | Monitor-only until explicitly enabled |
| Linear Actuator | Position hold or retract to home | Depends on mechanical fail-safe design |
| Valve | Closed (or safe position for system) | May be normally-open or normally-closed per hardware |
| Laser/Light Source | Disabled, shutter closed | Both electronic disable and mechanical safety |
| Vacuum Pump | Off | Vent valve open if vacuum retention is unsafe |
| Pressure Regulator | Vent position, zero setpoint | Depressurize system unless hold is explicitly safe |
| High Voltage Supply | Output disabled, voltage = 0 | Hardware-level disable, not just software |
| Communication Bridge | Pass-through disabled | Don’t forward commands until runtime confirms ready |
Verification Checklist
Before deploying a provider, verify:
- Startup behavior validated: Provider process starts with no side effects
- Hardware queried: Current state is read before any commands issued
- Safe state commanded: Explicit commands sent to hardware to ensure safe defaults
- Power-on-reset tested: Provider works correctly after hardware power cycle
- Crash recovery tested: Provider restart doesn’t cause unsafe transitions
- Configuration validated: Invalid config fails gracefully without actuating hardware
- Emergency stop path: Provider can be terminated safely at any time
Runtime Integration
The anolis runtime coordinates safe startup:
- Runtime starts in IDLE mode (control operations blocked)
- Providers initialize (safe defaults applied)
- Device discovery runs (capabilities advertised)
- Operator transitions to MANUAL (enables control operations)
- Verification procedures run (confirm safe operation)
- Operator transitions to AUTO (enables automation)
This sequence ensures no actuation occurs until:
- Hardware is in a known safe state
- Operator has verified expected behavior
- Runtime mode explicitly permits operations
Common Pitfalls
| Pitfall | Consequence | Mitigation |
|---|---|---|
| Assuming hardware is already safe | Startup after crash/reboot may find actuators enabled | Always command safe state explicitly |
| Not querying current state | May miss hardware faults or unexpected conditions | Read before write; log discrepancies |
| Relying on hardware power-on defaults | PCB redesign or component change breaks assumptions | Verify safe state in provider code |
| Skipping safe init for “read-only” devices | Misconfiguration could enable hidden control paths | Always safe-init; hardware may have undocumented features |
| Using config file for safety-critical settings | Config typo or version mismatch → unsafe state | Hard-code safe defaults; config only for non-critical params |
Example: anolis-provider-sim Compliance
See FEASTorg/anolis-provider-sim > README > safe-initialization-in-provider-sim for reference implementation.
Provider Isolation Benefits
- Crash safety: Provider crash doesn’t kill runtime
- Language freedom: Use best tool for hardware
- Security: No shared memory, limited blast radius
- Testing: Mock providers for CI/CD