Free Monad, Hardware Decoupling, and Performance Constraints

Status: Conceptual Design & Future Optimization Roadmap

Context: This document captures the architectural reasoning regarding the decoupling of algorithms from hardware using Free Monads, and addresses the eventual transition from high-level functional abstractions (using standard libraries like std::vector) to performance-constrained embedded environments (RTOS, MCU).

1. The Core Abstraction: AST, Free Monad, and Interpreter

The current architecture employs a strict separation between what to do (Strategy/Algorithm) and how to do it (Execution/Hardware).

1.1 `FccOp` and the Abstract Syntax Tree (AST)

Instead of executing hardware commands directly, the flight control strategy constructs a purely functional data structure—an Abstract Syntax Tree (AST). The "instruction set" for this tree is defined in FccOp. It contains semantic labels like ReadIMU or OutputControls.

Crucial Architecture Note: FccOp should represent I/O and Side Effects only. Pure algorithms (like Navigation or Guidance) are mathematically pure C++ functions, not interpreter instructions. The Free Monad orchestrates passing the data fetched by FccOp into these pure functions.
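
As a minimal sketch of this split, assuming C++17 and std::variant: the instruction set holds only effect descriptions, while the algorithm stays an ordinary function. The payload types ImuSample and Controls, their fields, the name run_guidance_algo, and the gain are hypothetical; only FccOp, ReadIMU, and OutputControls come from the design above.

```cpp
#include <variant>

// Hypothetical payload types; field names are assumptions for illustration.
struct ImuSample { float gyro[3]; float accel[3]; };
struct Controls  { float aileron; float elevator; float rudder; float throttle; };

// Each instruction is a plain data record that describes an effect.
// Constructing one performs no I/O.
struct ReadIMU {};                            // request: fetch the latest IMU sample
struct OutputControls { Controls command; };  // request: send commands to the actuators

// The instruction set: I/O and side effects only.
using FccOp = std::variant<ReadIMU, OutputControls>;

// A pure algorithm is an ordinary function and is NOT part of FccOp.
// Toy control law (assumed): roll-rate damping with a fixed gain.
constexpr Controls run_guidance_algo(const ImuSample& imu) {
    return Controls{ -0.5f * imu.gyro[0], 0.0f, 0.0f, 0.0f };
}
```

The design choice this illustrates: adding a new algorithm never grows FccOp, and only a new kind of I/O does.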

1.2 Free Monad (The Script)

The Free Monad (FccFree / FccDSL) lets algorithm engineers write strategies that look like normal imperative code but actually only build the AST. This is the "One Codebase" promise: the script itself performs no hardware actions and produces no global side effects, so the same strategy code is portable across Windows/Linux simulations and embedded RTOS targets.
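
The idea can be sketched with a simplified free-monad-style encoding, in which each node carries "what happens next" as data or as a continuation. This is not the actual FccFree / FccDSL API: the Program type, the helper functions, the payload types, and the toy control law are all assumptions, and a general bind operation is omitted for brevity. The sketch uses the heap and std::function, which matches the current abstract phase rather than the MCU target.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <variant>

// Hypothetical payload types.
struct ImuSample { float roll_rate; };
struct Controls  { float aileron; };

struct Program;                                    // one node of the AST
using ProgramPtr = std::shared_ptr<const Program>;

struct Done {};                                    // end of script
struct ReadIMU {                                   // effect request plus continuation
    std::function<ProgramPtr(ImuSample)> next;
};
struct OutputControls {                            // effect request plus the rest of the script
    Controls   command;
    ProgramPtr next;
};

struct Program { std::variant<Done, ReadIMU, OutputControls> node; };

// Smart constructors: they only allocate AST nodes and touch no hardware.
inline ProgramPtr done() {
    return std::make_shared<Program>(Program{Done{}});
}
inline ProgramPtr read_imu(std::function<ProgramPtr(ImuSample)> k) {
    return std::make_shared<Program>(Program{ReadIMU{std::move(k)}});
}
inline ProgramPtr output_controls(Controls c, ProgramPtr next) {
    return std::make_shared<Program>(Program{OutputControls{c, std::move(next)}});
}

// A strategy reads like "read the IMU, then output a command",
// but calling it only returns a data structure.
inline ProgramPtr damp_roll_strategy() {
    return read_imu([](ImuSample imu) {
        Controls cmd{ -0.5f * imu.roll_rate };     // pure computation (assumed gain)
        return output_controls(cmd, done());
    });
}
```

Calling damp_roll_strategy() returns a tree whose root is a ReadIMU node. Nothing is read or written until an interpreter walks that tree and supplies a sample to the continuation.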

1.3 The Interpreter (The Actor)

The Interpreter maps the AST nodes (FccOp) to physical reality.

PC/Simulation Interpreter: Maps ReadIMU to reading simulated memory buffers (FccState::last_imu).
Hardware Interpreter (Future): Maps ReadIMU to actual SPI/I2C bus calls on an MCU.
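
The two mappings above can be sketched as two visitors over the same instruction set, assuming C++17 std::visit. FccState::last_imu is taken from the text; the last_output field, the OpResult type, the run_op helper, and the function-pointer bus hooks are hypothetical stand-ins (a real hardware interpreter would call the vendor's SPI/I2C driver instead).

```cpp
#include <variant>

// Hypothetical payload types.
struct ImuSample { float roll_rate; };
struct Controls  { float aileron; };

struct ReadIMU {};
struct OutputControls { Controls command; };
using FccOp = std::variant<ReadIMU, OutputControls>;

// Result of interpreting one op: an IMU sample, or nothing.
using OpResult = std::variant<std::monostate, ImuSample>;

// Simulation state. last_output is an assumed field used to observe commands.
struct FccState { ImuSample last_imu{}; Controls last_output{}; };

// PC/Simulation interpreter: ReadIMU reads a memory buffer.
struct SimInterpreter {
    FccState& state;
    OpResult operator()(const ReadIMU&) const { return state.last_imu; }
    OpResult operator()(const OutputControls& op) const {
        state.last_output = op.command;
        return std::monostate{};
    }
};

// Hardware interpreter: bus access is injected as plain function pointers
// (hypothetical hooks standing in for real driver calls).
struct HardwareInterpreter {
    ImuSample (*read_imu_bus)();
    void (*write_actuators)(const Controls&);
    OpResult operator()(const ReadIMU&) const { return read_imu_bus(); }
    OpResult operator()(const OutputControls& op) const {
        write_actuators(op.command);
        return std::monostate{};
    }
};

// The same op value can be handed to either interpreter.
template <class Interpreter>
OpResult run_op(const Interpreter& interp, const FccOp& op) {
    return std::visit(interp, op);
}
```

Because both interpreters accept the same FccOp values, a strategy validated against SimInterpreter needs no source changes to run against the hardware one.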

2. Hardware Constraints vs. Functional Purity

Currently, the implementation prioritizes clean abstractions, heavily utilizing modern C++ standard libraries (std::vector, std::variant, std::any, std::function) and functional paradigms (creating new states rather than mutating old ones).

However, when deploying to actual flight hardware with severe constraints (limited SRAM, absence of MMU, strict L1/L2 Cache sizes, and narrow memory bandwidth), a naive functional style that copies large states and allocates on the heap every tick can cause severe "Cache Thrashing" and saturate the memory bus.

When the time comes to optimize for hardware, the following paradigms will be enforced without breaking the high-level functional abstraction:

2.1 "Pure Outside, Imperative Inside"

Functional purity only requires that a function's outputs depend solely on its inputs, with no external side-effects. It does not dictate how the inside of the function is implemented.

Current/Abstract Phase: Using std::vector and deep copies to prove the mathematical model.
Hardware Phase: Refactoring pure algorithmic functions (like run_navigation_algo) to use flat, stack-allocated arrays (float data[9]) and plain imperative loops internally, which the compiler can unroll. This improves Cache Locality and Prefetcher efficiency without changing the function's signature or its purity as seen by callers.
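
A minimal sketch of the hardware-phase style, using a 3x3 matrix product as a stand-in for one inner kernel of a function like run_navigation_algo. The names Mat3 and mat3_mul are hypothetical. The function is pure from the caller's view, yet its body mutates a local accumulator in plain loops over flat nine-element arrays.

```cpp
#include <array>
#include <cstddef>

// Row-major 3x3 matrix: nine contiguous floats, no heap allocation.
using Mat3 = std::array<float, 9>;

// Pure outside: the result depends only on a and b, and nothing else is touched.
// Imperative inside: fixed-bound loops write into a local buffer.
inline Mat3 mat3_mul(const Mat3& a, const Mat3& b) {
    Mat3 out{};
    for (std::size_t r = 0; r < 3; ++r) {
        for (std::size_t c = 0; c < 3; ++c) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < 3; ++k) {
                acc += a[r * 3 + k] * b[k * 3 + c];
            }
            out[r * 3 + c] = acc;
        }
    }
    return out;
}
```

The loop bounds are compile-time constants, so an optimizing compiler is free to unroll them. No manual unrolling is needed in the source.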

2.2 Zero-Copy State Evolution (RVO / NRVO)

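In a functional architecture, evolving the flight state looks like a massive copy: NavState new_nav = compute(old_nav);. To avoid bandwidth saturation, we rely on C++ copy elision (RVO/NRVO). Since C++17, returning a prvalue (RVO) is guaranteed to construct the result directly in the caller's storage: the compiler reserves the memory for new_nav in the caller's frame and passes a hidden pointer to it. Returning a named local variable (NRVO) is elided by mainstream compilers in most cases, but the standard does not guarantee it, so hot paths should either return prvalues or be verified in the generated code. In both cases the "pure" function writes its result straight into new_nav while only reading old_nav, which removes the return-value copy while maintaining logical immutability.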
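
One way to check that no copy happens, sketched under the assumption of a C++17 compiler: delete the copy and move constructors. The code below then compiles only because returning a prvalue is guaranteed to construct the result in the caller's storage. The NavState fields, the dt parameter, and the integration step are hypothetical.

```cpp
// Hypothetical navigation state with copying and moving forbidden.
struct NavState {
    float position[3];
    float velocity[3];

    NavState(float px, float py, float pz, float vx, float vy, float vz)
        : position{px, py, pz}, velocity{vx, vy, vz} {}

    NavState(const NavState&) = delete;   // any copy would be a compile error
    NavState(NavState&&) = delete;        // any move would be a compile error
};

// Pure state evolution: reads old_nav, returns a brand-new state.
// The return expression is a prvalue, so since C++17 it is constructed
// directly in the caller's variable without a copy or move.
inline NavState compute(const NavState& old_nav, float dt) {
    return NavState(old_nav.position[0] + old_nav.velocity[0] * dt,
                    old_nav.position[1] + old_nav.velocity[1] * dt,
                    old_nav.position[2] + old_nav.velocity[2] * dt,
                    old_nav.velocity[0],
                    old_nav.velocity[1],
                    old_nav.velocity[2]);
}
```

With this definition, NavState new_nav = compute(old_nav, dt); compiles under C++17 and later. Rewriting compute to build a named local and return it would fail to compile here, which makes the difference between guaranteed RVO and optional NRVO visible at build time.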

2.3 Global State "Ping-Pong" (Double Buffering)

Passing the massive top-level FccState through the Free Monad loop should not be done by copy. For the RTOS hardware interpreter, the system will statically allocate exactly two FccState buffers in high-speed SRAM (or TCM - Tightly Coupled Memory). The interpreter will use pointer swapping ("Ping-Pong") at the end of each tick. The Strategy DSL reads State A and writes to State B, and on the next tick they swap. Because State B still holds the data from two ticks earlier, the strategy must overwrite every field it exposes. This achieves purely functional state evolution at the cost of one $O(1)$ pointer swap per tick, with no copy of the state.
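
A minimal sketch of the ping-pong scheme. The PingPong class, its method names, and the FccState fields are hypothetical, and placing the object in TCM (typically done through a linker section) is left out. The step callable sees the front buffer as read-only input and the back buffer as output, then the two pointers are exchanged.

```cpp
#include <utility>

// Hypothetical, deliberately tiny stand-in for the real top-level state.
struct FccState { unsigned tick = 0; float altitude_m = 0.0f; };

class PingPong {
public:
    PingPong() = default;
    // The pointers refer to this object's own buffers, so copying is forbidden.
    PingPong(const PingPong&) = delete;
    PingPong& operator=(const PingPong&) = delete;

    // State produced by the most recent tick.
    const FccState& current() const { return *front_; }

    // step(const FccState& in, FccState& out) must fully overwrite out,
    // since out still holds the state from two ticks ago.
    template <class Step>
    void advance(Step&& step) {
        const FccState& in = *front_;
        step(in, *back_);
        std::swap(front_, back_);   // O(1): no FccState is copied
    }

private:
    FccState  buffers_[2]{};
    FccState* front_ = &buffers_[0];
    FccState* back_  = &buffers_[1];
};
```

On the target, a single instance with static storage duration would give the "exactly two buffers" guarantee, and the RTOS task would call advance once per tick.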

2.4 Stripping the Interpreter Overhead

To eliminate heap allocations (e.g., from std::any or lambda closures stored in std::function) and unbounded stack growth from recursive interpretation of the monad on an MCU:

No-Heap Guarantee: FccOp variants will be handled using static sizing (or union types), and return types will be statically resolved via template metaprogramming rather than type-erasure (std::any).
Trampoline Unrolling: The recursive interpretation of the Free Monad will be flattened into a while(true) event-loop within the RTOS task. This bounds the stack depth and keeps the hot loop compact, which helps Instruction Cache (I-Cache) utilization.
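
Both points can be sketched together, under the assumption that the strategy is re-expressed as an explicit state machine that hands the interpreter one op at a time. The Strategy class, the EndTick op, the Hal struct, and run_tick are hypothetical names. std::variant stores its alternatives in place, so no heap is used, and the result of ReadIMU is passed back as a concrete ImuSample rather than through std::any.

```cpp
#include <variant>

// Hypothetical payload types.
struct ImuSample { float roll_rate; };
struct Controls  { float aileron; };

struct ReadIMU {};
struct OutputControls { Controls command; };
struct EndTick {};                        // assumed marker: the script is done for this tick
using FccOp = std::variant<ReadIMU, OutputControls, EndTick>;

// The strategy as a state machine: each call returns the next op.
class Strategy {
    enum class Phase { Start, AwaitImu, Finish };
    Phase phase_ = Phase::Start;

public:
    // last_read is only meaningful after a ReadIMU op has been executed.
    FccOp next(const ImuSample& last_read) {
        switch (phase_) {
        case Phase::Start:
            phase_ = Phase::AwaitImu;
            return ReadIMU{};
        case Phase::AwaitImu:
            phase_ = Phase::Finish;
            return OutputControls{ Controls{ -0.5f * last_read.roll_rate } };
        default:
            phase_ = Phase::Start;        // rearm for the next tick
            return EndTick{};
        }
    }
};

// Hypothetical hardware hooks standing in for real driver calls.
struct Hal {
    ImuSample (*read_imu)();
    void (*write_controls)(const Controls&);
};

// Flat interpreter: one loop, constant stack depth, no recursion, no heap.
inline void run_tick(Strategy& strategy, const Hal& hal) {
    ImuSample sample{};
    while (true) {
        FccOp op = strategy.next(sample);
        if (std::holds_alternative<ReadIMU>(op)) {
            sample = hal.read_imu();
        } else if (const auto* out = std::get_if<OutputControls>(&op)) {
            hal.write_controls(out->command);
        } else {
            return;                       // EndTick: yield back to the RTOS task
        }
    }
}
```

An RTOS task body would then call run_tick once per period from its own endless loop. Whether a hand-written state machine or a generated one replaces the monadic script is a design decision this sketch does not settle.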

Summary

The current focus on abstraction, std::vector, and functional pipelines is the correct path for achieving mathematical correctness and business-logic decoupling. The physical realities of MCU memory bandwidth and cache design do not require abandoning this architecture; rather, they will be addressed at the Interpreter layer and within the internal implementation details of the pure functions, using the techniques described above: imperative internals behind pure interfaces, copy elision, statically allocated double buffering, and a heap-free, non-recursive interpreter.
