Overview
This is a high-level overview of how the different parts of the Serene compiler work together. It is meant to be the entry point for developers who wish to understand the internals.
Serene is going to be a self-hosted compiler — its real compiler will be written in Serene itself. To get there we first need to bootstrap, and that is what we are building right now: the stage-0 compiler, a throwaway whose only job is to compile enough of Serene to write the self-hosted compiler in.
Forward-looking. This describes the intended architecture, assuming the in-progress pieces are finished. Several parts — most of the backend, the compiler↔runtime bridge, and large chunks of the type checker — are still under construction today. Read this for the shape of the system, not its current state. Sections that lean on unfinished work are marked (in progress).
The shape of stage 0
The stage-0 compiler is lscz ("lxsameer's serene compiler"). Its only job is to compile enough of Serene that Serene can be written in itself. It has two halves: a front end that produces well-typed core terms, and a runtime those terms ultimately run on. The bridge between them is codegen plus a small ABI. The following diagram captures the big picture.
flowchart TD
src(["Serene Source"]) --> parser
subgraph FE["Front-end · lscz"]
direction TB
parser("Parser") --> elab
subgraph GR["Graph reduction"]
direction LR
elab("Elaborator") --> type("TypeChecker") --> core("Typed Core · QTT")
end
end
subgraph RT["Runtime"]
direction TB
subgraph JIT["JIT"]
direction LR
eval("Evaluate") --> llvm("LLVM Backend")
end
llvm --> value[("Values")]
mm("Memory Manager") <--> value
ds("Data Structures") <--> value
fiber("Fiber Subsystem") <--> value
io("IO Reactor") <--> fiber
ffi("FFI") <--> value
end
core ==> eval
value -. "read back" .-> type
llvm --> prog[/"Executable"/]
ffi --> world(["World"])
%% Serene palette: purple front-end bands, amber runtime bands, purple hub.
classDef feBand fill:#f1eaf3,stroke:#7c3a8f,color:#1d141f
classDef grBand fill:#e4d4ec,stroke:#5e246d,color:#1d141f
classDef rtBand fill:#fff3d6,stroke:#cf9526,color:#241c10
classDef jitBand fill:#ffe7b0,stroke:#b5752a,color:#241c10
classDef hub fill:#5e246d,stroke:#431950,color:#ffffff,stroke-width:2px
classDef proc fill:#ffffff,stroke:#7c3a8f,color:#1d141f
classDef term fill:#faf8fb,stroke:#6a5f6e,color:#1d141f
class FE feBand
class GR grBand
class RT rtBand
class JIT jitBand
class value hub
class parser,elab,type,core,eval,llvm,mm,ds,fiber,io,ffi proc
class src,prog,world term
The front end: lscz
The front end is written in Idris2 and is made up of small, total passes that form a graph reduction pipeline. It reads Serene code via Serene.Reader, stores the syntactically correct forms in a graph (Serene.Graph), and runs the nodes through the pipeline on demand.
Forms -> High-level language -> Elaborate -> Well-typed core TT -> Type checker -> Core Terms
It is pretty straightforward on the surface, but each pass has its own level of complexity. You can find out more by reading through the lscz API Reference.
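The "on demand" part of the pipeline can be pictured with a small sketch. It is written in C only to stay consistent with the runtime; the front end itself is Idris2, and none of the names below (`Stage`, `Node`, `demand`, `run_next_pass`) come from lscz. The stage list is a simplified assumption loosely based on the pipeline line above. The idea it illustrates is that a graph node remembers how far it has been processed, so asking for a later stage only runs the passes that are still missing.

```c
#include <assert.h>

/* Hypothetical, simplified stage list; not the real lscz pass names. */
typedef enum {
    STAGE_FORM,        /* syntactically correct form from the reader */
    STAGE_HIGH_LEVEL,  /* high-level language */
    STAGE_ELABORATED,  /* elaborated term */
    STAGE_CORE         /* checked core term */
} Stage;

/* A graph node records the furthest stage it has reached so far. */
typedef struct {
    Stage stage;
    int passes_run;    /* counts passes that actually executed */
} Node;

/* Run exactly one pass, moving the node to the next stage. */
static void run_next_pass(Node *n) {
    assert(n->stage < STAGE_CORE);
    n->stage = (Stage)(n->stage + 1);
    n->passes_run++;
}

/* Demand the node at a given stage; only the missing passes run. */
static void demand(Node *n, Stage wanted) {
    while (n->stage < wanted) {
        run_next_pass(n);
    }
}

/* Demo: returns the number of passes that ran across three demands. */
static int demo_on_demand(void) {
    Node n = { STAGE_FORM, 0 };
    demand(&n, STAGE_ELABORATED);  /* runs two passes */
    demand(&n, STAGE_ELABORATED);  /* already there, runs nothing */
    demand(&n, STAGE_CORE);        /* runs one more pass */
    return n.passes_run;
}
```

Calling `demo_on_demand()` from a `main` function returns 3: the repeated demand for an already-reached stage does no work.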
The front end works with the runtime to compile core terms to values, and to read them back when type checking or macro expansion needs them.
That being said, lscz is designed to support different backends. For example, it has a simple interpretation backend that does not go through the JIT compiler.
The Runtime
The runtime is a static C library (libserene.runtime.a) that compiled Serene code links against. It owns everything that has to exist at run time and provides support for running programs. It has many components and subsystems, such as:
- Object model & data structures. Serene values are immutable. The runtime ships persistent collections (cons lists, a vector-trie seq, and a HAMT-backed map), so "updates" share structure instead of copying.
- Memory manager. A pluggable memory manager system that supports different implementations. Allocation goes through a block-based arena allocator by default, but it is easy to hook a garbage collector into it as well. (mm)
- Concurrency. The runtime has stackful fibers multiplexed over OS threads by an M:N work-stealing scheduler, with an IO reactor for non-blocking IO. All programs that run on the runtime use the fiber subsystem by default.
- Execution. A JIT runs code at compile time (for the evaluator). It utilizes LLVM. An ABI/FFI layer lets Serene call C and vice versa.
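Two of the ideas in this list, a pluggable allocator with a block-based arena behind it and persistent collections that share structure, can be sketched together in a few lines of C. This is an illustration under stated assumptions, not the runtime's actual API: `Allocator`, `Arena`, `Cons`, `cons`, and `BLOCK_SIZE` are hypothetical names chosen for this example, and the real memory manager and object model may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pluggable interface: an arena here, but a garbage
   collector could provide the same two functions. */
typedef struct Allocator {
    void *(*alloc)(struct Allocator *self, size_t size);
    void (*release)(struct Allocator *self);
} Allocator;

#define BLOCK_SIZE 4096

/* Portable stand-in for a maximally aligned type. */
typedef union { long long i; long double d; void *p; } MaxAlign;

typedef struct Block {
    struct Block *next;
    size_t used;
    union { MaxAlign align; unsigned char bytes[BLOCK_SIZE]; } data;
} Block;

typedef struct {
    Allocator base;  /* first member, so Allocator* converts to Arena* */
    Block *head;
} Arena;

/* Bump-allocate inside the current block; start a new block when full. */
static void *arena_alloc(Allocator *self, size_t size) {
    Arena *a = (Arena *)self;
    size = (size + sizeof(MaxAlign) - 1) / sizeof(MaxAlign) * sizeof(MaxAlign);
    assert(size <= BLOCK_SIZE);
    if (a->head == NULL || a->head->used + size > BLOCK_SIZE) {
        Block *b = malloc(sizeof *b);
        if (b == NULL) return NULL;
        b->next = a->head;
        b->used = 0;
        a->head = b;
    }
    void *p = a->head->data.bytes + a->head->used;
    a->head->used += size;
    return p;
}

/* Free every block at once; individual values are never freed. */
static void arena_release(Allocator *self) {
    Arena *a = (Arena *)self;
    while (a->head != NULL) {
        Block *next = a->head->next;
        free(a->head);
        a->head = next;
    }
}

static Arena arena_new(void) {
    Arena a = { { arena_alloc, arena_release }, NULL };
    return a;
}

/* Immutable cons cell: never modified after construction. */
typedef struct Cons {
    int head;
    const struct Cons *tail;
} Cons;

/* "Updating" a list means allocating one new cell that points at the
   existing tail; nothing is copied. */
static const Cons *cons(Allocator *mm, int head, const Cons *tail) {
    Cons *c = mm->alloc(mm, sizeof *c);
    if (c == NULL) return NULL;
    c->head = head;
    c->tail = tail;
    return c;
}

/* Demo: returns 1 when two lists share the very same tail cells. */
static int demo_sharing(void) {
    Arena arena = arena_new();
    Allocator *mm = &arena.base;
    const Cons *base = cons(mm, 2, cons(mm, 3, NULL));
    const Cons *xs = cons(mm, 1, base);  /* (1 2 3) */
    const Cons *ys = cons(mm, 9, base);  /* (9 2 3) */
    int shared = base && xs && ys && xs->tail == ys->tail
                 && xs->head == 1 && ys->head == 9;
    mm->release(mm);
    return shared;
}
```

Calling `demo_sharing()` from a `main` function returns 1 when allocation succeeds: both lists point at the same `base` cells. Because values are immutable, sharing is safe, and releasing the whole arena at once is enough to reclaim them.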
See the runtime API reference for the concrete types and functions.
Connecting the two halves
The front end produces typed core terms; the runtime knows how to hold values and run code. Two paths bridge them, and they share one value representation:
- Compile-time evaluation. Normalization during type checking needs to actually reduce terms. The interpreter (and, for speed, the runtime's JIT) evaluates core terms into runtime values, which flow back into the graph.
- Code generation (in progress). The backend (Serene.Compiler.Backend) lowers fully-checked core terms to native code (via the runtime API). Emitted code is just calls into the runtime's object model, data structures, and scheduler.
Thanks to the type checker, what reaches the runtime is ordinary first-order code over the runtime's value model — no types travel to runtime.