A Small ML for Effects-Native Hardware — Language Spec v0.3
Status
This is v0.3 of the language spec. Resolutions to several v0.1 and v0.2 open questions are now pinned. New decisions surfaced by the compiler team’s implementation work are incorporated.
This document remains normative for the compiler implementation and informative for the kernel and library implementations.
The language is unnamed for now. Working name: CML (Concrete ML).
What changed from v0.1
- OPEN 3 resolved: Frame allocation is hybrid (stack for normal frames, heap for PERF-captured continuations). Was previously specified as all-heap.
- Effect-shadowing semantics pinned: Nested handlers shadow by install order, not by cache location. Has implications for both compiler lowering and the ISA’s handler-stack semantics.
- Pointer arithmetic via typed instructions: Specifies that the language relies on tag-preserving pointer operations (
INCstyle) rather than untagged raw arithmetic. The ISA team is asked to provide these. - Bit operations required from ISA:
AND,OR,XOR,SHL,SHRneeded for variant extraction and other bitfield work. Was previously OPEN 4. - Continuation in-memory layout pinned: On-heap shape specified, no longer implementation-defined.
- Handler-tower locality is a first-class concern: The CAM-vs-spill cost (12×) is significant enough that the language and compiler should help programmers manage it.
1. Design principles (unchanged from v0.1)
One: Effects are the only mechanism for I/O, allocation, scheduling, and communication. There are no built-in side effects in the language proper. Every operation that touches the world goes through perform.
Two: The type system tracks effects via row polymorphism. Every function’s type includes its effect row. Effects propagate by inference; annotations are optional but checked.
Three: unsafe is itself an effect. Operations that emit raw ISA instructions are gated by unsafe { ... } blocks, which add Unsafe to the surrounding function’s effect row.
Four: Memory allocation is an effect. There is no built-in allocator. The bottom handler decides the allocation strategy.
Five: The language and the ISA are co-designed. Language constructs lower directly to ISA primitives without runtime support beyond the kernel and its handlers.
2. Syntax overview
(Unchanged from v0.1. Reproduced here for self-containment of this document. Skip to §3 if familiar.)
2.1 Types
type list a =
| Nil
| Cons of a * list a
type option a =
| None
| Some of a
type pair a b = { fst : a; snd : b }
Algebraic data types with constructors (sum) and records (product). Type parameters are lowercase identifiers.
2.2 Functions
let id (x : a) : a = x
let rec sum_list (xs : list int) : int =
match xs with
| Nil -> 0
| Cons (h, t) -> h + sum_list t
Functions are curried. let rec for recursion. Pattern matching on sums via match ... with.
2.3 Effects
effect State a {
get : () -> a
put : a -> ()
}
effect IO {
print : string -> ()
read_byte : () -> byte
}
Effect declarations introduce a family with named operations. Each operation has a signature; the return type is what the perform-site receives back when the handler resumes.
Effects can be parameterized over types.
:: and Cons
h :: t desugars to Cons (h, t) uniformly. It is a 2-payload
constructor application; the typer enforces the list-shape because
that’s how Cons is declared, but codegen treats it as a normal
constructor call.
2.4 Lists
[] ⇒ Nil
[a; b; c] ⇒ Cons (a, Cons (b, Cons (c, Nil)))
These are sugars. Requires list type with Nil and Cons constructors in scope.
2.5 Records
Full record support: declarations, literals, field access, field
assignment, shorthand { f } = { f = f }, patterns, mutable
modifier. The sched_state type in the book uses all of these.
Book impact: Appendix B mentions records but doesn’t fully specify patterns, shorthand syntax, or assignment. Substantial expansion needed.
Implementation gap: mutable is parsed but not enforced — the
compiler currently allows assignment to non-mutable fields. Spec
should note this as an implementation deficiency.
2.6 Anonymous handler sugar (let | and)
and proc_handlers p = { -- a handler, parameterised by p
| ProcLifecycle.yield (), k -> ...
| ...
}
A let/and binding whose RHS begins with { | clauses } is a
parameterised handler declaration in disguise. Params become handler
state parameters. Allows handlers to coexist in let rec … and …
groups with ordinary functions.
2.7 Perform and handle
let inc () : int / {State int} =
let n = perform State.get () in
perform State.put (n + 1);
n
let main () =
with ref_state 0 in {
let n = inc () in
print_int n
}
perform Effect.op (args) performs an operation. The type carries an effect row.
with handler args in { body } installs handler for the duration of body. The handler discharges the corresponding effect from the body’s effect row.
2.8 Handler declarations
Stateless handler:
handler print_to_uart = {
| IO.print s, k -> uart_send s; resume k ()
| IO.read_byte (), k -> resume k (uart_recv ())
| return v -> v
}
Parameterized handler (state-carrying):
handler<a> ref_state(init : a) = {
| State.get(), k -> resume k init with state := init
| State.put(v), k -> resume k () with state := v
| return x -> x
}
(Syntax for parameterized state remains an open compiler-team question; see §10.)
2.9 Unsafe blocks
let alloc_int_cell () : ptr / {Unsafe} =
unsafe {
let cell = __alloc Record 1 in
__st 0 cell 0;
cell
}
unsafe { ... } adds Unsafe to the surrounding function’s effect row. Inside, primitives prefixed with __ are available; see §6.
3. Type system
3.1 Types
T ::= int | byte | bool | unit | string | empty
| ptr | cont -- low-level types, primarily from unsafe blocks
| 'a -- type variable
| T -> T / R -- function type with effect row
| (T, T, ...) -- tuple
| { l : T; ... } -- record
| type_name T1 T2 ... -- type constructor application
int is 16-bit signed. byte is 8-bit unsigned. empty is the empty type, used for never-returning functions (exit : () -> empty / {Scheduler}).
3.2 Type aliasa
type core_id = int
Transparent. Type-parameterised aliases (type box 'a = 'a ref) are
specified but not implemented yet.
3.3 Effect rows
R ::= {} -- pure
| { E1, E2, ... } -- closed row
| { E1, E2, ... | r } -- open row with variable
Row polymorphism is supported. A function’s effect row is inferred; annotations are optional and checked.
Decision (v0.2): The language uses effect rows à la Koka, not effect handles à la Frank/OCaml 5. This was a major implementation choice flagged in the compiler team’s notes; ratifying it explicitly. Effect rows give the kernel the ability to statically prove “this function does not allocate,” which is load-bearing for the design.
3.4 Inference
Hindley-Milner with effect rows. Implementation follows the algorithm in [Leijen 2017]. The compiler team’s working notes mention an option to start with inference-only (no surface effect syntax) and add surface annotations later. Acceptable as an implementation phase; the semantics is row-based from day one.
3.4 The Unsafe effect
Unsafe is built-in. Any use of an __-prefixed primitive adds Unsafe to the surrounding function’s row. The compiler permits unsafe { ... } blocks anywhere; the type system tracks them via the row.
4. Memory model
4.1 No built-in allocation
Every allocation is perform Mem.alloc { ... }. The Mem effect is in the prelude:
effect Mem {
alloc : { layout : layout; size : int } -> ptr
free : ptr -> ()
}
The bottom handler implements Mem. The kernel installs different bottom handlers for different memory strategies.
4.2 Construction syntax
let xs = Cons (1, Nil) desugars to a sequence of perform Mem.alloc calls and field writes, wrapped in an implicit unsafe. The user does not see the desugaring.
Optimization permitted: When the compiler can prove that the active Mem handler is a known bump-allocator (e.g., inside a with region r in { ... } block whose handler is statically resolvable), it may compile allocation directly to ALLOC instructions, bypassing the perform-and-handler dispatch. This is a semantics-preserving optimization, not a language change.
4.3 Region handlers
The canonical user-level memory strategy is region-based:
with region r in {
let xs = Cons (1, Nil) in
let n = sum_list xs in
n
}
The region handler intercepts Mem.alloc and routes allocations into a bump arena. When the with block exits, the arena is reset.
OPEN (carried from v0.1): Region escape detection. Forbidding ptr return from with region blocks is the v0.1 answer and remains the v0.2 answer. A full region-polymorphism system is deferred.
4.4 GC
Not in v0.2. A future bottom handler can implement tracing GC; the language doesn’t change.
5. Compilation overview
Note on modules:
Each .cml file is a module named after its basename, capitalised.
Cross-module references use Module.name. Effects and types share a
global namespace; functions are module-local.
5.1 Frame allocation: hybrid stack/heap (NEW in v0.2)
Resolved from v0.1 OPEN 3.
Function-call frames live on the data stack used by CALL/RET. Continuation frames captured by PERF are migrated to the heap as CONT_REC objects at capture time.
This is the production design from the compiler team’s implementation work and proves to be tractable. The simplification “all frames on the heap” from v0.1 is withdrawn.
Implications:
- Normal function calls do not allocate.
CALLpushes return PC + saved closure to the stack;RETpops them. NoMem.allocperform involved. PERFis the allocation point. WhenPERFcaptures a continuation, the live register state and the relevant stack slice are migrated to a freshCONT_RECon the heap. The on-heap layout is specified in §5.6.- Tail calls (§5.8) reuse the current frame; no stack growth.
5.2 Lowering of expressions
| Source | Target |
|---|---|
42 (int literal) | LI vd, 42 (INT tag default) |
x (variable) | depends on x’s storage class |
f x (application) | place x in v0, f in v6, emit CALL v6, v0 |
let x = e1 in e2 | compile e1, store result in x’s slot, compile e2 |
match v with ... | compile v to a register, emit MATCH with dispatch table |
perform E.op args | place args in v0, emit PERF (family, op), v0 |
resume k v | place v in v0, k in v1, emit RESU v1, v0 |
unsafe { ... __op ... } | direct emission of corresponding ISA instruction |
let sched : sched_state = init_sched_state (this_core ())
This is the binding that tripped you up earlier. Allocates a memory
cell at fixed address 0x0E00+, initialised once in main’s prologue.
References LD from the slot.
5.3 Register classes (NEW in v0.2, ratified from compiler team)
| Register | Class | Notes |
|---|---|---|
v0 | argument / return / PERF argument | clobbered by every CALL and every PERF |
v1 | scratch / PERF continuation | clobbered by every PERF (receives continuation) |
v2..v5 | caller-saved scratch | callee may freely clobber |
v6 | $clo (closure environment) | preserved across CALL via implicit save/restore |
v7 | PERF op-tag | clobbered by every PERF (receives op-tag) |
No callee-saved registers. Function prologues are empty; the cost of preserving values moves to callers and is paid only where needed. For an effect-heavy language, this is the right tradeoff — call frequency exceeds the relative gain from callee-saved registers.
v0, v1, and v7 are clobbered by PERF. The compiler must treat them as not-live across any perform site. This constrains register allocation in effect-heavy code; programs should be written assuming three of eight registers are always at risk near perform expressions.
5.4 Function lowering
A function let f x = body compiles to:
-
A heap-allocated closure record with
layout = CLOSURE_REC, containing:- Word 0: header
- Word 1: code pointer
- Words 2..N+1: captured free variables
-
A code body at the address in word 1. The body’s prologue:
- Receives argument in
v0, closure pointer inv6 - Loads needed captured free variables from
v6 + offset - Executes the body
- Returns via
RET v0
- Receives argument in
5.5 Sum type lowering
A sum type type t = | A of int | B of (int, int) lowers per v0.1 §5.4, with the addition that variant extraction from headers requires bit operations (§9).
Pattern match on v1 : t:
LD v3, v1, -1 ; load header (header sits before payload)
LI v_shift, 8
SHR v3, v3, v_shift ; shift variant byte down to low bits
LI v_mask, 0xFF
AND v3, v3, v_mask ; mask off layout/size fields
MATCH v3, v3, table ; dispatch on variant
The compiler emits this sequence at every match on a PTR-tagged value.
Optimization hook: When the compiler can prove a match has only one or two variants, it can replace MATCH with a single BZ against the variant. Worth doing; common in practice for option types.
5.6 Continuation in-memory layout (NEW in v0.2)
A CONT_REC on the heap has the following shape:
Offset Field Notes
------ --------------------- -----
-1 header layout = CONT_REC, size = 11, variant = used/unused flag
0 saved PC 16-bit code address
1 saved SP stack pointer at capture time
2 prompt SP stack pointer at handler frame (delimits the captured slice)
3 saved v6 value
4 saved v6 tag (low 3 bits in a 16-bit word; remaining bits reserved)
5 saved v0 packed high byte = tag, low byte = value (or two words; see note)
6 saved v1 packed
7 saved v2 packed
8 saved v3 packed
9 saved v4 packed
10 saved v5 packed
11 saved v7 packed
Note on packing: Current 16-bit values + 3-bit tags fit awkwardly. The implementation may use either (a) one word per register with tag in the high 3 bits and value in the low 13 bits, sacrificing 3 bits of value range in saves, or (b) two words per register (tag word + value word), doubling the CONT_REC size. Option (b) is more future-proof for a 32-bit expansion. Decision: option (b) for v0.2. The continuation record is larger but the layout extends cleanly to wider data words. Total size grows to approximately 19 words; still well within the 15-word inline-header limit if we use the “size = 15, consult next word” escape.
The continuation handle returned by PERF is a CONT-tagged pointer to the header of this record.
RESU vk, vv reads the CONT_REC pointed to by vk, restores the register state and stack slice, places vv in v0 at the resume site, and transfers control to the saved PC. The variant flag in the header is updated to mark the continuation consumed; a second RESU traps.
This layout is specified, not implementation-defined. Future implementations (silicon, alternative emulators, debuggers) must agree. The Ruby emulator’s current “side-table index” approach is a permitted implementation strategy only if it maintains semantic equivalence with the spec’d layout — i.e., a debugger or migration tool would see the same logical contents.
5.7 Effect lowering and handler installation
(Mostly unchanged from v0.1.)
The compiler emits effect declarations into a manifest. Linker resolves to (family, op) pairs.
Reserved families:
- Family 0: Privileged control plane.
- Family 0xF: Hardware-originated effects.
- User available: 1 through 0xE.
Handler installation lowers to:
INST family, op, +1
JMP past_handler
handler_body:
...
RESU v1, vresult
past_handler:
...
Handler-tower locality (NEW in v0.2):
The handler CAM holds 8 entries. Beyond that, handlers spill to memory, with a 12× cost penalty on PERF searches that miss the CAM. The compiler should:
- Warn when a
withblock’s compilation would cause the static handler tower depth to exceed 8. - Avoid emitting handler installations in inner loops where the same handler could be hoisted outside.
- Document handler depth as a programmer-visible cost.
The language does not statically prevent deep handler towers — they remain expressible — but the compiler issues guidance, and runtime profiling can flag spill-thrashing.
Handler shadowing semantics (NEW in v0.2):
Multiple installations of the same (family, op) must resolve to the most-recently-installed match. This is the standard algebraic-effects semantics; it is non-negotiable.
This places a constraint on the ISA: the handler-stack lookup mechanism must respect install order, not merely cache recency. The compiler team’s v0.3 notes propose install-ID disambiguation; that’s the ISA’s problem to solve. The language requires the semantics; the ISA must deliver it.
5.8 Tail calls
Tail calls must be optimized. The compiler identifies tail-position calls and emits JMP (after argument setup) rather than CALL+RET. Tail calls do not grow the stack.
This is a hard requirement. The kernel scheduler is written assuming TCO; non-TCO compilers cannot compile it.
6. Pointer arithmetic (NEW in v0.2)
The language relies on typed pointer arithmetic for kernel-level code: bumping heap pointers, walking array elements, etc.
The ISA is asked to provide tag-preserving operations:
INC vd, vs, imm4 ; vd ← (PTR, vs.value + imm4); requires vs.tag = PTR
DEC vd, vs, imm4 ; vd ← (PTR, vs.value - imm4); requires vs.tag = PTR
Inside the language, these surface as primitive operations in unsafe blocks:
unsafe {
let hp' = __inc hp 3 in (* bump heap pointer by 3 words *)
...
}
The language does NOT expose untagged arithmetic. No __add_raw or ADD.RAW. Type-checked arithmetic over INT operands is the only ALU operation form. Pointer arithmetic uses typed INC/DEC.
Rationale: untagged arithmetic would punch a hole in the system’s type-tagging invariant — a value of unknown tag could enter the ALU, and downstream type-checked operations could not safely consume the result. Typed pointer ops preserve the invariant while solving the actual need (the kernel almost never needs general pointer arithmetic; it needs +N and -N for small N).
If the ISA team objects, the fallback is ADD.RAW/SUB.RAW inside unsafe blocks only, with the type system tracking Unsafe aggressively. But INC/DEC are preferred.
7. Bit operations (NEW in v0.2)
The language relies on the ISA providing:
AND vd, vs, vt ; bitwise AND; both INT
OR vd, vs, vt ; bitwise OR; both INT
XOR vd, vs, vt ; bitwise XOR; both INT
SHL vd, vs, vt ; vs << vt.value; both INT
SHR vd, vs, vt ; vs >> vt.value, logical (zero-fill); both INT
These are required for variant extraction (§5.5), bitfield manipulation in headers, and bitmap operations in the kernel.
These are non-negotiable; the language cannot compile without them. The ISA team has been notified.
Arithmetic shift right (SAR) is not yet required and is deferred.
8. Standard prelude
effect Mem {
alloc : { layout : layout; size : int } -> ptr
free : ptr -> ()
}
effect IO {
print : string -> ()
read_byte : () -> byte
}
effect Scheduler {
yield : () -> ()
spawn : (() -> ()) -> ()
exit : () -> empty (* never returns; uses empty type *)
}
effect Unsafe { /* raw ISA primitives */ }
The kernel installs bottom handlers for Mem, IO, Scheduler before user code runs.
9. Compiler-emitted optimizations
The compiler is permitted (and encouraged) to:
- Inline handler dispatch when the handler is statically known. A
perform Mem.allocinside awith region r in { ... }block can compile directly toALLOCwithout going throughPERF/handler-search. - Elide
SETTAG vd, INTafterLD(since LD already produces INT). - Replace
MATCHwithBZ-chains for sum types of one or two variants. - Inline small functions to avoid the
CALLoverhead. - Optimize tail calls (required, not optional).
The compiler is forbidden from:
- Reordering perform sites within a function. The order of
performcalls is observable. - Eliminating perform sites, even if the handler is known to be effect-free. Future handlers may not be.
- Caching the result of a perform across calls.
10. Open questions, consolidated
Carried from v0.1 or surfaced anew:
- OPEN (compiler team to resolve): Parameterized handler state syntax. (
with state := vvs.resume k v new_statevs. other.) - OPEN (v0.1 carryover): Region escape detection. Forbid
ptrreturn from regions for v0.2; consider region-polymorphism for v0.3. - OPEN (ISA team to resolve): Multiple-install disambiguation. Language requires install-order semantics; ISA must deliver. Compiler team’s v0.3 notes propose install-ID disambiguation, which is acceptable.
- OPEN (deferred): Fused tag-aware load variants (
LD.PTR,LD.CONT,LD.CLOSURE). Profile first, decide later. - OPEN (deferred): Multi-shot continuations. v0.2 specifies one-shot.
11. Versioning
This document is Language Spec v0.2.
Major semantic changes will produce v0.3. Clarifications without semantic change will produce v0.2.1.