What this design is for, and what it isn’t

The book has been advocating. The closing chapter should situate. The design described in this book is not a panacea, not a production target, not a competitor to anything that ships in real silicon today. It is a specific point in the design space, useful for specific things, with specific costs. Let me be honest about both.

What this design is for

Teaching. The book itself is the artifact. Reading it should give you new conceptual handles for thinking about kernels, type systems, and concurrency, regardless of whether you ever write a line of CML or run a CML kernel. The unification of preemption, blocking, and IPC under a single effect mechanism is the kind of idea that, once seen, makes you ask whether the systems you’re using should have been built that way. The answer might be no — there are real reasons not to. But asking the question productively requires having the example.

Edge servers and L7 control-plane work. The natural workload for this design is many concurrent flows, each doing small amounts of work, blocked frequently on I/O. Web servers. Reverse proxies. API gateways. Service meshes. Anywhere a conventional design would use thousands of OS threads or an event loop with handler callbacks, this design substitutes a uniform effect mechanism. The cost per blocked flow is the size of its actual live state, which is small. The cost per active flow is comparable to threaded code, possibly better because the context-switch cost is the cost of an effect dispatch (a handful of cycles in this design) rather than the cost of an OS context switch (microseconds in modern Linux).
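The cost model above can be sketched with a toy cooperative scheduler, using Python generators as stand-ins for hardware continuations. Everything here is illustrative: the names, the "read" effect, and the immediate-answer handler are invented for the sketch, not taken from the book's kernel.

```python
# Toy cooperative scheduler: each flow is a generator; yielding an
# effect ("block on I/O") suspends the flow at the cost of a function
# call, not an OS context switch.
from collections import deque

def flow(n):
    total = 0
    for _ in range(3):
        got = yield ("read", n)   # "perform" a blocking-read effect
        total += got
    return total                   # delivered via StopIteration.value

def run(flows):
    ready = deque((f, None) for f in flows)
    results = []
    while ready:
        gen, send_val = ready.popleft()
        try:
            effect, arg = gen.send(send_val)
            if effect == "read":
                # Handler: answer the read immediately; a real kernel
                # would park the continuation until the device responds.
                ready.append((gen, arg))
            else:
                raise RuntimeError(f"unhandled effect {effect!r}")
        except StopIteration as done:
            results.append(done.value)
    return results

print(run([flow(1), flow(10)]))
```

Each of the two flows suspends three times, and each suspension is a `send` call into a generator frame rather than a trip through the OS scheduler; that gap is the whole performance argument of this section.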

The argument here is the same one Erlang (and its BEAM VM) made nearly forty years ago and that Go made fifteen years ago: lightweight units of execution with cheap context switching enable concurrency patterns that heavyweight thread-based systems can’t reach. This book’s design extends that argument to hardware co-design: rather than building a virtual machine on top of an OS on top of an ISA, build the lightweight-process model directly into the ISA, and you can simplify dramatically.

Protocol implementations where correctness is the dominant cost. The TCP receive path in chapter 15 illustrates the model. A protocol stack written as a tower of handlers is structurally easier to reason about than a protocol stack written as a thread of control with state machines. The state machines still exist — they’re inside the handlers — but the layering is enforced by the type system rather than by convention. Bugs that come from “we called the wrong layer’s function” become compile errors. Bugs that come from “we forgot to handle this state transition” become exhaustiveness warnings.
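The "state machine inside the handler" shape can be sketched in a few lines. The states, events, and transition table below are invented for illustration; in the book's CML, a missing case is an exhaustiveness warning at compile time, where this Python sketch can only fail loudly at run time.

```python
from enum import Enum, auto

class TcpState(Enum):
    LISTEN = auto()
    SYN_RCVD = auto()
    ESTABLISHED = auto()

# Transition table as data: every legal (state, event) pair is spelled
# out, so an unlisted pair fails loudly -- the dynamic analogue of the
# exhaustiveness warning a typed handler gets at compile time.
TRANSITIONS = {
    (TcpState.LISTEN,      "syn"): TcpState.SYN_RCVD,
    (TcpState.SYN_RCVD,    "ack"): TcpState.ESTABLISHED,
    (TcpState.ESTABLISHED, "fin"): TcpState.LISTEN,
}

def step(state, event):
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unhandled transition: {state} on {event!r}")

s = TcpState.LISTEN
for ev in ("syn", "ack"):
    s = step(s, ev)
print(s)  # TcpState.ESTABLISHED
```

The point of the shape is that the transition relation is a single, inspectable value rather than control flow scattered through a thread body; "we forgot this transition" becomes a gap you can see.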

For protocols where the cost of getting it wrong is catastrophic — financial, medical, infrastructure — this structural cleanness is worth real money. The same TCP stack will be a few times slower than a tuned production stack on conventional hardware, and several times faster to verify.

Research vehicles for effect systems. The hardware support for effects in this ISA — the handler CAM, the continuation-as-value, the NoC-mediated effect dispatch — is doing something no commercial ISA does. If you want to investigate “what would the runtime look like if effect dispatch were as cheap as a function call” or “how does GC change when continuations are first-class hardware values,” this design is a vehicle for those investigations. It is not itself the answer; it is a place to start asking.
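A software model of the handler CAM makes the fast-path/slow-path split concrete. This is a sketch under stated assumptions: the 8-entry size matches the book's toy, but the fill policy, the dict-as-CAM, and all the names here are inventions of the sketch, not the hardware's behavior.

```python
# Toy model of the handler CAM: a small fixed-size associative table
# mapping effect -> handler. A hit is one associative probe; a miss
# falls back to a walk of the full handler stack, innermost first --
# the order-of-magnitude variance discussed later in this chapter.
CAM_SIZE = 8

class HandlerCam:
    def __init__(self):
        self.cam = {}      # fast path: at most CAM_SIZE entries
        self.stack = []    # slow path: the complete handler stack

    def install(self, effect, handler):
        self.stack.append((effect, handler))
        if len(self.cam) < CAM_SIZE:
            self.cam[effect] = handler   # naive fill policy, for the sketch

    def dispatch(self, effect, arg):
        h = self.cam.get(effect)         # CAM hit: one probe
        if h is None:                    # CAM miss: linear search
            for eff, cand in reversed(self.stack):
                if eff == effect:
                    h = cand
                    break
            else:
                raise RuntimeError(f"unhandled effect {effect!r}")
        return h(arg)

cam = HandlerCam()
cam.install("log", lambda msg: f"[log] {msg}")
print(cam.dispatch("log", "hello"))  # [log] hello
```

The research questions in the paragraph above live exactly in the gap this model exposes: replacement policy on overflow, what the miss path costs, and what the compiler can do to keep hot effects resident.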

Anywhere small, sovereign software is the value. This is more aspirational than commercial. The design philosophy of this book — ontologically minimal, parts that fit each other, software you can hold the whole of in your head — argues for a class of systems that are small enough to verify, modify, and own. The 150-line kernel of chapter 14 is the kind of artifact you can read end-to-end and understand. A real Linux kernel on real x86 is not, regardless of how much you’d like it to be. If you have a problem where the team that runs the software needs to also understand the software — a boutique financial firm with a custom trading system, a hospital with a custom EHR, a small-state government with a custom voter-roll system — designs in this style are worth investigating, not because they’ll outperform Linux on benchmarks but because they have a different epistemic relationship with the people running them.

This last category is the most philosophical and the hardest to defend in a tweet. It’s also the one this author believes is the most important. Most professional software is built to a scale that exceeds any single mind’s capacity to hold; this is treated as a virtue (look how big we are) when it could be treated as a cost (look how few of us understand the whole). Designs that prioritize understandability, even at the expense of scale, are a small but persistent minority position. This book is a contribution to that minority.

What this design is not for

Hard real-time systems. Garbage collection, if any, introduces latency. Effect dispatch has variance (CAM hit vs. CAM miss differs by an order of magnitude). The handler search is bounded but not predictable. For systems where missing a 100µs deadline is unacceptable — flight control, anti-lock brakes, certain trading systems — this design is not appropriate without considerable hardening.

It is possible to make it suitable: bounded handler depths, no-allocation hot paths, statically allocated continuations. But this is design-by-restriction, and the restrictions remove much of what makes the design pleasant. If hard real-time is your constraint, look at seL4 or Ada/SPARK or specialized RTOSes designed for the purpose.

Maximum-throughput data plane. A CPU running tuned C code with SIMD intrinsics, processing packets in a polling loop with no operating-system involvement, will move bits faster than this design. DPDK, XDP, and similar systems are tuned to this case, and they win. Our design pays for its abstraction in cycles; not many, but enough that you can measure them and notice.

The design wins on concurrency, not on single-flow throughput. If you have a million slow connections, our design plausibly wins. If you have one connection moving 100 Gbps, you want different hardware and different software.

Anything requiring an existing software ecosystem. There is no CML port of OpenSSL. There is no CML port of glibc. There is no CML port of anything. The language is brand new, the ISA is brand new, the kernel is brand new. You cannot run npm install. You cannot grab a Python interpreter and prototype. If you need to use existing software libraries, this is not the system to build on, and bridging to existing software (via FFI to C) is itself a substantial design problem that this book does not address.

Production today, in any serious sense. The book describes a working artifact — there’s an emulator, a compiler, a language server, and a kernel that runs example programs. But “working artifact” and “production-ready system” are separated by many engineer-years of work: standard libraries, debuggers, profilers, package managers, third-party tools, documentation outside this book, training material, hiring pipelines for people who know the system. Real production software requires all of these. The book is not pretending to deliver them.

This is a research artifact released early. The author’s bet is that the design is interesting enough to influence other people’s work, even if no system that ships in real silicon ever uses exactly this design. The history of computer architecture is full of designs that were never commercially successful but whose ideas propagated — the Lisp Machines didn’t survive, but their tagged representations live on in the pointer tagging of dynamic-language runtimes; Smalltalk never displaced production languages, but its garbage collector and IDE shaped the languages and tooling that followed; the Connection Machine never outlived Thinking Machines, but data-parallel SIMD lives on in every modern GPU. Influence happens through ideas, not through products.

Where this design could go

Plausible extensions, in rough order of effort and risk:

A 32-bit version. The 16-bit toy serves the book, but real applications need 32-bit (or 64-bit) addresses. A 32-bit ISA would widen the tag to 4 or 5 bits, allowing more primitive types (float, byte, int64), and would extend memory to 4 GB. The language changes are minor; the ISA design has open space for it.

A real silicon prototype. The ISA is intentionally simple enough that a single grad student could build it on an FPGA over a summer. A serious effort would target a small ASIC. This would teach much about whether the design’s performance claims hold up in practice or are paper-thin.

Integration with existing software via FFI. If you can call C from CML, you can use existing libraries. The cost is giving up the type system’s guarantees at the FFI boundary, which is what every other language with FFI also pays. Done well, this is the easiest way to make CML practical.

More cores. The four-core toy demonstrates the multicore mechanism. A real version would have 16, 64, 256 cores. The NoC mesh scales naturally; the bottom handlers for distributed scheduling need refinement; work-stealing patterns need to be informed by real workload data.

A persistent boot system. The boot ROM as described is hand-coded. A real version would have a bootloader, a way to load compiled images from storage, a way to update them. None of this is interesting design work but all of it is necessary.

Verification. Many of the design choices in this book were made with verification in mind — linear types, effect tracking, explicit handler stacks, no implicit shared state. These properties make formal verification of the kernel tractable. A version that proved properties of the scheduler (“no process is forgotten,” “every effect is handled or trapped”) would be a meaningful contribution.

A second language on the same ISA. The ISA is designed for an ML, but its primitives are general enough that you could compile other languages to it. A Lisp would fit naturally — the tagged memory matches Lisp’s history. A small dependently-typed language could exploit the verification angle. A logic programming language could use continuations for choice points. The ISA is, in this sense, more general than the language.
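The choice-point idea can be sketched with Python generators standing in for first-class continuations. This is an illustration of the technique, not the book's design: `amb` and the search below are invented names, and a real logic-language backend would capture continuations in hardware rather than generator frames.

```python
# Backtracking search via suspended frames: each choice point is a
# generator; exhausting one branch resumes the next alternative, the
# way a logic language backtracks through choice points.
def amb(options):
    yield from options

def pairs_summing_to(target):
    # Two nested choice points; a failed branch simply falls through
    # to the next alternative rather than unwinding a call stack.
    for x in amb(range(1, 6)):
        for y in amb(range(1, 6)):
            if x + y == target:
                yield (x, y)

print(list(pairs_summing_to(6)))
```

A continuation-per-choice-point implementation generalizes this past what generators can express (re-entering a choice point more than once, for instance), which is exactly where hardware continuations-as-values would earn their keep.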

A meditation, in closing

The design described in this book is wrong about specific things. Some of the specific decisions — 3-bit tags, 8-entry CAM, particular instruction encodings — are arbitrary choices that a real design would revisit. Some of the more architectural decisions — verified-tagged ALU, hardware-effect dispatch, NoC-as-primary-fabric — are bets that may not pay off at scale. The book’s confident tone should not be mistaken for certainty about the underlying engineering.

But the direction the design points in is, the author believes, correct. The C-shaped hardware we inherited is not a fact of nature; it is an accident of which language got popular first. If we could design hardware and language together, we would not design what we have. What we would design, exactly, is open. The book is one proposal, sketched in enough detail to be argued with.

If you read the book and disagree with specific decisions, write your own. The hardware-language co-design space is vastly under-explored. There is room for a hundred more attempts.

If you read the book and walk away thinking the things conventional systems treat as separate concerns might be the same concern in disguise — preemption and blocking, interrupts and yields, message-passing and shared state — that’s the takeaway the book was written to deliver. The rest is implementation detail.

Coda

This is the end of the body. Two appendices follow, with the canonical specifications of the ISA and the language; two more, with device protocols and a glossary. They are reference material, not narrative; consult them as needed.

The design lives. The compiler is being written. The emulator runs. The book is one snapshot of a project that, by the time you read this, has likely changed in ways the author cannot predict. If you want to know what’s current, look for the project’s home page, or ask one of the people working on it.

Whatever you build next, build it small enough to hold in your head, fit its parts to each other rather than to an ideal abstract function, and let the type system do the bookkeeping you’d otherwise pay for in tests, comments, and reviews. That, in the end, is the whole argument.