Appendix C: NoC and Device Protocols

This appendix documents the protocols that live below the ISA but above the silicon: how NoC packets are structured, how devices and cores communicate, and how the effect table is populated to route effects to the right destination.

C.1 NoC packet format

The NoC is a 4-local mesh (each tile has four neighbors: N, E, S, W) carrying fixed-size packets. Packets are currently 64 bits wide; the wire format may pad or quantize them as needed.

 0      6      12     16     20     52     64
 |------|------|------|------|------|------|
 | dst  | src  | op   | flag | pl   | rsvd |
 |  6b  |  6b  |  4b  |  4b  | 32b  | 12b  |
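| Field | Width   | Meaning                                     |
|-------|---------|---------------------------------------------|
| dst   | 6 bits  | Destination tile ID (64 tiles max)          |
| src   | 6 bits  | Source tile ID                              |
| op    | 4 bits  | Operation code (per-destination semantics)  |
| flag  | 4 bits  | Reserved flags (priority, ack, etc.)        |
| pl    | 32 bits | Payload: typed value, with implicit tag     |
| rsvd  | 12 bits | Reserved for future use                     |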

Total: 64 bits per packet.

The payload is 32 bits, accommodating an INT (with explicit tag), a PTR (memory address with tag), a CONT (continuation handle with tag), or a CLOSURE (closure handle with tag). The payload is deliberately wider than the 16-bit native data word, for forward compatibility with a 32-bit successor ISA.
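The field layout can be sketched as a pack/unpack pair. This is a minimal illustration, not the reference encoder: it assumes that offset 0 in the diagram is the least significant bit, and the names `FIELDS`, `pack_packet`, and `unpack_packet` are hypothetical.

```python
# NoC packet fields from C.1 as (name, bit offset, width in bits).
# Assumption: offset 0 in the diagram is the least significant bit.
FIELDS = [
    ("dst", 0, 6),
    ("src", 6, 6),
    ("op", 12, 4),
    ("flag", 16, 4),
    ("pl", 20, 32),
    ("rsvd", 52, 12),
]


def pack_packet(dst, src, op, pl, flag=0, rsvd=0):
    """Pack the six fields into one 64-bit integer, rejecting oversized values."""
    values = {"dst": dst, "src": src, "op": op, "flag": flag, "pl": pl, "rsvd": rsvd}
    word = 0
    for name, offset, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value:#x} does not fit in {width} bits")
        word |= value << offset
    return word


def unpack_packet(word):
    """Split a 64-bit integer back into a dict of named fields."""
    return {name: (word >> offset) & ((1 << width) - 1) for name, offset, width in FIELDS}


# Core 0 sends UART op 0x01 (transmit byte) with payload 'A' to tile 0x10.
packet = pack_packet(dst=0x10, src=0x00, op=0x01, pl=ord("A"))
fields = unpack_packet(packet)
```

Under that bit-order assumption, a destination of 0x40 or above is rejected because it does not fit in the 6-bit dst field.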

C.2 Tile ID assignments

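| Tile ID range | Use                                                                          |
|---------------|------------------------------------------------------------------------------|
| 0x00 – 0x07   | Cores (0–7)                                                                  |
| 0x10 – 0x17   | UART devices (multiple)                                                      |
| 0x18 – 0x1F   | Keyboard interface devices                                                   |
| 0x20 – 0x27   | Timer devices                                                                |
| 0x28 – 0x2F   | Disk controllers                                                             |
| 0x30 – 0x37   | Network interfaces                                                           |
| 0x38 – 0x3F   | Reserved devices                                                             |
| 0x40 – 0xFF   | Reserved for future use (not addressable with the current 6-bit dst field)  |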

The tile ID space is sparse; many of the device slots are unused in the toy. The space leaves room for adding devices without renumbering existing ones.
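As a rough illustration, the range table can be turned into a lookup. The names `TILE_RANGES` and `classify_tile` are hypothetical, and IDs that the table does not list (such as 0x08 – 0x0F) are simply reported as unassigned here.

```python
# Tile ID ranges from the C.2 table as (first, last, use); bounds are inclusive.
TILE_RANGES = [
    (0x00, 0x07, "core"),
    (0x10, 0x17, "UART"),
    (0x18, 0x1F, "keyboard interface"),
    (0x20, 0x27, "timer"),
    (0x28, 0x2F, "disk controller"),
    (0x30, 0x37, "network interface"),
    (0x38, 0x3F, "reserved device"),
    (0x40, 0xFF, "reserved for future use"),
]


def classify_tile(tile_id):
    """Return the use of a tile ID, or None if the table leaves it unassigned."""
    for first, last, use in TILE_RANGES:
        if first <= tile_id <= last:
            return use
    return None


uart_use = classify_tile(0x10)   # falls in the UART range
gap_use = classify_tile(0x08)    # not listed in the table
```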

C.3 Per-device op codes

Each device interprets the op field according to its protocol. Op codes are device-private; the same op code means different things to different devices.

UART (0x10)

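| Op   | Direction     | Payload           | Meaning                                      |
|------|---------------|-------------------|----------------------------------------------|
| 0x01 | core→UART     | byte (low 8 bits) | Transmit byte                                |
| 0x02 | UART→core     | byte (low 8 bits) | Received byte                                |
| 0x03 | core→UART     | config word       | Configure baud rate, parity, stop bits       |
| 0x04 | UART→core     | status word       | Status report (overrun, framing error, etc.) |
| 0x0F | bidirectional | (none)            | Identify (returns device type)               |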

Keyboard Interface Device (KID, 0x18)

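| Op   | Direction     | Payload  | Meaning                              |
|------|---------------|----------|--------------------------------------|
| 0x01 | KID→core      | scancode | Key event (high bit = release)       |
| 0x02 | core→KID      | core_id  | Bind: future key events to this core |
| 0x03 | core→KID      | (none)   | Request current modifier state       |
| 0x0F | bidirectional | (none)   | Identify                             |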

Timer (0x20)

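| Op   | Direction     | Payload      | Meaning                         |
|------|---------------|--------------|---------------------------------|
| 0x01 | core→Timer    | reload count | Set countdown reload value      |
| 0x02 | core→Timer    | mask flag    | Enable/disable interrupt        |
| 0x03 | Timer→core    | core_id      | Tick fired                      |
| 0x04 | core→Timer    | core_id      | Bind: future ticks to this core |
| 0x0F | bidirectional | (none)       | Identify                        |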

Inter-core (peer cores, 0x00-0x07)

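| Op   | Direction | Payload                     | Meaning                        |
|------|-----------|-----------------------------|--------------------------------|
| 0x01 | core→core | continuation handle         | Migrate process to target core |
| 0x02 | core→core | request_id                  | Steal request                  |
| 0x03 | core→core | continuation handle or null | Steal reply                    |
| 0x04 | core→core | message payload             | Generic IPC message            |
| 0x05 | core→core | barrier_id                  | Barrier signal                 |
| 0x0F | core→core | (none)                      | Identify (core_id, status)     |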
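Because op codes are device-private, a decoder has to know the destination's device type before it can interpret the op field. The sketch below encodes the four tables above as nested dictionaries; the names `OPS` and `describe_op` and the short device keys are hypothetical.

```python
# Per-device op code tables from C.3, keyed by device type and then by op.
OPS = {
    "uart": {
        0x01: "Transmit byte",
        0x02: "Received byte",
        0x03: "Configure baud rate, parity, stop bits",
        0x04: "Status report",
        0x0F: "Identify",
    },
    "kid": {
        0x01: "Key event",
        0x02: "Bind",
        0x03: "Request current modifier state",
        0x0F: "Identify",
    },
    "timer": {
        0x01: "Set countdown reload value",
        0x02: "Enable/disable interrupt",
        0x03: "Tick fired",
        0x04: "Bind",
        0x0F: "Identify",
    },
    "core": {
        0x01: "Migrate process to target core",
        0x02: "Steal request",
        0x03: "Steal reply",
        0x04: "Generic IPC message",
        0x05: "Barrier signal",
        0x0F: "Identify",
    },
}


def describe_op(device, op):
    """Look up an op code in the table of the given device type."""
    return OPS[device].get(op, "unknown op")


# The same op code 0x01 means something different to each device.
meanings = {device: describe_op(device, 0x01) for device in OPS}
```

Only 0x0F (Identify) has the same meaning in every table, which is what makes the enumeration in C.6 possible before the device type is known.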

C.4 NoC ordering guarantees

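Endpoint-to-endpoint ordering is guaranteed. Packets sent from tile A to tile B arrive in the order A sent them. This follows from the deterministic dimension-ordered routing (X first, then Y) used by the mesh router: every packet from A to B takes the same path, so a later packet has no alternative route by which to overtake an earlier one.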

Cross-endpoint ordering is not guaranteed. Packets from A to C and from B to C may arrive at C in either relative order. This is fine for almost all uses; the few cases where it matters (e.g., distributed consensus) need higher-level sequencing built on top.
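The ordering argument rests on the route being a pure function of source and destination. A minimal sketch of dimension-ordered routing follows; it works on (x, y) coordinates directly, since this appendix does not specify how tile IDs map to mesh positions, and `xy_route` is a hypothetical name.

```python
def xy_route(src, dst):
    """Return the (x, y) tiles visited from src to dst, moving along X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


forward = xy_route((0, 0), (2, 1))
again = xy_route((0, 0), (2, 1))     # identical to forward: the route is deterministic
backward = xy_route((2, 1), (0, 0))  # not the reverse of forward: X is still resolved first
```

Two packets from the same source to the same destination always share a path, which is why they stay in order. Packets from different sources converge on the destination over different paths, which is why their relative order is not fixed.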

No silent drops in normal operation. The NoC fabric does not drop packets due to congestion; it applies backpressure. A sending tile that can’t make progress stalls its NoC output, and the stall propagates back through the sequencer pipeline, where it appears as a blocked PERF instruction.

The exception: if a destination tile’s input FIFO is full and remains full for more than a configurable threshold of cycles, the packet may be dropped and a NocOverflow effect raised on the sender. This protects against pathological cases (a hung receiver) at the cost of relaxing the no-drop guarantee. Default threshold: 1024 cycles.
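The interaction between backpressure and the overflow threshold can be modeled in a few lines. This is a behavioral sketch only: `send_with_backpressure` and `fifo_has_space` are hypothetical names, and the NocOverflow effect is modeled as a Python exception.

```python
DEFAULT_THRESHOLD = 1024  # default threshold from C.4, in cycles


class NocOverflow(Exception):
    """Stands in for the NocOverflow effect raised on the sender."""


def send_with_backpressure(fifo_has_space, threshold=DEFAULT_THRESHOLD):
    """Stall until the destination FIFO has space; return the number of stalled cycles.

    fifo_has_space(cycle) is a hypothetical callable reporting whether the
    destination can accept the packet on the given cycle.
    """
    stalled = 0
    while not fifo_has_space(stalled):
        stalled += 1
        if stalled > threshold:
            # The FIFO stayed full for more than `threshold` cycles: drop the packet.
            raise NocOverflow(f"dropped after {stalled} stalled cycles")
    return stalled


# A receiver that drains after three cycles only delays the sender.
short_stall = send_with_backpressure(lambda cycle: cycle >= 3)

# A hung receiver triggers the overflow path instead of stalling forever.
try:
    send_with_backpressure(lambda cycle: False, threshold=8)
    overflowed = False
except NocOverflow:
    overflowed = True
```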

C.5 Effect table population

The effect table (256 entries indexed by (family, op)) maps an effect ID to a destination. Each entry is one of:

  • Unhandled — trap if performed
  • Local(handler_pc) — fall back to this PC if not in CAM (normal local handler)
  • Remote(dst_tile, dev_op) — emit a NoC packet to dst_tile with op dev_op

The effect table is in memory, at a kernel-controlled address. Each entry is 32 bits:

 31       28 27    26 25  20 19         0
 |---------|------|------|------------|
 |   tag   | rsvd | op2  |    addr    |
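| Tag | Variant   | addr interpretation                    |
|-----|-----------|----------------------------------------|
| 0x0 | Unhandled | (ignored)                              |
| 0x1 | Local     | handler_pc                             |
| 0x2 | Remote    | dst_tile in low 8 bits, dev_op in op2  |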
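The entry layout can be sketched as encode/decode helpers. The bit positions come from the diagram above (tag in bits 31–28, op2 in bits 25–20, addr in bits 19–0); the function names are hypothetical, and the helpers leave the rsvd bits zero.

```python
TAG_UNHANDLED, TAG_LOCAL, TAG_REMOTE = 0x0, 0x1, 0x2


def encode_local(handler_pc):
    """Local entry: handler_pc occupies the 20-bit addr field."""
    if not 0 <= handler_pc < (1 << 20):
        raise ValueError("handler_pc does not fit in the 20-bit addr field")
    return (TAG_LOCAL << 28) | handler_pc


def encode_remote(dst_tile, dev_op):
    """Remote entry: dst_tile in the low 8 bits of addr, dev_op in the 6-bit op2 field."""
    if not 0 <= dst_tile < (1 << 8):
        raise ValueError("dst_tile does not fit in 8 bits")
    if not 0 <= dev_op < (1 << 6):
        raise ValueError("dev_op does not fit in the 6-bit op2 field")
    return (TAG_REMOTE << 28) | (dev_op << 20) | dst_tile


def decode_entry(entry):
    """Decode a 32-bit entry into a tuple naming its variant and fields."""
    tag = (entry >> 28) & 0xF
    op2 = (entry >> 20) & 0x3F
    addr = entry & 0xFFFFF
    if tag == TAG_UNHANDLED:
        return ("Unhandled",)
    if tag == TAG_LOCAL:
        return ("Local", addr)
    if tag == TAG_REMOTE:
        return ("Remote", addr & 0xFF, op2)
    raise ValueError(f"unknown tag {tag:#x}")


uart_tx = encode_remote(dst_tile=0x10, dev_op=0x01)
decoded = decode_entry(uart_tx)
```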

The kernel populates the effect table at boot, after enumerating devices. For example, to bind the UART transmit effect:

unsafe {
  __effect_table_set
    (family=2, op=0)     (* UartTx *)
    (Remote { dst_tile = 0x10; dev_op = 0x01 })
}
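For readers who want to experiment outside the toy, the same binding can be modeled in Python. This is an illustrative model, not the kernel's implementation: the index layout (family in the high nibble, op in the low nibble) is an assumption, and `effect_index`, `new_effect_table`, and `effect_table_set` are hypothetical names.

```python
UNHANDLED = ("Unhandled",)


def effect_index(family, op):
    """Map (family, op) to a table index.

    Assumption: family is the high nibble and op is the low nibble.
    """
    if not (0 <= family < 16 and 0 <= op < 16):
        raise ValueError("family and op must each fit in 4 bits")
    return (family << 4) | op


def new_effect_table():
    """A fresh 256-entry table in which every effect is Unhandled."""
    return [UNHANDLED] * 256


def effect_table_set(table, family, op, entry):
    """Bind one effect ID to a destination."""
    table[effect_index(family, op)] = entry


table = new_effect_table()
# UartTx: family 2, op 0, routed to the UART at tile 0x10 with device op 0x01.
effect_table_set(table, family=2, op=0, entry=("Remote", 0x10, 0x01))
```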

C.6 Device enumeration protocol

At boot, the bootstrap core enumerates devices by sending an Identify packet (op 0x0F) to each candidate tile. Devices respond with a packet identifying their type and capabilities.

The Identify reply payload format:

 31          24 23         16 15               0
 |-------------|-------------|------------------|
 | device_kind |   version   |   capabilities   |
 |     8b      |     8b      |       16b        |
| device_kind | Type             |
|-------------|------------------|
| 0x01        | UART             |
| 0x02        | KID              |
| 0x03        | Timer            |
| 0x04        | Disk             |
| 0x05        | NIC              |
| 0x06        | DMA controller   |
| 0xFF        | Reserved/unknown |

version is a per-device-type protocol version. capabilities is a per-device-type bitfield documenting optional features.

A tile that does not respond to Identify within a timeout is presumed absent. The bootstrap core builds a device map from the responses.
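A sketch of the enumeration loop, under stated assumptions: `probe` is a hypothetical callable that sends Identify (op 0x0F) to a tile and returns the 32-bit reply payload, or None if the timeout expires. Any device_kind value missing from the table is labeled "Reserved/unknown" here, which is a simplification.

```python
DEVICE_KINDS = {
    0x01: "UART",
    0x02: "KID",
    0x03: "Timer",
    0x04: "Disk",
    0x05: "NIC",
    0x06: "DMA controller",
}


def decode_identify(payload):
    """Split an Identify reply into (device type, version, capabilities)."""
    kind = (payload >> 24) & 0xFF
    version = (payload >> 16) & 0xFF
    capabilities = payload & 0xFFFF
    return DEVICE_KINDS.get(kind, "Reserved/unknown"), version, capabilities


def enumerate_devices(probe, candidate_tiles):
    """Build a device map from the tiles that answer Identify."""
    device_map = {}
    for tile in candidate_tiles:
        reply = probe(tile)
        if reply is None:
            continue  # no reply before the timeout: tile presumed absent
        device_map[tile] = decode_identify(reply)
    return device_map


# Simulated fabric: a UART at tile 0x10 and a timer at tile 0x20; nothing else answers.
replies = {0x10: 0x01010000, 0x20: 0x03020001}
device_map = enumerate_devices(replies.get, range(0x10, 0x40))
```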

C.7 Effect family conventions

The 16 effect families are allocated as follows. Each family has 16 ops, for 256 total effect IDs.

| Family    | Use                                                |
|-----------|----------------------------------------------------|
| 0x0       | Reserved for control plane (scheduler, lifecycle)  |
| 0x1       | Memory (allocation, region operations)             |
| 0x2       | UART and console I/O                               |
| 0x3       | Keyboard input                                     |
| 0x4       | Timer and clock                                    |
| 0x5       | Disk                                               |
| 0x6       | Network                                            |
| 0x7       | Inter-process communication                        |
| 0x8       | User-defined (general)                             |
| 0x9 – 0xE | Reserved for future use                            |
| 0xF       | Hardware-originated effects (NoC arrival, etc.)    |

User code can perform effects in family 0x8. Library code typically defines its own effects in 0x8 (allocating ops by convention or by registration). Effects in 0x9–0xE may be used by future ISA revisions.

Family 0xF is the only family whose effects are originated by hardware. When the NoC interface receives a packet and surfaces it as a perform, the perform’s family is 0xF. This convention lets handlers distinguish hardware-driven effects from software-driven ones.
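The family allocation reduces to a small lookup, sketched below. The names `FAMILY_USE`, `describe_family`, and `is_hardware_originated` are hypothetical.

```python
OPS_PER_FAMILY = 16

# Family allocation from the C.7 table.
FAMILY_USE = {
    0x0: "control plane",
    0x1: "memory",
    0x2: "UART and console I/O",
    0x3: "keyboard input",
    0x4: "timer and clock",
    0x5: "disk",
    0x6: "network",
    0x7: "inter-process communication",
    0x8: "user-defined",
    0xF: "hardware-originated",
}
for family in range(0x9, 0xF):  # 0x9 through 0xE
    FAMILY_USE[family] = "reserved for future use"


def describe_family(family):
    """Return the allocated use of an effect family (0x0 through 0xF)."""
    if not 0 <= family < 16:
        raise ValueError("family must fit in 4 bits")
    return FAMILY_USE[family]


def is_hardware_originated(family):
    """Only family 0xF carries effects that hardware originates."""
    return family == 0xF


total_effect_ids = len(FAMILY_USE) * OPS_PER_FAMILY
```

A handler that receives a perform can call a check like `is_hardware_originated` on its family to tell a NoC arrival apart from a software-driven effect.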

C.8 Boot ROM responsibilities

The boot ROM is the first code that runs on each core after reset. It performs the following sequence:

  1. Initialize core state. Set up the data stack pointer, the heap pointer, the handler CAM (empty), and the effect table base register.

  2. Install bottom handlers. The boot ROM installs a no-op or panic handler as the catch-all for each family. The kernel will install real handlers later, but in the gap between boot ROM and kernel start, an errant perform should not deadlock.

  3. For the bootstrap core (core 0): enumerate devices (send Identify to each candidate tile), build the device map, populate the effect table with default bindings for known devices.

  4. For non-bootstrap cores: wait for a release signal from core 0 (a NoC packet with op 0x06, a boot-time code that does not appear in the inter-core table in C.3).

  5. Spawn the kernel main. Reify the kernel’s entry point as a process and invoke it. The kernel takes over from there.

The boot ROM is approximately 200 instructions in the current design. It is the only code that uses raw PERF instructions to specific device addresses; everything afterward goes through kernel-provided abstractions.
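The control flow of the sequence above, and in particular where core 0 diverges from the other cores, can be traced with a small model. It only records the order of steps and performs no real initialization; `boot_rom` is a hypothetical name.

```python
def boot_rom(core_id):
    """Return the ordered list of steps the boot ROM takes on the given core."""
    steps = ["initialize core state", "install bottom handlers"]
    if core_id == 0:
        # Only the bootstrap core talks to devices.
        steps += ["enumerate devices", "build device map", "populate effect table"]
    else:
        # Every other core blocks until core 0 releases it.
        steps.append("wait for release from core 0")
    steps.append("spawn kernel main")
    return steps


bootstrap_trace = boot_rom(0)
secondary_trace = boot_rom(3)
```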

C.9 Open issues

A few protocol-level decisions are deferred:

  • Boot ROM placement. The boot ROM is in a memory region the kernel cannot overwrite. The exact address layout is implementation-defined.

  • Multicast NoC packets. Currently each NoC packet has a single destination. Some use cases (cache invalidation, barriers) would benefit from multicast. The packet format reserves bits for this but the implementation is deferred.

  • Priority and QoS. The flag field reserves bits for priority, but the current implementation uses none of them. A real version would prioritize control-plane packets over data-plane packets.

  • Encryption and authentication. None at present. A real system intending to run multiple mutually-distrusting tenants would need cryptographic guarantees on inter-tile traffic.

These are all surmountable; they are simply not in the current scope.