xCORE-200: The XMOS XS2 Architecture
# Table of Contents

## 1 Introduction

## 2 Interconnect

## 3 Concurrent Threads

## 4 The xCORE Tile Instruction Set

## 5 Instruction Issue and Execution

## 6 Instruction Set Notation and Definitions

## 7 Data Access

## 8 Expression Evaluation

## 9 Branching, Jumping and Calling

## 10 Resources and the Thread Scheduler

## 11 Concurrency and Thread Synchronisation

## 12 Communication

## 13 Locks

## 14 Timers

## 15 Ports, Input and Output

## 16 Events, Interrupts and Exceptions

## 17 Initialisation and Debugging

## 18 Specialised Instructions

## 19 XCore XS2 Instructions

## 20 XS2 Instruction Format Specification

## 21 XS2 Exceptions

## 22 XS2 Lanes

## 1 Introduction

xCORE-200 products combine a number of xCORE Tile processors, each with its own memory, on a single chip. The programmable processors are general purpose in the sense that they can execute languages such as C; they also have direct support for concurrent processing (multi-threading), communication and input-output. A high-performance switch supports communication between the processors, and inter-chip xConnect Links are provided so that systems can easily be constructed from multiple chips.

xCORE-200 products are intended to make it practical to use software to perform many functions which would normally be done by hardware; an important example is interfacing and input-output controllers.

xCORE-200 products are based on the XS2 architecture, which is an evolution of the XS1 architecture. The main differences from the XS1 architecture are:

- Dual issue (Section 5.2).
- 64-bit load and store (Section 7.3).
- High priority threads (Section 5.3).

There are also extra instructions for bit manipulation, DSP, and real time management.

## 2 Interconnect

The interconnect provides communication between all xCORE Tiles on the chip (or system if there is more than one chip). In conjunction with simple programs, it can also be used to support access to the memory on any xCORE Tile from any other xCORE Tile, and to allow any xCORE Tile to initiate programs on any other xCORE Tile.

The interface between an xCORE Tile and the interconnect is a group of xConnect Links which carry control tokens and data tokens. The data tokens are simply bytes of data; the control tokens are as follows.

- Tokens 0-127 (Application tokens). These are intended for use by compilers or applications software to implement streamed, packetised and synchronised communications, to encode data-structures and to provide run-time type-checking of channel communications.
- Tokens 128-191 (Special tokens) are architecturally defined and may be interpreted by hardware or software. They are used to give standard encodings of common data types and structures.
- Tokens 192-223 (Privileged tokens) are architecturally defined and may be interpreted by hardware or privileged software. They are used to perform system functions including hardware resource sharing, control, monitoring and debugging. An attempt to transfer one of these tokens to or from unprivileged software will cause an exception.
- Tokens 224-255 (Hardware tokens) are only used by hardware; they control the physical operation of the link. An attempt to transfer one of these tokens using an output instruction will cause an exception.
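
The four control token ranges can be summarised in a few lines of code. The following is a minimal sketch, not part of the architecture definition; the function names `classify_control_token` and `output_allowed` are hypothetical and only mirror the ranges and exception rules listed above.

```python
def classify_control_token(value: int) -> str:
    """Return the class of a control token, using the ranges listed above."""
    if not 0 <= value <= 255:
        raise ValueError("control tokens are 8-bit values")
    if value <= 127:
        return "application"
    if value <= 191:
        return "special"
    if value <= 223:
        return "privileged"
    return "hardware"


def output_allowed(value: int, privileged: bool) -> bool:
    """Whether software may output this control token without an exception.

    Hardware tokens are never allowed; privileged tokens only from
    privileged software (illustrative model of the rules in the text).
    """
    kind = classify_control_token(value)
    if kind == "hardware":
        return False
    if kind == "privileged":
        return privileged
    return True
```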

Four links connect each xCORE Tile directly to an on-chip switch which provides non-blocking communication between the xCORE Tiles. The switch also provides off-chip xConnect Links allowing multiple XS2 or XS1 chips to be combined in a system. The structure and performance of the xConnect Link connections in a system can be varied to meet the needs of applications.

The links between xCORE Tiles and switches and the xConnect Links can be partitioned into independent networks. This can be used, for example, to provide independent networks carrying long and short messages or to provide independent networks for control and data messages.

Messages are routed to channel-ends on a specific processor through the xConnect Links using a message header which contains the number of the destination chip, the number of the destination processor and the number of a destination channel-end within the processor. These can be encoded using either 24 bits (16 bits chip and processor address, 8 bits channel address) or 8 bits (3 bits chip and processor address, 5 bits channel address).

Each switch has a configurable identifier and can also be configured to route messages according to the first component of each message header. It compares this bit-by-bit with its own switch identifier; if all bits match it then uses the second component to route the message to the destination xCORE Tile. If the bits do not match, then it uses the number of the first non-matching bit to select an outgoing direction. The direction of each xConnect Link is set when the switch is configured and it is possible for several xConnect Links to share the same direction thereby providing several independent routes between two switches.
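
The routing decision described above can be sketched as follows. This is only an illustration under stated assumptions: the identifier width of 16 bits, the choice of the most significant differing bit as the "first" non-matching bit, and the `direction_of_bit` lookup table are all hypothetical and are not taken from this document.

```python
def route(switch_id: int, dest_id: int, direction_of_bit: list[int], width: int = 16):
    """Decide what a switch does with a message header.

    Returns ("local", None) when every bit of the destination matches the
    switch identifier, otherwise ("forward", direction).
    """
    diff = (switch_id ^ dest_id) & ((1 << width) - 1)
    if diff == 0:
        # All bits match: the second header component selects the tile.
        return ("local", None)
    # Assumption: bits are compared starting from the most significant end.
    first_mismatch = diff.bit_length() - 1
    return ("forward", direction_of_bit[first_mismatch])
```

For example, with a table that maps bits 0 to 7 onto direction 0 and bits 8 to 15 onto direction 1, a switch with identifier 0 forwards a message for identifier 0x8000 in direction 1.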

The header establishes a route through the interconnect and subsequent tokens will follow the same route until one of two special control tokens is sent: these are end-of-message (END) and pause (PAUSE).

### 2.1 xConnect Link Ports

The ports used for inter-chip xConnect Link communication use a transition-based non-return-to-zero signalling scheme. Bits are sent at a rate derived from the XS2 clock; this rate can be programmed to meet application requirements.

The xConnect Links can be switched between a fast, wide mode and a slower, serial mode. Two encoding schemes are used.

### 2.2 Serial xConnect Link

The serial xConnect Link uses two data wires in each direction. A transition on Wire 1 represents a one bit and a transition on Wire 0 represents a zero bit. The first bit of a control token is a one; the first bit of a data token is a zero; the next 8 bits are the token value. The two signal wires are both at rest between tokens and the final bit of each token is chosen to return the non-zero signal wire to the rest state; one of the signal wires must be non-zero at this point as nine bits have been sent.
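
The ten-bit serial framing can be sketched as below. This is a minimal model, assuming that the token value is sent most significant bit first (the bit order is not stated in this section); the names `serial_encode` and `wire_states` are hypothetical.

```python
def serial_encode(value: int, is_control: bool) -> list[int]:
    """Return the ten bits of one token; bit b means a transition on Wire b."""
    if not 0 <= value <= 255:
        raise ValueError("token value must fit in 8 bits")
    bits = [1 if is_control else 0]
    # Assumption: the 8-bit value is transmitted most significant bit first.
    bits += [(value >> i) & 1 for i in range(7, -1, -1)]
    # After nine transitions exactly one wire has toggled an odd number of
    # times; one more transition on that wire returns it to rest.
    ones = sum(bits)
    bits.append(1 if ones % 2 == 1 else 0)
    return bits


def wire_states(bits: list[int]) -> list[int]:
    """Replay the transitions and return the final state of [Wire 0, Wire 1]."""
    state = [0, 0]
    for b in bits:
        state[b] ^= 1
    return state
```

Replaying any encoded token through `wire_states` leaves both wires at rest, which is the property the final bit is chosen to guarantee.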

On the serial link, the END and PAUSE tokens are coded directly as application tokens 1 and 2.

The link also uses several hardware tokens. The credit tokens are transmitted by the receiver to control the flow of data; each CREDIT\(n\) token issues credit to the sender to allow it to send \(n\) tokens. The HELLO token solicits initial credits, setting up a half-duplex link. To bring up a link, both sides have to issue a HELLO, and both sides have to respond to the HELLO with a CREDIT\(n\) token.

<table>
<thead>
<tr>
<th>token</th>
<th>use</th>
</tr>
</thead>
<tbody>
<tr>
<td>224</td>
<td>CREDIT8</td>
</tr>
<tr>
<td>225</td>
<td>CREDIT64</td>
</tr>
<tr>
<td>228</td>
<td>CREDIT16</td>
</tr>
<tr>
<td>230</td>
<td>HELLO</td>
</tr>
</tbody>
</table>

### 2.3 Fast xConnect Link

The fast xConnect Link uses 1-of-5 codes with five data wires in each direction; a symbol is transmitted by changing the state of one of the wires. Each symbol has the following meaning:

<table>
<thead>
<tr>
<th>symbol</th>
<th>meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>Wire 0 changes</td>
<td>value 00</td>
</tr>
<tr>
<td>Wire 1 changes</td>
<td>value 01</td>
</tr>
<tr>
<td>Wire 2 changes</td>
<td>value 10</td>
</tr>
<tr>
<td>Wire 3 changes</td>
<td>value 11</td>
</tr>
<tr>
<td>Wire 4 changes</td>
<td>escape</td>
</tr>
</tbody>
</table>

A sequence of four symbols is used to encode each token. In the following, \(e\) is an escape and \(v\) is one of the values 00, 01, 10, 11.

<table>
<thead>
<tr>
<th>symbol sequence</th>
<th>use</th>
</tr>
</thead>
<tbody>
<tr>
<td>(v) (v) (v) (v)</td>
<td>256 data tokens</td>
</tr>
<tr>
<td>(e) (v) (v) (v)</td>
<td>64 control tokens 192-255</td>
</tr>
<tr>
<td>(v) (e) (v) (v)</td>
<td>64 control tokens 128-191</td>
</tr>
<tr>
<td>(v) (v) (e) (v)</td>
<td>64 control tokens 64-127</td>
</tr>
<tr>
<td>(v) (v) (v) (e)</td>
<td>64 control tokens 0-63</td>
</tr>
</tbody>
</table>
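
The mapping from tokens to four-symbol sequences in the table above can be sketched as follows. This is an illustration only: it assumes that the value symbols carry the token bits most significant pair first, which this section does not state, and the function names are hypothetical. Tokens that have dedicated multi-escape codes are not special-cased here.

```python
ESC = "e"  # the escape symbol (a change on Wire 4)


def fast_encode_data(value: int) -> list:
    """Encode a data token as four 2-bit value symbols (v v v v)."""
    if not 0 <= value <= 255:
        raise ValueError("token value must fit in 8 bits")
    return [(value >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def fast_encode_control(value: int) -> list:
    """Encode a control token as three value symbols plus one escape.

    The position of the escape selects the block of 64 tokens:
    first = 192-255, second = 128-191, third = 64-127, fourth = 0-63.
    """
    if not 0 <= value <= 255:
        raise ValueError("token value must fit in 8 bits")
    esc_pos = 3 - (value >> 6)
    payload = [(value >> shift) & 0b11 for shift in (4, 2, 0)]
    return payload[:esc_pos] + [ESC] + payload[esc_pos:]
```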

There are some additional codes in which more than one symbol is an escape. These are used to code certain control tokens.

<table>
<thead>
<tr>
<th>symbol sequence</th>
<th>use</th>
</tr>
</thead>
<tbody>
<tr>
<td>e e v v</td>
<td>END tokens</td>
</tr>
<tr>
<td>v v e e</td>
<td>PAUSE tokens</td>
</tr>
<tr>
<td>e v v v e</td>
<td>NOP (return to zero) tokens</td>
</tr>
<tr>
<td>e 11 11 v</td>
<td>NOP (return to zero) tokens</td>
</tr>
<tr>
<td>e 00 e 00</td>
<td>CREDIT8</td>
</tr>
<tr>
<td>e 01 e 01</td>
<td>CREDIT64</td>
</tr>
<tr>
<td>e 10 e 10</td>
<td>HELLO</td>
</tr>
<tr>
<td>e 11 e 11</td>
<td>CREDIT16</td>
</tr>
</tbody>
</table>

Because each token contains four symbols, at the end of each token there is always an even number of signal wires in a non-zero state. To send an END or PAUSE, one of the END or PAUSE tokens is chosen to leave at most two signal wires in a non-zero state; this can be followed by a NOP token which is chosen to leave all of the signal wires in a zero state.

The encoding of the credit and reset tokens has been chosen so that the state of the signal wires after the token is the same as it was before the token.

## 3 Concurrent Threads

A single XCore enables a number of tasks to execute concurrently in threads. Each thread executes a series of instructions that follow a conventional three register operand model. Threads have access to resources that enable a thread to interact with other threads or the outside world.

Each xCORE Tile has hardware support for executing a number of concurrent threads. This includes:

- a set of registers for each thread.
- a thread scheduler which dynamically selects which thread to execute.
- a set of synchronisers to synchronise thread execution.
- a set of channels used for communication with other threads.
- a set of ports used for input and output.
- a set of timers to control real-time execution.
- a set of clock generators to enable synchronisation of the input-output with an external time domain.
- a set of hardware locks to enable low level locking.

Instructions are provided to support initialisation, termination, starting, synchronising and stopping threads; also there are instructions to provide input-output and inter-thread communication.

The set of threads on each xCORE Tile can be used:

- to implement input-output controllers executed concurrently with applications software.
- to allow communications or input-output to progress together with processing.
- to allow latency hiding in the interconnect by allowing some threads to continue whilst others are waiting for communication to or from remote xCORE Tiles.

The instruction set includes instructions that enable the threads to communicate and perform input and output. These:

- provide event-driven communications and input-output with waiting threads automatically descheduled.
- support streamed, packetised or synchronised communication between threads anywhere in a system.
- enable the processor to idle with clocks disabled when all of its threads are waiting so as to save power.
- allow the interconnect to be pipelined and input-output to be buffered.

## 4 The xCORE Tile Instruction Set

The main features of the instruction set used by the xCORE Tile processors are as follows.

- Short instructions are provided to allow efficient access to the stack and other data regions allocated by compilers; these also provide efficient branching and subroutine calling. The short instructions have been chosen on the basis of extensive evaluation to meet the needs of modern compilers.

- The memory is byte addressed; however all accesses must be aligned on natural boundaries so that, for example, the addresses used in 32-bit loads and stores must have the two least significant bits zero. The memory is little endian.

- The processor supports a number of threads each of which has its own set of registers. Some registers are used for specific purposes such as accessing the stack, the data region or large constants in a constant pool.

- Input and output instructions allow very fast communications between threads within an xCORE Tile and between xCORE Tiles. They also support high speed, low-latency, input and output. They are designed to support high-level concurrent programming techniques.

Most instructions are 16-bit. Many instructions use operands in the range 0...11 as this allows sufficient three-address instructions to be encoded using 16-bit instructions. Instruction prefixes are used to extend the range of immediate operands and to provide more inter-register operations (and inter-register operations with more operands). The prefixes are:

- PFIX which concatenates its 10-bit immediate with the immediate operand of the next 16-bit instruction.
- EOPR which concatenates its 11-bit operation set with the following instruction.

The prefixes are inserted automatically by compilers and assemblers.

The normal state of a thread is represented by 12 operand registers, 4 access registers and 2 control registers.

The twelve operand registers \( r_0 \ldots r_{11} \) are used by instructions which perform arithmetic and logical operations, access data structures, and call subroutines.

The access registers are:

<table>
<thead>
<tr>
<th>register</th>
<th>number</th>
<th>use</th>
</tr>
</thead>
<tbody>
<tr>
<td>\( cp \)</td>
<td>12</td>
<td>constant pool pointer</td>
</tr>
<tr>
<td>\( dp \)</td>
<td>13</td>
<td>data pointer</td>
</tr>
<tr>
<td>\( sp \)</td>
<td>14</td>
<td>stack pointer</td>
</tr>
<tr>
<td>\( lr \)</td>
<td>15</td>
<td>link register</td>
</tr>
</tbody>
</table>

The control registers are:

<table>
<thead>
<tr>
<th>register</th>
<th>number</th>
<th>use</th>
</tr>
</thead>
<tbody>
<tr>
<td>\( pc \)</td>
<td>16</td>
<td>program counter</td>
</tr>
<tr>
<td>\( sr \)</td>
<td>17</td>
<td>status register</td>
</tr>
</tbody>
</table>

Each thread has seven additional registers which have very specific uses:

<table>
<thead>
<tr>
<th>register</th>
<th>number</th>
<th>use</th>
</tr>
</thead>
<tbody>
<tr>
<td>\( spc \)</td>
<td>18</td>
<td>saved pc</td>
</tr>
<tr>
<td>\( ssr \)</td>
<td>19</td>
<td>saved status</td>
</tr>
<tr>
<td>\( et \)</td>
<td>20</td>
<td>exception type</td>
</tr>
<tr>
<td>\( ed \)</td>
<td>21</td>
<td>exception data</td>
</tr>
<tr>
<td>\( sed \)</td>
<td>22</td>
<td>saved exception data</td>
</tr>
<tr>
<td>\( kep \)</td>
<td>23</td>
<td>kernel entry pointer</td>
</tr>
<tr>
<td>\( ksp \)</td>
<td>24</td>
<td>kernel stack pointer</td>
</tr>
</tbody>
</table>

The status register \( sr \) contains the following information:

<table>
<thead>
<tr>
<th>bit</th>
<th>number</th>
<th>use</th>
</tr>
</thead>
<tbody>
<tr>
<td>eeble</td>
<td>0</td>
<td>event enable</td>
</tr>
<tr>
<td>ieble</td>
<td>1</td>
<td>interrupt enable</td>
</tr>
<tr>
<td>inenb</td>
<td>2</td>
<td>thread is enabling events</td>
</tr>
<tr>
<td>inint</td>
<td>3</td>
<td>thread is in interrupt mode</td>
</tr>
<tr>
<td>ink</td>
<td>4</td>
<td>thread is in kernel mode</td>
</tr>
<tr>
<td>reserved</td>
<td>5</td>
<td>do not use</td>
</tr>
<tr>
<td>waiting</td>
<td>6</td>
<td>thread waiting to execute current instruction</td>
</tr>
<tr>
<td>fast</td>
<td>7</td>
<td>thread enabled for fast input-output</td>
</tr>
<tr>
<td>di</td>
<td>8</td>
<td>thread is running in dual issue mode</td>
</tr>
<tr>
<td>kedi</td>
<td>9</td>
<td>thread switches to dual issue on kernel entry</td>
</tr>
<tr>
<td>hipri</td>
<td>10</td>
<td>thread is in high priority mode</td>
</tr>
</tbody>
</table>
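
The bit assignments in the table above can be captured in a small decoder. The sketch below is purely illustrative; the names `SR_BITS` and `decode_sr` are hypothetical, and the reserved bit 5 is deliberately left out.

```python
# Bit positions of the status register fields, copied from the table above.
SR_BITS = {
    0: "eeble",
    1: "ieble",
    2: "inenb",
    3: "inint",
    4: "ink",
    6: "waiting",
    7: "fast",
    8: "di",
    9: "kedi",
    10: "hipri",
}


def decode_sr(sr: int) -> dict:
    """Return a mapping from field name to True/False for a status word."""
    return {name: bool((sr >> bit) & 1) for bit, name in SR_BITS.items()}
```

For instance, `decode_sr(0x101)` reports `eeble` and `di` as set and every other field as clear.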

## 5 Instruction Issue and Execution

The processor is implemented using a short pipeline to maximise responsiveness. It is optimised to provide deterministic execution of multiple threads. There is no need for forwarding between pipeline stages and no need for speculative instruction issue and branch prediction. The memory is 128 bits wide, enabling sufficient instructions to be fetched simultaneously for the processor to run at full speed using a unified memory system. Long sequences of memory accesses require an occasional instruction fetch, consuming one extra thread cycle.

### 5.1 Scheduler Implementation

The threads in an xCORE Tile are intended to be used to perform several simultaneous real-time tasks such as input-output operations, so it is important that the performance of an individual thread can be guaranteed. The scheduling method used allows any number of threads to share a single unified memory system and input-output system whilst guaranteeing that with $n$ threads able to execute, each will get at least $1/n$ processor cycles. In fact, it is useful to think of a thread cycle as being $1/n$ processor cycles.

From a software design standpoint, this means that the minimum performance of a thread can be calculated by counting the number of concurrent threads at a specific point in the program. In practice, performance will almost always be higher than this because individual threads will sometimes be delayed waiting for input or output and their unused processor cycles can be taken by other threads. Further, the time taken to re-start a waiting thread is always at most one thread cycle. (Note that the use of priority threads will cause a slightly different but still predictable performance pattern, see Section 5.3.)

The set of $n$ threads can therefore be thought of as a set of virtual processors each with clock rate at least $1/n$ of the clock rate of the processor itself. The only exception to this is that if the number of threads is less than the pipeline depth $p$, the clock rate is at most $1/p$.

Each thread has a 256-bit instruction buffer which is able to hold sixteen short instructions or eight long ones. Instructions are issued from the runnable threads in a round-robin manner, ignoring threads which are not in use or are paused waiting for a synchronisation or input-output operation.

The pipeline has a memory access stage which is available to all instructions. The rules for performing an instruction fetch are as follows.

- Any instruction which requires data-access performs it during the memory access stage.
- Branch instructions fetch their branch target instructions during the memory access stage unless they also require a data access (in which case they will leave the instruction buffer empty).
- Conditional branches only ever fetch instructions around the target address.
- Any other instruction (such as ALU operations) uses the memory access stage to perform an instruction fetch. This is used to load the thread’s own instruction buffer unless it is full.
- If the instruction buffer is empty when an instruction should be issued, a special fetch no-op is issued; this will use its memory access stage to load the issuing thread’s instruction buffer.

There are very few situations in which a fetch no-op is needed, and these can often be avoided by simple instruction scheduling in compilers or assemblers. An obvious example is to break long sequences of loads or stores by interspersing ALU operations.

Certain instructions cause threads to become non-runnable because, for example, an input channel has no available data. When the data becomes available, the thread will continue from the point where it paused.

To achieve this, each thread has an individual ready request signal. The thread identifier is passed to the resource (port, channel, timer etc) and used by the resource to select the correct ready request signal. The assertion of this will cause the thread to be re-started, normally by re-entering it into the round-robin sequence and re-issuing the input instruction. In most situations this latency is acceptable, although it results in a response time which is longer than the virtual cycle time because of the time for the re-issued instruction to pass through the pipeline.

To enable the virtual processor to perform one input or output per virtual cycle, a fast-mode is provided. When a thread is in fast-mode, it is not de-scheduled when an instruction can not complete; instead the instruction is re-issued until it completes.

Events and interrupts are slightly different from normal input and output, because a vector must also be supplied and the target instruction fetched before execution can proceed. However, the same ready request system is used. The result will be to make the thread runnable but with an empty instruction buffer.

A variation on the *fetch no-op* is the *event no-op*; this is used to access the resource which generated the event (or interrupt) using the thread identifier; the resource can then supply the appropriate vector in time for it to be used for instruction fetch during the event no-op memory access stage. This means that at most one virtual cycle is used to process the vector, so there will be at most two virtual cycles before instruction issue following an event or interrupt.

The xCORE Tile scheduler therefore allows threads to be treated as virtual processors with performance predicted by tools. There is no possibility that the performance can be reduced below these predicted levels when virtual processors are combined.
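
The performance guarantee of this section (each of $n$ runnable threads gets at least $1/n$ of the processor cycles, capped at $1/p$ when fewer than $p$ threads run) can be written as a one-line calculation. The sketch below is illustrative; the function name is hypothetical, and the clock frequency and pipeline depth used in the usage note are placeholder values, not figures from this document.

```python
def guaranteed_thread_rate(core_hz: float, runnable_threads: int, pipeline_depth: int) -> float:
    """Instruction issue rate guaranteed to each thread, in issues per second.

    With n runnable threads each gets 1/n of the processor cycles, except
    that a thread can never issue faster than once per pipeline_depth cycles.
    """
    if runnable_threads < 1 or pipeline_depth < 1:
        raise ValueError("thread count and pipeline depth must be positive")
    return core_hz / max(runnable_threads, pipeline_depth)
```

With a hypothetical 500 MHz clock and an assumed pipeline depth of 5, eight runnable threads are each guaranteed 62.5 million issue slots per second, while two threads each get 100 million.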

### 5.2 Single and Dual Issue

An XS2 has two *lanes*: the memory lane can execute all memory instructions, branches, and basic arithmetic, and the resource lane can execute all resource instructions and basic arithmetic. Each thread can choose to execute in *dual issue mode*, in which case the processor will execute two 16-bit instructions or a single 32-bit instruction in a single thread cycle. In dual issue mode, all instructions must be aligned: 32-bit instructions must be 32-bit aligned and pairs of 16-bit instructions must be aligned on a 32-bit boundary. The program counter is always aligned to a 32-bit boundary and points to an issue slot rather than to an individual instruction. The 16-bit value stored at addresses 4n + 2 and 4n + 3 encodes an instruction for the memory lane. The 16-bit value stored at addresses 4n + 0 and 4n + 1 encodes an instruction for the resource lane. Long instructions are stored in a word at addresses 4n + 0...4n + 3.
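
The layout of an issue slot can be illustrated with a short sketch. It assumes a byte-addressed, little-endian memory image as described in Section 4; the function name `split_issue_slot` is hypothetical, and the halfword values in the usage note are arbitrary bit patterns rather than real instruction encodings.

```python
def split_issue_slot(memory: bytes, pc: int) -> tuple[int, int]:
    """Return (resource_lane, memory_lane) halfwords of a dual issue slot.

    The resource lane instruction sits at bytes 4n+0 and 4n+1, the memory
    lane instruction at bytes 4n+2 and 4n+3.
    """
    if pc % 4 != 0:
        raise ValueError("in dual issue mode the pc must be 32-bit aligned")
    resource_lane = int.from_bytes(memory[pc:pc + 2], "little")
    memory_lane = int.from_bytes(memory[pc + 2:pc + 4], "little")
    return resource_lane, memory_lane
```

For the four bytes `34 12 78 56` at address 0, the resource lane halfword is `0x1234` and the memory lane halfword is `0x5678`.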

Where two instructions are executed simultaneously, any destination operands should be disjoint. If they are not disjoint, an exception will be raised.

When the resource lane stalls a thread, the other lane will be stalled also. This is normally not observable, except when an interrupt or an exception is raised. On an interrupt or exception, no registers will be overwritten, and the PC will point to the instruction to be re-executed.

If an instruction in one of the two lanes causes an exception, then this exception is reported. If the other lane is executing an instruction then this second instruction is aborted. If the instructions in both lanes cause an exception, then only one exception is reported, and both instructions are aborted, but any memory store which is in progress will complete. On an exception, the saved pc (\( spc \)) is set to the address of the instruction that caused the exception.

A single bit in the status register, DI, enables dual-issue. If this bit is not set, then instructions flow through one lane at a time, and mis-aligned 32-bit instructions are allowed. The dual-issue-bit is set and cleared on a per function basis. The bit is saved in the lowest bit of LR when a function call is taken. It is restored on a RETSP instruction. The dual-issue-bit is set on executing a DUALENTSP x instruction, and cleared on executing an ENTSP x instruction. This enables functions to be dual or single issue.
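
The way the dual-issue bit travels in the lowest bit of LR can be modelled in a few lines. This is a sketch of the bookkeeping only, with hypothetical function names; it relies on the fact that instruction addresses are at least 16-bit aligned, so bit 0 of a return address is otherwise unused.

```python
def save_return(return_address: int, dual_issue: bool) -> int:
    """Value placed in lr on a call: return address with DI in bit 0."""
    if return_address & 1:
        raise ValueError("instruction addresses are at least 16-bit aligned")
    return return_address | int(dual_issue)


def restore_return(lr: int) -> tuple[int, bool]:
    """On return, recover the return address and the caller's DI setting."""
    return lr & ~1, bool(lr & 1)
```

A dual-issue caller at return address `0x1000` therefore leaves `0x1001` in LR, and the return restores both the address and the dual-issue mode.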

### 5.3 High priority threads

Threads can be set to be *high priority*. If no high priority threads are runnable, then a low priority thread will be scheduled if one is runnable. If high priority threads are runnable, then they will be scheduled, but at least one low priority thread will be executed on every iteration of the high priority queue. This means that all threads are always guaranteed progress.

Threads start as low-priority and only threads that require a very short turn around time or maximum throughput will be high priority.
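
One way to picture the priority rule is the generator below. It is a simplified model under explicit assumptions: the sets of runnable threads are fixed, each pass issues every high priority thread once and then exactly one low priority thread, and low priority threads take turns. The real issue order of the hardware may differ; `issue_order` is a hypothetical name.

```python
from itertools import islice


def issue_order(high: list, low: list):
    """Yield thread identifiers in a possible issue order.

    Every pass over the high priority queue is followed by one low
    priority thread, so all threads are guaranteed to make progress.
    """
    low_index = 0
    while high or low:
        for thread in high:
            yield thread
        if low:
            yield low[low_index % len(low)]
            low_index += 1


# Example: two high priority and two low priority threads.
first_six = list(islice(issue_order(["H0", "H1"], ["L0", "L1"]), 6))
```

In this model `first_six` is `["H0", "H1", "L0", "H0", "H1", "L1"]`; with no high priority threads runnable the low priority threads are simply issued round-robin.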

## 6 Instruction Set Notation and Definitions

In the following description

- $Bpw$ is the number of bytes in a word
- $bpw$ is the number of bits in a word
- $mem$ represents the memory
- $pc$ represents the program counter
- $sr$ represents the status register
- $sp$ represents the stack pointer
- $dp$ represents the data pointer
- $cp$ represents the constant pool pointer
- $lr$ represents the link register
- $r0...r11$ represent specific operand registers
- $x$ (a single small letter) represents one of $r0...r11$
- $X$ (a single large letter) represents one of $r0...r11$, $sp$, $dp$, $cp$, $lr$
- $u_s$ is a small unsigned source operand in the range $0...11$
- $bitp$ is one of $bpw$, $1$, $2$, $3$, $4$, $5$, $6$, $7$, $8$, $16$, $24$, $32$ encoded as a $u_s$
- $u_{16}$ is a 16-bit source operand in the range $0...65535$
- $u_{20}$ is a 20-bit source operand in the range $0...1048575$
- $iw$ is the issue-width in bytes, 2 (for single issue) or 4 (for dual issue)

Note that when the program counter ($pc$) is used by an instruction, it is always pointing to the next instruction. Instructions that access the location of the current instruction use $pc_{old}$.

The operators used in this manual are:

- $\lor$ logical or
- $\lor_{bit}$ bitwise or
- $\land$ logical and
- $\land_{bit}$ bitwise and
- $+, -, \times, \div, \bmod$ arithmetic operations; full precision unsigned integer, unless specified as signed
- $2^n$ integer power
- $l \leftarrow r$ assignment of $r$ to $l$; if $r$ has more bits than $l$, then the most significant bits of $r$ will be ignored
- $\neg$ logical not
- $\neg_{bit}$ bitwise not
- $\oplus$ bitwise xor
- $mem[x]$ an entity at memory address $x$
- $y[\text{bit } x]$ a single bit of $y$
- $y[\text{bits } x..z]$ a slice of $y$ comprising $x - z + 1$ bits; $x \ge z$
- $x : y$ concatenates $x$ and $y$, i.e. $(x \ll bpw) \lor_{bit} y$
- $\forall x \in y$ for each value $x$ in the set $y$

Some useful functions are

\begin{align*}
zext(x, n) &= x \land_{bit} (2^n - 1) \quad \text{zero extend} \\
sext(x, n) &= -(2^{n-1} \land_{bit} x) \lor_{bit} x \quad \text{sign extend}
\end{align*}
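
As an executable illustration of zero and sign extension, the sketch below models both functions on 32-bit words. It assumes $bpw = 32$ and the usual meaning of sign extension (bit $n - 1$ is copied into all higher bits, after the higher bits of the input have been discarded); that reading is an assumption about the intended semantics rather than a transcription of the formulas.

```python
BPW_BITS = 32                      # assumed word size in bits
WORD_MASK = (1 << BPW_BITS) - 1


def zext(x: int, n: int) -> int:
    """Keep the n least significant bits of x and clear the rest."""
    return x & ((1 << n) - 1)


def sext(x: int, n: int) -> int:
    """Replicate bit n-1 of x into all higher bits of a 32-bit word."""
    if not 1 <= n <= BPW_BITS:
        raise ValueError("n must be between 1 and the word size")
    x = zext(x, n)
    sign = 1 << (n - 1)
    # (x ^ sign) - sign is negative exactly when the sign bit was set.
    return ((x ^ sign) - sign) & WORD_MASK
```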

### 6.1 Instruction Prefixes

If the most significant 10 bits of a \(u_{16}\) or \(u_{20}\) instruction operand are non-zero, a 16-bit prefix (PFIX) preceding the instruction is used to encode them. The least significant bits are encoded within the instruction itself.

A different kind of 16-bit prefix (EOPR) is used to encode instructions with more than three operands, or to encode the less common instructions.
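
The split of a long immediate between a PFIX prefix and the following instruction can be sketched as below. It follows the rule stated above (the most significant 10 bits go into the prefix, the remaining 6 or 10 bits stay in the instruction); the function names and the use of `None` for "no prefix needed" are illustrative choices, not part of the architecture.

```python
def split_immediate(value: int, width: int):
    """Split a u16 or u20 operand into (prefix, low_bits).

    prefix is None when the upper 10 bits are zero and no PFIX is needed.
    """
    if width not in (16, 20):
        raise ValueError("only u16 and u20 operands are modelled")
    if not 0 <= value < (1 << width):
        raise ValueError("operand out of range")
    low_bits = width - 10          # 6 bits for u16, 10 bits for u20
    high = value >> low_bits
    low = value & ((1 << low_bits) - 1)
    return (high if high else None), low


def join_immediate(prefix, low: int, width: int) -> int:
    """Concatenate a PFIX immediate with the instruction's own immediate."""
    low_bits = width - 10
    return ((prefix or 0) << low_bits) | low
```

For example, the u16 value 0x1234 splits into a prefix of 72 and an in-instruction immediate of 52, while the value 5 needs no prefix.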

## 7 Data Access

### 7.1 Access to words

The data access instructions fall into several groups. One of these provides access via the stack pointer.

LDWSP  \( D \leftarrow \text{mem}[sp + u_{16} \times Bpw] \)  load word from stack
STWSP  \( \text{mem}[sp + u_{16} \times Bpw] \leftarrow S \)  store word to stack
LDAWSP  \( D \leftarrow sp + u_{16} \times Bpw \)  load address of word in stack

Another is similar, but provides access via the data pointer.

LDWDP  \( D \leftarrow \text{mem}[dp + u_{16} \times Bpw] \)  load word from data
STWDP  \( \text{mem}[dp + u_{16} \times Bpw] \leftarrow S \)  store word to data
LDAWDP  \( D \leftarrow dp + u_{16} \times Bpw \)  load address of word in data

Access to constants and program addresses is provided by instructions which either load values directly or load them from the constant pool.

LDC  \( D \leftarrow u_{16} \)  load constant
LDWCP  \( D \leftarrow \text{mem}[cp + u_{16} \times Bpw] \)  load word from constant pool
LDAWCP  \( r11 \leftarrow cp + u_{16} \times Bpw \)  load address of word in constant pool
LDWCPL  \( r11 \leftarrow \text{mem}[cp + u_{20} \times Bpw] \)  load word from constant pool long
LDAPF  \( r11 \leftarrow pc + u_{20} \times iw \)  load address in program forward
LDAPB  \( r11 \leftarrow pc - u_{20} \times iw \)  load address in program backward

Access to data structures is provided by instructions which use any of the operand registers as a base address, and combine this with a scaled offset. In the case of word accesses, the operand may be a small constant or another operand register, and the instructions are as follows:

LDWI  \( d \leftarrow \text{mem}[b + u_{s} \times Bpw] \)  load word
STWI  \( \text{mem}[b + u_{s} \times Bpw] \leftarrow s \)  store word
LDAWFI  \( d \leftarrow b + u_{s} \times Bpw \)  load address of word forward
LDAWBI  \( d \leftarrow b - u_{s} \times Bpw \)  load address of word backward

LDW  \( d \leftarrow \text{mem}[b + i \times Bpw] \)  load word
STW  \( \text{mem}[b + i \times Bpw] \leftarrow s \)  store word
LDAWF  \( d \leftarrow b + i \times Bpw \)  load address of word forward
LDAWB  \( d \leftarrow b - i \times Bpw \)  load address of word backward
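
The register-indexed word instructions can be modelled with a small memory class. This is a sketch only, assuming $Bpw = 4$, a little-endian memory as described in Section 4, and that an unaligned word access is an error (represented here by a Python exception); the class and function names are hypothetical.

```python
BPW = 4  # assumed bytes per word


class Memory:
    """A byte-addressed, little-endian memory with aligned word access."""

    def __init__(self, size: int):
        self.data = bytearray(size)

    def load_word(self, address: int) -> int:
        if address % BPW:
            raise ValueError("unaligned word access")
        return int.from_bytes(self.data[address:address + BPW], "little")

    def store_word(self, address: int, value: int) -> None:
        if address % BPW:
            raise ValueError("unaligned word access")
        self.data[address:address + BPW] = (value & 0xFFFFFFFF).to_bytes(BPW, "little")


def ldw(mem: Memory, b: int, i: int) -> int:
    """LDW: d <- mem[b + i * Bpw]"""
    return mem.load_word(b + i * BPW)


def stw(mem: Memory, s: int, b: int, i: int) -> None:
    """STW: mem[b + i * Bpw] <- s"""
    mem.store_word(b + i * BPW, s)


def ldawf(b: int, i: int) -> int:
    """LDAWF: d <- b + i * Bpw"""
    return (b + i * BPW) & 0xFFFFFFFF


def ldawb(b: int, i: int) -> int:
    """LDAWB: d <- b - i * Bpw"""
    return (b - i * BPW) & 0xFFFFFFFF
```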

### 7.2 Access to sub-words

In the case of access to 16-bit quantities, the base address is combined with a scaled operand, which must be an operand register. The least significant bit of the resulting address must be zero. The 16-bit item is loaded and sign extended into a word.

LD16S  \( d \leftarrow \text{sext} (\text{mem}[b + i \times 2], 16) \)  load 16-bit signed item
ST16  \( \text{mem}[b + i \times 2] \leftarrow s \)  store 16-bit item
LDA16F  \( d \leftarrow b + i \times 2 \)  load address of 16-bit item forward
LDA16B  \( d \leftarrow b - i \times 2 \)  load address of 16-bit item backward

In the case of access to 8-bit quantities, the base address is combined with an unscaled operand, which must be an operand register. The 8-bit item is loaded and zero extended into a word.

LD8U  \( d \leftarrow \text{zext} (\text{mem}[b + i], 8) \)  load byte unsigned
ST8  \( \text{mem}[b + i] \leftarrow s \)  store byte

Access to part words, including bit-fields, is provided by a small set of instructions which are used in conjunction with the shift and bitwise operations described below. These instructions provide for mask generation of any length up to 32 bits, sign extension and zero-extension from any bit position, and clearing fields within words prior to insertion of new values.

- MKMSK: \(d \leftarrow 2^s - 1\) make mask
- MKMSKI: \(d \leftarrow 2^{\text{bitp}} - 1\) make mask immediate
- SEXT: \(d \leftarrow \text{sext}(d, s)\) sign extend
- SEXTI: \(d \leftarrow \text{sext}(d, \text{bitp})\) sign extend immediate
- ZEXT: \(d \leftarrow \text{zext}(d, s)\) zero extend
- ZEXTI: \(d \leftarrow \text{zext}(d, \text{bitp})\) zero extend immediate
- ANDNOT: \(d \leftarrow d \land_{bit} \neg_{bit} s\) and not (clear field)

The ZEXTI and SEXTI instructions can also be used in conjunction with the LD16S and LD8U instructions, respectively, to load unsigned 16-bit and signed 8-bit values.
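
The combinations mentioned above can be sketched as follows: a zero extension after LD16S yields an unsigned 16-bit value, and a sign extension after LD8U yields a signed 8-bit value. The code assumes 32-bit words and a little-endian byte image; the helper names are hypothetical and the extension helpers use the conventional definitions.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit words


def zext(x: int, n: int) -> int:
    return x & ((1 << n) - 1)


def sext(x: int, n: int) -> int:
    sign = 1 << (n - 1)
    return ((zext(x, n) ^ sign) - sign) & WORD_MASK


def ld16s(mem: bytes, b: int, i: int) -> int:
    """LD16S: load a 16-bit item and sign extend it into a word."""
    address = b + i * 2
    if address & 1:
        raise ValueError("16-bit access must be 2-byte aligned")
    return sext(int.from_bytes(mem[address:address + 2], "little"), 16)


def ld8u(mem: bytes, b: int, i: int) -> int:
    """LD8U: load a byte and zero extend it into a word."""
    return mem[b + i]


def load_u16(mem: bytes, b: int, i: int) -> int:
    """LD16S followed by a 16-bit zero extension (ZEXTI)."""
    return zext(ld16s(mem, b, i), 16)


def load_s8(mem: bytes, b: int, i: int) -> int:
    """LD8U followed by an 8-bit sign extension (SEXTI)."""
    return sext(ld8u(mem, b, i), 8)
```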

### 7.3 Access to double words

Pairs of words can be accessed in a single instruction. This requires the address to be aligned on a two-word boundary; it must be a multiple of \(Bpw \times 2\). For load operations two destination registers must be specified; for store operations two source registers must be specified:

- LDDSP: \(d \leftarrow \text{mem}[sp + u_s \times Bpw \times 2]\) load two words from stack
  \(e \leftarrow \text{mem}[sp + u_s \times Bpw \times 2 + Bpw]\)
- STDSP: \(\text{mem}[sp + u_s \times Bpw \times 2] \leftarrow x\) store two words to stack
  \(\text{mem}[sp + u_s \times Bpw \times 2 + Bpw] \leftarrow y\)
- LDDI: \(d \leftarrow \text{mem}[b + u_s \times Bpw \times 2]\) load two words
  \(e \leftarrow \text{mem}[b + u_s \times Bpw \times 2 + Bpw]\)
- STDI: \(\text{mem}[b + u_s \times Bpw \times 2] \leftarrow x\) store two words
  \(\text{mem}[b + u_s \times Bpw \times 2 + Bpw] \leftarrow y\)
- LDD: \(d \leftarrow \text{mem}[b + i \times Bpw \times 2]\) load two words
  \(e \leftarrow \text{mem}[b + i \times Bpw \times 2 + Bpw]\)
- STD: \(\text{mem}[b + i \times Bpw \times 2] \leftarrow x\) store two words
  \(\text{mem}[b + i \times Bpw \times 2 + Bpw] \leftarrow y\)

Note that the stack pointer should be double word aligned if double loads and double stores are used. The LDDSP and STDSP instructions can be used for saving context efficiently.

8 Expression Evaluation

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Operation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADDI</td>
<td>\( d \leftarrow l + u_s \)</td>
<td>add immediate</td>
</tr>
<tr>
<td>ADD</td>
<td>\( d \leftarrow l + r \)</td>
<td>add</td>
</tr>
<tr>
<td>SUBI</td>
<td>\( d \leftarrow l - u_s \)</td>
<td>subtract immediate</td>
</tr>
<tr>
<td>SUB</td>
<td>\( d \leftarrow l - r \)</td>
<td>subtract</td>
</tr>
<tr>
<td>NEG</td>
<td>\( d \leftarrow -s \)</td>
<td>negate</td>
</tr>
<tr>
<td>EQI</td>
<td>\( d \leftarrow l = u_s \)</td>
<td>equal immediate</td>
</tr>
<tr>
<td>EQ</td>
<td>\( d \leftarrow l = r \)</td>
<td>equal</td>
</tr>
<tr>
<td>LSU</td>
<td>\( d \leftarrow l &lt; r \)</td>
<td>less than unsigned</td>
</tr>
<tr>
<td>LSS</td>
<td>\( d \leftarrow l &lt;_{sgn} r \)</td>
<td>less than signed</td>
</tr>
<tr>
<td>AND</td>
<td>\( d \leftarrow l \land_{bit} r \)</td>
<td>and</td>
</tr>
<tr>
<td>OR</td>
<td>\( d \leftarrow l \lor_{bit} r \)</td>
<td>or</td>
</tr>
<tr>
<td>XOR</td>
<td>\( d \leftarrow l \oplus r \)</td>
<td>exclusive or</td>
</tr>
<tr>
<td>XOR4</td>
<td>\( d \leftarrow l \oplus r \oplus s \oplus t \)</td>
<td>exclusive or</td>
</tr>
<tr>
<td>NOT</td>
<td>\( d \leftarrow (-1) \oplus s \)</td>
<td>not</td>
</tr>
<tr>
<td>SHLI</td>
<td>\( d \leftarrow l &lt;&lt;_{bit} bitp \)</td>
<td>logical shift left immediate</td>
</tr>
<tr>
<td>SHL</td>
<td>\( d \leftarrow l &lt;&lt; r \)</td>
<td>logical shift left</td>
</tr>
<tr>
<td>SHRI</td>
<td>\( d \leftarrow l &gt;&gt;_{bit} bitp \)</td>
<td>logical shift right immediate</td>
</tr>
<tr>
<td>SHR</td>
<td>\( d \leftarrow l &gt;&gt; r \)</td>
<td>logical shift right</td>
</tr>
<tr>
<td>ASHRI</td>
<td>\( d \leftarrow l &gt;&gt;_{sgn} bitp \)</td>
<td>arithmetic shift right immediate</td>
</tr>
<tr>
<td>ASHR</td>
<td>\( d \leftarrow l &gt;&gt;_{sgn} r \)</td>
<td>arithmetic shift right</td>
</tr>
<tr>
<td>MUL</td>
<td>\( d \leftarrow l \times r \)</td>
<td>multiply</td>
</tr>
<tr>
<td>DIVU</td>
<td>\( d \leftarrow l \div r \)</td>
<td>divide unsigned</td>
</tr>
<tr>
<td>DIVS</td>
<td>\( d \leftarrow l \div_{sgn} r \)</td>
<td>divide signed</td>
</tr>
<tr>
<td>REMU</td>
<td>\( d \leftarrow l \bmod r \)</td>
<td>remainder unsigned</td>
</tr>
<tr>
<td>REMS</td>
<td>\( d \leftarrow l \bmod_{sgn} r \)</td>
<td>remainder signed</td>
</tr>
<tr>
<td>NOP</td>
<td></td>
<td>no operation</td>
</tr>
</tbody>
</table>
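The signed and unsigned variants in the table differ only in how the 32-bit operands are interpreted. The sketch below illustrates the difference for the comparison and right shift instructions; it assumes a 32-bit word, and the handling of shift counts of 32 or more is an assumption made for the model.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit word as a two's complement value."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def lsu(l, r):
    """LSU: 1 if l < r when both are read as unsigned words."""
    return int((l & MASK) < (r & MASK))

def lss(l, r):
    """LSS: 1 if l < r when both are read as signed words."""
    return int(to_signed(l) < to_signed(r))

def shr(l, n):
    """SHR: logical shift right, zeros enter from the left."""
    return (l & MASK) >> n if n < 32 else 0      # n >= 32: assumption

def ashr(l, n):
    """ASHR: arithmetic shift right, the sign bit is replicated."""
    return (to_signed(l) >> min(n, 31)) & MASK   # n >= 32: assumption
```

With the word `0xFFFFFFFF`, `lsu` treats the operand as the largest unsigned value while `lss` treats it as -1, so the two comparisons against 1 give opposite results.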

9 Branching, Jumping and Calling

The branch instructions include conditional and unconditional relative branches. A branch using the address in a register is provided; a relative branch which adds a scaled register operand to the program counter is provided to support jump tables.

- **BITREV** \( d : \forall i \; d[\text{bit } i] = s[\text{bit } bpw - i - 1] \) reverse bits

- **BYTEREV** \( d : \forall i \; d[\text{byte } i] = s[\text{byte } Bpw - i - 1] \) reverse bytes

- **CLZ** \( d : \text{first } d : s[\text{bit } bpw - d - 1] = 1 \) count leading zeros

- **ZIP** \( w \leftarrow 2^s \)
  \[
  \begin{aligned}
  z & \leftarrow d[bpw - 1..bpw - w] : e[bpw - 1..bpw - w] : d[bpw - w - 1..bpw - 2w] : e[bpw - w - 1..bpw - 2w] : \ldots : d[w - 1..0] : e[w - 1..0] \\
  d & \leftarrow z[2 \times bpw - 1..bpw] \\
  e & \leftarrow z[bpw - 1..0]
  \end{aligned}
  \]

- **UNZIP** \( w \leftarrow 2^s \)
  \[
  \begin{aligned}
  z & \leftarrow d : e \\
  d & \leftarrow z[2 \times bpw - 1..2 \times bpw - w] : z[2 \times bpw - 2w - 1..2 \times bpw - 3w] : \ldots : z[2w - 1..w] \\
  e & \leftarrow z[2 \times bpw - w - 1..2 \times bpw - 2w] : z[2 \times bpw - 3w - 1..2 \times bpw - 4w] : \ldots : z[w - 1..0]
  \end{aligned}
  \]
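The bit manipulation definitions above can be checked against a small Python model. This is a sketch under stated assumptions: \(bpw = 32\), chunk ranges are inclusive and \(w\) bits wide, and the function names `zip_words` and `unzip_words` are hypothetical.

```python
BPW_BITS = 32
MASK = 0xFFFFFFFF

def bitrev(s):
    """BITREV: bit i of the result is bit bpw - i - 1 of s."""
    return int(f"{s & MASK:032b}"[::-1], 2)

def byterev(s):
    """BYTEREV: byte i of the result is byte Bpw - i - 1 of s."""
    return int.from_bytes((s & MASK).to_bytes(4, "little"), "big")

def clz(s):
    """CLZ: number of zero bits above the most significant set bit."""
    return BPW_BITS - (s & MASK).bit_length()

def zip_words(d, e, s):
    """ZIP: interleave w-bit chunks of d and e (w = 2**s), d chunk first."""
    w = 1 << s
    chunk = (1 << w) - 1
    z = 0
    for top in range(BPW_BITS - w, -1, -w):   # most significant chunk first
        z = (z << w) | ((d >> top) & chunk)
        z = (z << w) | ((e >> top) & chunk)
    return z >> BPW_BITS, z & MASK            # new d (upper word), new e (lower word)

def unzip_words(d, e, s):
    """UNZIP: split the double word d:e back into alternating w-bit chunks."""
    w = 1 << s
    chunk = (1 << w) - 1
    z = ((d & MASK) << BPW_BITS) | (e & MASK)
    hi = lo = 0
    for top in range(2 * BPW_BITS - w, -1, -2 * w):
        hi = (hi << w) | ((z >> top) & chunk)
        lo = (lo << w) | ((z >> (top - w)) & chunk)
    return hi, lo
```

In this model `unzip_words` is the inverse of `zip_words` for the same chunk size, which is a quick way to sanity-check the index ranges in the definitions.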
In some cases, the calling instructions described below can be used to optimise branches; as they overwrite the link register they are not suitable for use in leaf procedures which do not save the link register.

The procedure calling instructions include relative calls, calls via the constant pool, indexed calls via a dedicated register (\(r_{11}\)) and calls via a register. Most calls within a single program module can be encoded in a single instruction; inter-module calling requires at most two instructions.

- **BLRF**
  \[
  \begin{aligned}
  lr & \leftarrow pc \lor sr[\text{bit } di]; \\
  pc & \leftarrow pc + u_{20} \times iw
  \end{aligned}
  \]
  branch and link relative forward

- **BLRB**
  \[
  \begin{aligned}
  lr & \leftarrow pc \lor sr[\text{bit } di]; \\
  pc & \leftarrow pc - u_{20} \times iw
  \end{aligned}
  \]
  branch and link relative backward

- **BLACP**
  \[
  \begin{aligned}
  lr & \leftarrow pc \lor sr[\text{bit } di]; \\
  pc & \leftarrow \text{mem}[cp + u_{20} \times Bpw]
  \end{aligned}
  \]
  branch and link absolute via CP

- **BLAT**
  \[
  \begin{aligned}
  lr & \leftarrow pc \lor sr[\text{bit } di]; \\
  pc & \leftarrow \text{mem}[r_{11} + u_{16} \times Bpw]
  \end{aligned}
  \]
  branch and link absolute via table

- **BLA**
  \[
  \begin{aligned}
  lr & \leftarrow pc \lor sr[\text{bit } di]; \\
  pc & \leftarrow s
  \end{aligned}
  \]
  branch and link absolute via register

Notice that control transfers which do not affect the link (required for tail calls to procedures) can be performed using one of the LDWCP, LDWCPL, LDAPF or LDAPB instructions followed by BAU \(r_{11}\).
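The branch and link instructions above store the dual issue flag in bit 0 of the link value, and the RETSP instruction described later in this section separates the two again. A minimal sketch of that packing follows; the function names are hypothetical, and the model assumes that instruction addresses are even so that bit 0 is free.

```python
def link_value(pc, dual_issue):
    """Value written to lr by a branch and link: pc OR sr[bit di]."""
    assert (pc & 1) == 0, "model assumes instruction addresses are even"
    return pc | int(dual_issue)

def split_link(lr):
    """RETSP model: pc <- lr AND NOT 1, sr[bit di] <- lr AND 1."""
    return lr & ~1, bool(lr & 1)
```

A caller running in dual issue at address `0x10004` would therefore leave `0x10005` in the link register, and the return restores both the address and the issue mode.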

Calling may require modification of the stack. Typically, the stack is extended on procedure entry and contracted on exit. The instructions to support this are shown below.

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Operation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>EXTSP</td>
<td>$sp \leftarrow sp - u_{16} \times Bpw$</td>
<td>extend stack</td>
</tr>
<tr>
<td>EXTDP</td>
<td>$dp \leftarrow dp - u_{16} \times Bpw$</td>
<td>extend data</td>
</tr>
<tr>
<td>ENTSP</td>
<td>if $u_{16} &gt; 0$ then<br>$mem[sp] \leftarrow lr$;<br>$sp \leftarrow sp - u_{16} \times Bpw$<br>$sr[bit\ di] \leftarrow false$</td>
<td>SI entry and extend stack</td>
</tr>
<tr>
<td>DUALENTSP</td>
<td>if $u_{16} &gt; 0$ then<br>$mem[sp] \leftarrow lr$;<br>$sp \leftarrow sp - u_{16} \times Bpw$<br>$sr[bit\ di] \leftarrow true$</td>
<td>DI entry and extend stack</td>
</tr>
<tr>
<td>RETSP</td>
<td>if $u_{16} &gt; 0$ then<br>$sp \leftarrow sp + u_{16} \times Bpw$;<br>$lr \leftarrow mem[sp]$;<br>$sr[bit\ di] \leftarrow lr \land 1$<br>$pc \leftarrow lr \land \neg 1$</td>
<td>contract stack and return</td>
</tr>
</tbody>
</table>

Functions can be made that can be entered in either single or dual issue:

- A single issue function must start with either a 32-bit aligned, long ENTSP instruction, or a short 32-bit aligned instruction that is paired with a dual-issuable instruction. This enables the function to be called from both single and dual issue contexts.

- A DUALENTSP instruction must either be a long instruction that is 32-bit aligned, or it must be a short DUALENTSP that is stored in the third and fourth byte of the word, together with an instruction that can be executed in the resource lane.

A short DUALENTSP that is stored in the lower 16 bits of a word and executed in single issue will raise an exception in the following instruction, since the PC will be misaligned.

Notice that the stack and data area can be contracted using the LDAWSP and LDAWDP instructions.

In some situations, it is necessary to change to a new stack pointer, data pointer or pool pointer on entry to a procedure. Saving or restoring any of the existing pointers can be done using normal STWS, STWD, LDWS or LDWD instructions; loading them from another register can be optimised using the following instructions.

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Operation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>SETSP</td>
<td>$sp \leftarrow s$</td>
<td>set stack pointer</td>
</tr>
<tr>
<td>SETDP</td>
<td>$dp \leftarrow s$</td>
<td>set data pointer</td>
</tr>
<tr>
<td>SETCP</td>
<td>$cp \leftarrow s$</td>
<td>set pool pointer</td>
</tr>
</tbody>
</table>


10 Resources and the Thread Scheduler

Each xCORE Tile manages a number of different types of resource. These include threads, synchronisers, channel ends, timers and locks. For each type of resource a set of available items is maintained. The names of these sets are used to identify the type of resource to be allocated by the GETR (get resource) instruction. When the resource is no longer needed, it can be released for subsequent use by a FREER (free resource) instruction.

- GETR: \(r \leftarrow \text{first } res \in \text{setof}(u_s) : \neg \text{inuse}_{res}\) get resource
  \(\text{inuse}_r \leftarrow \text{true}\)
- FREER: \(\text{inuse}_r \leftarrow \text{false}\) free resource

In the above, \(\text{setof}(u_s)\) returns the set of resources corresponding to the source operand \(u_s\).
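The allocation rule for GETR and FREER can be sketched as a pool of in-use flags per resource type. The class name, the tuple used as a resource identifier and the per-type counts are hypothetical; `None` stands in for the invalid resource ID that is returned when nothing is free.

```python
INVALID = None  # stand-in for the invalid resource ID

class ResourcePool:
    def __init__(self, counts):
        # One in-use flag per resource, grouped by resource type.
        self.inuse = {kind: [False] * n for kind, n in counts.items()}

    def getr(self, kind):
        """GETR: return the first resource of this type that is not in use."""
        flags = self.inuse[kind]
        for index, used in enumerate(flags):
            if not used:
                flags[index] = True
                return (kind, index)
        return INVALID

    def freer(self, res):
        """FREER: mark the resource as free so a later GETR can return it."""
        kind, index = res
        self.inuse[kind][index] = False
```

A program that allocates a timer, frees it and allocates again receives the same resource in this model, because GETR always takes the first free item of the set.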

The resources are:

<table>
<thead>
<tr>
<th>resource name</th>
<th>set</th>
<th>use</th>
</tr>
</thead>
<tbody>
<tr>
<td>THREAD</td>
<td>threads</td>
<td>concurrent execution</td>
</tr>
<tr>
<td>SYNC</td>
<td>synchronisers</td>
<td>thread synchronisation</td>
</tr>
<tr>
<td>CHANEND</td>
<td>channel ends</td>
<td>thread communication</td>
</tr>
<tr>
<td>TIMER</td>
<td>timers</td>
<td>timing</td>
</tr>
<tr>
<td>LOCK</td>
<td>locks</td>
<td>mutual exclusion</td>
</tr>
</tbody>
</table>

Some resources have associated control modes which are set using the SETC instruction.

SETC \( \text{control}_r \leftarrow u_{16} \) set resource control

Many of the mode settings are defined only for a specific kind of resource and are described in the appropriate section; the ones which are used for several different kinds of resource are:

<table>
<thead>
<tr>
<th>mode</th>
<th>effect</th>
</tr>
</thead>
<tbody>
<tr>
<td>OFF</td>
<td>resource off</td>
</tr>
<tr>
<td>ON</td>
<td>resource on</td>
</tr>
<tr>
<td>START</td>
<td>resource active</td>
</tr>
<tr>
<td>STOP</td>
<td>resource inactive</td>
</tr>
<tr>
<td>EVENT</td>
<td>resource will cause events</td>
</tr>
<tr>
<td>INTERRUPT</td>
<td>resource will raise interrupts</td>
</tr>
</tbody>
</table>

Execution of instructions from each thread is managed by the thread scheduler. This maintains a set of runnable threads, \( run \), from which it takes instructions in turn. When a thread is unable to continue, it is paused by removing it from the run set. The reason for this may be any of the following.

- Its registers are being initialised prior to it being able to run.
- It is waiting to synchronise with another thread before continuing.
- It is waiting to synchronise with another thread and terminate (a join).
- It has attempted an input from a channel which has no data available, or a port which is not ready, or a timer which has not reached a specified time.
- It has attempted an output to a channel or a port which has no room for the data.
- It has executed an instruction causing it to wait for one of a number of events or interrupts which may be generated when channels, ports or timers become ready for input.

The thread scheduler manages the threads, thread synchronisation and timing (using the synchronisers and timers). It is directly coupled to resources such as the ports and channels so as to minimise the delay when a thread becomes runnable as a result of a communication or input-output.

11 Concurrency and Thread Synchronisation

A thread can initiate execution on one or more newly allocated threads, and can subsequently synchronise with them to exchange data or to ensure that all threads have completed before continuing. Thread synchronisation is performed using hardware synchronisers, and threads using a synchroniser will move between running states and paused states. When a thread is first created, its status register is initialised as follows:

\[
\begin{aligned}
&sr[\text{bit } eeb] \leftarrow 0 \\
&sr[\text{bit } ie] \leftarrow 0 \\
&sr[\text{bit } n] \leftarrow 0 \\
&sr[\text{bit } in] \leftarrow 0 \\
&sr[\text{bit } hip] \leftarrow 0 \\
&sr[\text{bit } fast] \leftarrow 0 \\
&sr[\text{bit } ked] \leftarrow 0 \\
&sr[\text{bit } waiting] \leftarrow 1 \quad \text{the thread is paused} \\
&sr[\text{bit } di] \leftarrow 0
\end{aligned}
\]

The access registers of the newly created thread can be initialised using the following instructions.

- TINITPC: \(pc_t \leftarrow s\) set thread pc
- TINITSP: \(sp_t \leftarrow s\) set thread stack
- TINITDP: \(dp_t \leftarrow s\) set thread data
- TINITCP: \(cp_t \leftarrow s\) set thread pool
- TINITLR: \(lr_t \leftarrow s\) set thread link

These instructions can only be used when the thread is paused. The TINITLR instruction is intended primarily to support debugging. On thread initialisation, the PC must be initialised. DP, SP, and CP will retain their value on freeing and allocating threads, so they may not have to be reinitialised.

Data can be transferred between the operand registers of two threads using TSETR and TSETMR instructions, which can be used even when the destination thread is running.

- TSETR: \(d_t \leftarrow s\) set thread operand register
- TSETMR: \(d_{mstr(tid)} \leftarrow s\) set master thread operand register

To start a synchronised slave thread a master must first acquire a synchroniser. This is done using a GETR SYNC instruction. If there is a synchroniser available its resource ID is returned, otherwise the invalid resource ID is returned. The GETST instruction is then used to get a synchronised thread. It is passed the synchroniser ID and if there is a free thread it will be allocated, attached to the synchroniser and its ID returned, otherwise the invalid resource ID is returned.

The master thread can repeat this process to create a group of threads which will all synchronise together. To start the slave threads the master executes an MSYNC instruction using the synchroniser ID.

- GETST: \(d \leftarrow \text{first } t \in \text{threads} : \neg \text{inuse}_t\) get synchronised thread
  \(\text{inuse}_d \leftarrow \text{true}\)
  \(\text{spaused} \leftarrow \text{spaused} \cup \{d\}\)
  \(\text{slaves}_s \leftarrow \text{slaves}_s \cup \{d\}\)
  \(\text{mstr}_s \leftarrow \text{tid}\)
- MSYNC: \(\text{if } (\text{slaves}_s \setminus \text{spaused} = \emptyset)\) master synchronise
  \(\text{then } \text{spaused} \leftarrow \text{spaused} \setminus \text{slaves}_s\)
  \(\text{else } \text{mpaused} \leftarrow \text{mpaused} \cup \{\text{tid}\};\ \text{msyn}_s \leftarrow \text{true}\)

The group of threads can synchronise at any point by the slaves executing the SSYNC and the master the MSYNC. Once all the threads have synchronised they are unpaused and continue executing from the next instruction. The processor maintains a set of paused master threads \( \text{mpaused} \) and a set of paused slave threads \( \text{spaused} \) from which it derives the set of runnable threads \( \text{run} \):

\[ \text{run} = \{\text{thread} \in \text{threads} : \text{inuse}_{\text{thread}}\} \setminus (\text{spaused} \cup \text{mpaused}) \]

Each synchroniser also maintains a record $msyn_s$ of whether its master has reached a synchronisation point.

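- SSYNC: slave synchronise
  \[
  \begin{array}{l}
  \text{if } (\text{slaves}_{syn(tid)} \setminus \text{spaused} = \{\text{tid}\}) \land \text{msyn}_{syn(tid)} \text{ then} \\
  \quad \text{if } \text{mjoin}_{syn(tid)} \text{ then} \\
  \quad\quad \text{forall } t \in \text{slaves}_{syn(tid)} : \text{inuse}_t \leftarrow \text{false}; \\
  \quad\quad \text{mjoin}_{syn(tid)} \leftarrow \text{false} \\
  \quad \text{else} \\
  \quad\quad \text{spaused} \leftarrow \text{spaused} \setminus \text{slaves}_{syn(tid)}; \\
  \quad \text{mpaused} \leftarrow \text{mpaused} \setminus \{\text{mstr}_{syn(tid)}\}; \\
  \quad \text{msyn}_{syn(tid)} \leftarrow \text{false} \\
  \text{else} \\
  \quad \text{spaused} \leftarrow \text{spaused} \cup \{\text{tid}\}
  \end{array}
  \]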

To terminate all of the slaves and allow the master to continue the master executes an MJOIN instruction instead of an MSYNC. When this happens, the slave threads are all freed and the master continues.

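- MJOIN: master join
  \[
  \begin{array}{l}
  \text{if } (\text{slaves}_s \setminus \text{spaused} = \emptyset) \text{ then} \\
  \quad \text{forall } t \in \text{slaves}_s : \text{inuse}_t \leftarrow \text{false}; \\
  \quad \text{mjoin}_s \leftarrow \text{false} \\
  \text{else} \\
  \quad \text{mpaused} \leftarrow \text{mpaused} \cup \{\text{tid}\}; \\
  \quad \text{mjoin}_s \leftarrow \text{true}; \\
  \quad \text{msyn}_s \leftarrow \text{true}
  \end{array}
  \]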

A master thread can also create threads which can terminate themselves. This is done by the master executing a GETR THREAD instruction. This instruction returns either a thread ID if there is a free thread or the invalid resource ID. The unsynchronised thread can be initialised in the same way as a synchronised thread using the TINITPC, TINITSP, TINITDP, TINITCP, TINITLR and TSETR instructions.

The unsynchronised thread is then started by the master executing a TSTART instruction specifying the thread ID. Once the thread has completed its task it can terminate itself with the FREET instruction.

- TSTART: \(\text{spaused} \leftarrow \text{spaused} \setminus \{\text{tid}\}\) start thread
- FREET: \(\text{inuse}_{\text{tid}} \leftarrow \text{false}\) free thread

The identifier of an executing thread can be accessed by the GETID instruction.
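The GETST, MSYNC, SSYNC and MJOIN definitions in this section can be exercised with a small state model. This is a sketch under assumptions rather than a description of the hardware: the class names are hypothetical, `None` stands in for the invalid resource ID, the master is unpaused on both the join and the synchronise path of SSYNC, and freed slaves are also removed from the paused set as housekeeping that the definitions do not spell out.

```python
class Synchroniser:
    def __init__(self, master):
        self.master = master   # mstr_s
        self.slaves = set()    # slaves_s
        self.msyn = False      # master has reached a synchronisation point
        self.mjoin = False     # master is waiting in MJOIN

class Tile:
    def __init__(self, n_threads):
        self.inuse = [False] * n_threads
        self.spaused = set()
        self.mpaused = set()

    def run(self):
        """run = in-use threads minus paused slaves and paused masters."""
        live = {t for t, used in enumerate(self.inuse) if used}
        return live - (self.spaused | self.mpaused)

    def getst(self, sync):
        """GETST: allocate a paused thread attached to the synchroniser."""
        for t, used in enumerate(self.inuse):
            if not used:
                self.inuse[t] = True
                self.spaused.add(t)
                sync.slaves.add(t)
                return t
        return None

    def _free_slaves(self, sync):
        for t in sync.slaves:
            self.inuse[t] = False
        self.spaused -= sync.slaves   # housekeeping, not in the definitions
        sync.slaves.clear()
        sync.mjoin = False

    def msync(self, sync):
        if sync.slaves <= self.spaused:          # slaves \ spaused is empty
            self.spaused -= sync.slaves
        else:
            self.mpaused.add(sync.master)
            sync.msyn = True

    def ssync(self, sync, tid):
        if sync.slaves - self.spaused == {tid} and sync.msyn:
            if sync.mjoin:
                self._free_slaves(sync)
            else:
                self.spaused -= sync.slaves
            self.mpaused.discard(sync.master)
            sync.msyn = False
        else:
            self.spaused.add(tid)

    def mjoin(self, sync):
        if sync.slaves <= self.spaused:
            self._free_slaves(sync)
        else:
            self.mpaused.add(sync.master)
            sync.mjoin = True
            sync.msyn = True
```

In this model the first MSYNC after a series of GETST calls finds every slave in the paused set and so starts them all, which matches the way the text uses MSYNC to launch the group.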

12 Communication

Communication between threads is performed using channels, which provide full-duplex data transfer between channel ends, whether the ends are both in the same xCORE Tile, in different xCORE Tiles on the same chip or in xCORE Tiles on different chips. Channels carry messages constructed from data and control tokens between the two channel ends. The control tokens are used to encode communication protocols. Although most control tokens are available for software use, a number are reserved for encoding the protocol used by the interconnect hardware, and cannot be sent or received using instructions.

A channel end can be used to generate events and interrupts when data becomes available as described below. This allows a thread to monitor several channels, ports or timers, only servicing those that are ready.

To communicate between two threads, two channel ends need to be allocated, one for each thread. This is done using the GETR \(c\), CHANEND instruction. Each channel end has a destination register which holds the identifier of the destination channel end; this is initialised with the SETD instruction. It is also possible to use the identifier of a channel end to determine its destination channel end.

- SETD: \(r_{\text{dest}} \leftarrow s\) set destination
- GETD: \(d \leftarrow r_{\text{dest}}\) get destination

The identifier of channel end \(c_1\) is used to initialise the destination register of channel end \(c_2\), and vice versa. Each thread can then use the identifier of its own channel end to transfer data and messages using output and input instructions.

The interconnect can be partitioned into several independent networks. This makes it possible, for example, to allocate channels carrying short control messages to one network whilst allocating channels carrying long data messages to another. There are instructions to allocate a channel to a network and to determine which network a channel is using.

- SETN: \(c_{\text{net}} \leftarrow s\) set network
- GETN: \(d \leftarrow c_{\text{net}}\) get network

In the following, \(c \leftarrow s\) represents an output of \(s\) to channel \(c\) and \(c \rightarrow d\) represents an input from channel \(c\) to \(d\).

- OUTT: \(c \leftarrow \text{dtoken}(s)\) output token
- OUTCT: \(c \leftarrow \text{ctoken}(s)\) output control token
- OUTCTI: \(c \leftarrow \text{ctoken}(u_s)\) output control token immediate
- INT: \(\text{if hasctoken}(c) \text{ then trap else } c \rightarrow d\) input token
- INCT: \(\text{if hasctoken}(c) \text{ then } c \rightarrow d \text{ else trap}\) input control token
- CHKCT: \(\text{if hasctoken}(c) \land (s = \text{token}(c)) \text{ then skiptoken}(c) \text{ else trap}\) check control token
- CHKCTI: \(\text{if hasctoken}(c) \land (u_s = \text{token}(c)) \text{ then skiptoken}(c) \text{ else trap}\) check control token immediate
- OUT: \(c \leftarrow s\) output data word
- IN: \(\text{if containsctoken}(c) \text{ then trap else } c \rightarrow d\) input data word
- TESTCT: \(d \leftarrow \text{hasctoken}(c)\) test for control token
- TESTWCT: \(d \leftarrow \text{containsctoken}(c)\) test word for control token

The channel connection is established when the first output is executed. If the destination channel end is on another xCORE Tile, this will cause the destination identifier to be sent through the interconnect, establishing a route for the subsequent data and control tokens. The connection is terminated when an END control token is sent. If a subsequent output is executed using the same channel end, the destination identifier will be used again to establish a new route which will again persist until another END control token is sent.

A destination channel end can be shared by any number of outputting threads; they are served in a round-robin manner. Once a connection has been established it will persist until an END is received; any other thread attempting to establish a connection will be queued. In the case of a shared channel end, the outputting thread will usually transmit the identifier of its channel end so that the inputting thread can use it to reply.

The OUT and IN instructions are used to transmit words of data through the channel; to transmit bytes of data the OUTT and INT instructions are used. Control tokens are sent using OUTCT or OUTCTI and received using INCT. To support efficient runtime checks that the type, length or structure of output data matches that expected by the inputting thread, CHKCT and CHKCTI instructions are provided. The CHKCT instruction inputs and discards a token provided that the input token matches its operand; otherwise it traps. The normal IN and INT instructions trap if they encounter a control token. To input a control token INCT is used; this traps if it encounters a data token.

The END control token is one of the 12 tokens which can be sent using OUTCTI and checked using CHKCTI. By following each message output with an OUTCTI $c$, END and each input with a CHKCTI $c$, END it is possible to check that the size of the message is the same as the size of the message expected by the inputting thread. To perform synchronised communication, the output message should be followed with (OUTCTI $c$, END; CHKCTI $c$, END) and the input with (CHKCTI $c$, END; OUTCTI $c$, END).
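The message length check described above can be illustrated with a toy channel model. Everything here is a simplification: the class and method names are hypothetical, the numeric value of `END` is an assumption, buffering is unbounded, a `RuntimeError` marks the points where a real thread would pause, and words are assumed to travel most significant byte first.

```python
from collections import deque

END = 1  # assumed numeric value for the END control token

class Trap(Exception):
    """Stands in for the trap taken when a check fails."""

class ChannelEnd:
    def __init__(self):
        self.dest = None        # destination register, set by SETD
        self.tokens = deque()   # received (is_control, value) tokens

    def setd(self, dest):
        self.dest = dest

    def outt(self, byte):
        """OUTT: send one data token."""
        self.dest.tokens.append((False, byte & 0xFF))

    def outcti(self, token):
        """OUTCTI: send one control token."""
        self.dest.tokens.append((True, token))

    def out(self, word):
        """OUT: send a word as four data tokens (byte order is an assumption)."""
        for shift in (24, 16, 8, 0):
            self.outt(word >> shift)

    def in_word(self):
        """IN: trap if the next word contains a control token."""
        head = list(self.tokens)[:4]
        if any(is_ctl for is_ctl, _ in head):
            raise Trap("control token inside word")
        if len(head) < 4:
            raise RuntimeError("thread would pause")
        word = 0
        for _ in range(4):
            word = (word << 8) | self.tokens.popleft()[1]
        return word

    def chkcti(self, token):
        """CHKCTI: discard the next token if it is the expected control token."""
        if not self.tokens:
            raise RuntimeError("thread would pause")
        is_ctl, value = self.tokens[0]
        if not (is_ctl and value == token):
            raise Trap("unexpected token")
        self.tokens.popleft()
```

If the sender outputs two words followed by END while the receiver inputs one word and then checks for END, the check meets a data token and traps, which is how a size mismatch is detected.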

Another control token is PAUSE. Like END, this causes the route through the interconnect to be disconnected. However the PAUSE token is not delivered to the receiving thread. It is used by the outputting thread to break up long messages or streams, allowing the interconnect to be shared efficiently. The remaining control tokens are used for runtime checking and for signalling the type of message being received; they have no effect on the interconnect. Note that in addition to END and PAUSE, ten of these can be efficiently handled using OUTCTI and CHKCTI.

A control token takes up a single byte of storage in the channel. On the receiving end the software can test whether the next token is a control token using the TESTCT instruction, which waits until at least one token is available. It is also possible to test whether the next word contains a control token using the TESTWCT instruction. This waits until a whole word of data tokens has been received (in which case it returns 0) or until a control token has been received (in which case it returns the byte position after the position of the byte containing the control token).

Channel ends have a buffer able to hold sufficient tokens to allow at least one word to be buffered. If an output instruction is executed when the channel is too full to take the data then the thread which executed the instruction is paused. It is restarted when there is enough room in the channel for the instruction to successfully complete. Likewise, when an input instruction is executed and there is not enough data available then the thread is paused and will be restarted when enough data becomes available.

Note that when sending long messages to a shared channel, the sender should send a short request and then wait for a reply before proceeding as this will minimise interconnect congestion caused by delays in accepting the message.

When a channel end $c$ is no longer required, it can be freed using a FREER $c$ instruction. Otherwise it can be used for another message.

It is sometimes necessary to determine the identifier of the destination channel end $c2$ stored in channel end $c1$. For example, this enables a thread to transmit the identifier of a destination channel end it has been using to a thread on another processor. This can be done using the GETD instruction. It is also useful to be able to determine quickly whether a destination channel end $c2$ stored in channel end $c1$ is on the same processor as $c1$; this makes it possible to optimise communication of large data structures where the two communicating threads are executed by the same processor.

13 Locks

Mutual exclusion between a number of threads can be performed using locks. A lock is allocated using a GETR _l_, LOCK instruction. The lock is initially free. It can be claimed using an IN instruction and freed using an OUT instruction.

When a thread executes an IN on a lock which is already claimed, it is paused and placed in a queue waiting for the lock. Whenever a lock is freed by an OUT instruction and the lock’s queue is not empty, the next thread in the queue is unpaused; it will then succeed in claiming the lock.

When inputting from a lock, the IN instruction always returns the lock identifier, so the same register can be used as both source and destination operand. When outputting to a lock, the data operand of the OUT instruction is ignored.

When the lock is no longer needed, it can be freed using a FREER _l_ instruction.
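The claim and release behaviour of a lock can be sketched as an owner field plus a queue of waiting threads. The class is hypothetical, the queue is assumed to be first in, first out, and a return value of `None` from `in_` marks the point at which the calling thread would be paused.

```python
from collections import deque

class Lock:
    def __init__(self, ident):
        self.ident = ident
        self.owner = None          # thread currently holding the lock
        self.waiting = deque()     # paused threads, oldest first (assumption)

    def in_(self, tid):
        """IN: claim the lock and return its identifier, or queue the thread."""
        if self.owner is None:
            self.owner = tid
            return self.ident
        self.waiting.append(tid)
        return None                # the thread is paused

    def out(self, _data=None):
        """OUT: free the lock; the data operand is ignored.

        Returns the thread that is unpaused and now holds the lock, if any.
        """
        self.owner = self.waiting.popleft() if self.waiting else None
        return self.owner
```

Handing the lock directly to the next queued thread in `out` reflects the statement that the unpaused thread will then succeed in claiming the lock.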

14 Timers

Each xCORE Tile executes instructions at a speed determined by its own clock input. In addition, it provides a reference clock output which ticks at a standard frequency of 100 MHz. A set of programmable timers is provided and all of these can be used by threads to provide timed program execution relative to the reference clock.

### 14.1 Using timers

The processor has a set of timers that can be used to wait for a time. The current time can be input from any timer, or it can be obtained by using GETTIME:

\[ \text{GETTIME } d \leftarrow \text{current time} \]

Each timer can be used by a thread to read its current time or to wait until a specified time. A timer is allocated using the GETR _t_, TIMER instruction. It can be configured using the SETC instruction; the only two modes which can be set are UNCOND and AFTER.

<table>
<thead>
<tr>
<th>mode</th>
<th>effect</th>
</tr>
</thead>
<tbody>
<tr>
<td>UNCOND</td>
<td>timer always ready; inputs complete immediately</td>
</tr>
<tr>
<td>AFTER</td>
<td>timer ready when its current time is after its DATA value</td>
</tr>
</tbody>
</table>

In unconditional mode, an IN instruction reads the current value of the timer. In AFTER mode, the IN instruction waits until the value of its current time is after (later than) the value in its DATA register. The value can be set using a SETD instruction. Timers can also be used to generate events as described below.
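Since the timers count ticks of the 100 MHz reference clock, a delay is expressed as a number of ticks added to the current time. The sketch below shows that arithmetic; it assumes a 32-bit timer value that wraps around and interprets "after" as a wrapping comparison, neither of which is stated in the text, and the function names are hypothetical.

```python
REF_HZ = 100_000_000      # reference clock frequency from the text
MASK = 0xFFFFFFFF         # assumed 32-bit timer value

def ticks(microseconds):
    """Number of reference clock ticks in the given number of microseconds."""
    return microseconds * (REF_HZ // 1_000_000)

def deadline(now, microseconds):
    """Value to place in the timer's DATA register with SETD."""
    return (now + ticks(microseconds)) & MASK

def is_after(now, data):
    """AFTER condition as a wrapping comparison (assumption)."""
    diff = (now - data) & MASK
    return 0 < diff < 0x80000000
```

A thread would read the current time, compute `deadline(now, 10)` for a 10 microsecond wait (1000 ticks), set that value with SETD and then input from the timer in AFTER mode.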

A set of programmable clocks is also provided and each can be used to produce a clock output to control the action of one or more ports and their associated port timers. The ports are connected to a clock using the SETCLK instruction.

\[ \text{SETCLK} \quad \text{clock}_d \leftarrow s \quad \text{set clock source} \]

Each port \( p \) which is to be clocked from a clock \( c \) can be connected to it by executing a SETCLK \( p, c \) instruction.

Each clock can use a one bit port as its clock source. A clock \( c \) which is to use a port \( p \) as its clock source can be connected to it by executing a SETCLK \( c, p \) instruction. Alternatively, a clock may use the reference clock as its clock source (by SETCLK \( c, \text{REF} \)). In either case the clock can be configured to divide the frequency using an 8-bit divider. When this is set to 0, the clock passes directly to the output. The falling edge of the clock is used to perform the division. Hence a setting of 1 will result in an output from the clock which changes each falling edge of the input, halving the input frequency \( f \); and a setting of \( n \) will produce an output frequency of \( f/(2n) \). The division factor is set using the SETD instruction. The lowest eight bits of the operand are used and the rest ignored.
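The divider arithmetic can be written out as a short function. This is a sketch that assumes the output toggles once every \(n\) falling edges of the input, so that one output period spans \(2n\) input periods; the function name is hypothetical.

```python
def divided_frequency(f_hz, setting):
    """Output frequency of a clock for a given SETD divider operand."""
    n = setting & 0xFF            # only the lowest eight bits are used
    if n == 0:
        return f_hz               # the clock passes directly to the output
    return f_hz / (2 * n)         # output toggles every n falling edges
```

With the 100 MHz reference clock as the source, a setting of 1 gives 50 MHz and a setting of 5 gives 10 MHz; the lowest frequency available from the 8-bit divider is reached at a setting of 255.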

To ensure that the timers in the ports which are attached to the same clock all record the same time, the clock should be started using a SETC \( c, \text{START} \) instruction after the ports have all been attached to the clock. All of the clocks are initially stopped and a clock can be stopped by a SETC \( c, \text{STOP} \) instruction.

The data output on the pins of an output port changes state synchronously with the port clock. If several output ports are driven from the same clock, they will appear to operate as a single output port, provided that the processor is able to supply new data to all of them during each clock cycle. Similarly, the data input by an input port from the port pins is sampled synchronously with the port clock. If several input ports are driven from the same clock they will appear to operate as a single input port provided that the processor is able to take the data from all of them during each clock cycle.

The use of clocked ports therefore decouples the internal timing of input and output program execution from the operation of synchronous input and output interfaces.

## 15 Ports, Input and Output

Ports are interfaces to physical pins. A port can be used for input or output. It can use the reference clock as its port clock or it can use one of the programmable clocks. Transfers to and from the pins can be synchronised with the execution of input and output instructions, or the port can be configured to buffer the transfers and to convert automatically between serial and parallel form. Ports can also be timed to provide precise timing of values appearing on output pins or taken from input pins. When inputting, a condition can be used to delay the input until the data in the port meets the condition. When the condition is met the captured data is time stamped with the time at which it was captured.

The port clock input is initially the reference clock. It can be changed using the SETCLK instruction with a clock ID as the clock operand. This port clock drives the port timer and can also be used to determine when data is taken from or presented to the pins.

A port can be used to generate events and interrupts when input data becomes available as described below. This allows a thread to monitor several ports, channels or timers, only servicing those that are ready.

### 15.1 Input and Output

Each port has a transfer register. The input and output instructions used for channels, IN and OUT, can also be used to transfer data to and from a port transfer register. The IN instruction zero-extends the contents of a port transfer register and transfers the result to an operand register. The OUT instruction transfers the least significant bits from an operand register to a port transfer register.
Two further instructions, INSHR and OUTSHR, optimise the transfer of data. The INSHR instruction shifts the contents of its destination register right, filling the left-most bits with the data transferred from the port. The OUTSHR instruction transfers the least significant bits of data from its source register to the port and shifts the contents of the source register right.

\[
\begin{array}{lll}
\text{OUTSHR} & p \leftarrow s[\text{bits } 0 \text{ for } trwidth(p)]; & \text{output to port} \\
 & s \leftarrow s >> trwidth(p) &
\end{array}
\]

\[
\begin{array}{lll}
\text{INSHR} & s \leftarrow s >> trwidth(p); & \text{shift and} \\
 & s[\text{bits } (bpw - trwidth(p)) \text{ for } trwidth(p)] \leftarrow p & \text{input from port}
\end{array}
\]
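The register-side effect of these two instructions can be modelled in C as follows. This is only a sketch of the shifting behaviour described above: the function names `outshr` and `inshr` are hypothetical, the port is reduced to a plain value, and a 32-bit word (\( bpw = 32 \)) is assumed.

```c
#include <stdint.h>

#define BPW 32u  /* assumed bits per word */

/* Mask covering the low `width` bits, for 1 <= width <= 32. */
static uint32_t low_mask(unsigned width)
{
    return (width >= BPW) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* OUTSHR model: returns the least significant trwidth bits of *s
 * (the value sent to the port) and shifts *s right by trwidth. */
uint32_t outshr(uint32_t *s, unsigned trwidth)
{
    uint32_t out = *s & low_mask(trwidth);
    *s = (trwidth >= BPW) ? 0u : (*s >> trwidth);
    return out;
}

/* INSHR model: shifts *s right by trwidth and fills the left-most
 * trwidth bits with the data taken from the port. */
void inshr(uint32_t *s, uint32_t port_data, unsigned trwidth)
{
    uint32_t shifted = (trwidth >= BPW) ? 0u : (*s >> trwidth);
    *s = shifted | ((port_data & low_mask(trwidth)) << (BPW - trwidth));
}
```

Calling `outshr` four times on a 32-bit word with an 8-bit transfer width yields its bytes from least to most significant; four calls to `inshr` reassemble them in the same order.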

The transfer register is accessed by the processor; it is also accessed by the port when data is moved to or from the pins. When the processor writes data into the transfer register it fills the transfer register; when the processor takes data from the transfer register it empties the transfer register.

### 15.2 Port Configuration

A port is initially OFF with its pins in a high impedance state. Before it is used, it must be configured to determine the way it interacts with its pins, and set ON, which also has the effect of starting the port. The port can subsequently be stopped and started using SETC \( p \), STOP and SETC \( p \), START; between these the port configuration can be changed.

The port configuration is done using the SETC instruction which is used to define several independent settings of the port. Each of these has a default mode and need only be configured if a different mode is needed. The effect of the SETC mode settings is described below. The **bold** entry in each setting is the default mode.
<table>
<thead>
<tr>
<th>mode</th>
<th>effect</th>
</tr>
</thead>
<tbody>
<tr>
<td>NOREADY</td>
<td>no ready signals are used</td>
</tr>
<tr>
<td>HANDSHAKEN</td>
<td>both ready input and ready output signals are used</td>
</tr>
<tr>
<td>STROBED</td>
<td>one ready signal is used (output on master, input on slave)</td>
</tr>
<tr>
<td>SYNCHRONISED</td>
<td>processor synchronises with pins</td>
</tr>
<tr>
<td>BUFFERED</td>
<td>port buffers data between pins and processor</td>
</tr>
<tr>
<td>SLAVE</td>
<td>port acts as a slave</td>
</tr>
<tr>
<td>MASTER</td>
<td>port acts as a master</td>
</tr>
<tr>
<td>NOSDELAY</td>
<td>input sample not delayed</td>
</tr>
<tr>
<td>SDELAY</td>
<td>input sample delayed half a clock period</td>
</tr>
<tr>
<td>DATAPORT</td>
<td>port acts as normal</td>
</tr>
<tr>
<td>CLOCKPORT</td>
<td>the port outputs its source clock</td>
</tr>
<tr>
<td>READYPORT</td>
<td>the port outputs a ready signal</td>
</tr>
<tr>
<td>DRIVE</td>
<td>pins are driven both high and low</td>
</tr>
<tr>
<td>PULLDOWN</td>
<td>pins pull down for 0 bits, are high impedance otherwise</td>
</tr>
<tr>
<td>PULLUP</td>
<td>pins pull up for 1 bits, but are high impedance otherwise</td>
</tr>
<tr>
<td>NOINVERT</td>
<td>data is not inverted</td>
</tr>
<tr>
<td>INVERT</td>
<td>data is inverted</td>
</tr>
</tbody>
</table>

The DRIVE, PULLDOWN and PULLUP modes determine the way the pins are driven when outputting, and the way they are pulled when inputting. The CLOCKPORT, READYPORT and INVERT settings can only be used with 1-bit ports.

Initially, the port is ready for input. Subsequently, it may change to output data when an output instruction is executed; after outputting it may change back to inputting when an input instruction is executed.

It is sometimes useful to read the data on the pins when the port is outputting; this can be done using the PEEK instruction:

```
PEEK d ← pins(p)   read port pins
```

### 15.3 Configuring Ready and Clock Signals

A port can be configured to use *ready input* and *ready output* signals.

A port’s ready input signal is input by an associated one-bit port. This association is made using the SETRDY instruction.

\[
\text{SETRDY} \quad \text{ready}_p \leftarrow s \quad \text{set source of port ready input}
\]

A port’s ready output signal is output by another associated one-bit port. A one-bit port \( r \) which is to be used as a ready output must first be configured in READYPORT mode by SETC \( r \), READYPORT. This ready port \( r \) can then be associated with a port \( p \) by SETRDY \( r \), \( p \).

A one-bit port can be used to output a clock signal by setting it into CLOCKPORT mode; its clock source is set using the SETCLK instruction.

When a 1-bit port is configured to be in CLOCKPORT or READYPORT mode, the drive mode and invert mode are configurable as normal.

### 15.4 NOREADY mode

If the port is in NOREADY mode, no ready signals are used and data is moved to and from the pins either asynchronously (at times determined by the execution of input and output instructions) or synchronously with the port clock, irrespective of whether the port is in MASTER or SLAVE mode.

At most one input or output is performed per cycle of the port clock.

### 15.5 HANDSHAKEN mode

In HANDSHAKEN mode, ready signals are used to control when data is moved to or from a port’s pins.

A port in MASTER HANDSHAKEN mode initiates an output cycle by moving data to the pins and asserting the ready output (request); it then waits for the ready input (reply) to be asserted. It initiates an input cycle by asserting the ready output (request) and waiting for the ready input (reply) to be asserted along with the data; it then takes the data.

A port in SLAVE HANDSHAKEN mode waits for the ready input (request) to be asserted. It performs an input cycle by taking the data and asserting the ready output (reply); it performs an output cycle by moving data to the pins and asserting the ready output (reply).

The ready signals accompany the data in each cycle of the port clock. The falling edge of the port clock initiates the set up of data or a change of port direction; the port timer also advances on this edge. On output, the data and the ready output will be valid on the rising edge of the port clock. On input, data and the ready input will be sampled on the rising edge of the port clock unless the port is configured as SDELAY, in which case they are sampled on the falling edge.

### 15.6 STROBED mode

In STROBED mode only one ready signal is used and the port can be in MASTER or SLAVE mode. A MASTER port asserts its ready output and the slave has to keep up; a SLAVE port has to keep up with the ready input.
Note that a port in NOREADY mode behaves in the same way as a port in STROBED mode which is always ready.

### 15.7 The Port Timer

A port has a timer which can be used to cause the transfer of data to or from the pins to take place at a specified time. The time at which the transfer is to be performed is set using the SETPT (set port time) instruction. Timed ports are often used together with timestamping as this allows precise control of response times.

\[
\text{SETPT } \text{porttime}_p \leftarrow s \quad \text{set port time}
\]

\[
\text{CLRPT } \text{clearporttime}(p) \quad \text{clear port time}
\]

\[
\text{GETTS } d \leftarrow \text{timestamp}_p \quad \text{get port timestamp}
\]

The CLRPT instruction can be used to cancel a timed transfer.

The timestamp which is set when a port becomes ready for input can be read using the GETTS instruction.
### 15.8 Conditions

A port has an associated condition which can be used to prevent the processor from taking input from the port when the condition is not met. The conditions are set using the SETC instruction. The value used for comparison in some of the conditions is held in the port data register, which can be set using the SETD instruction.

<table>
<thead>
<tr>
<th>mode</th>
<th>port ready condition</th>
</tr>
</thead>
<tbody>
<tr>
<td>NONE</td>
<td>no condition</td>
</tr>
<tr>
<td>EQ</td>
<td>value on pins equal to port data register value</td>
</tr>
<tr>
<td>NEQ</td>
<td>value on pins not equal to port data register value</td>
</tr>
</tbody>
</table>

The simplest condition is NONE. The other conditions all involve comparing the value from the pins with the value in the port data register.

When the condition is met a timestamp is set and the port becomes ready for input.

When a port is used to generate an event, the data which satisfied the condition is held in the transfer register and the timestamp is set. The value returned by a subsequent input on the port is guaranteed to meet the condition and to correspond to the timestamp even if the value on the port has changed.
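The way a condition gates input can be sketched in C. The type `port_cond_t` and both function names are hypothetical; the second function treats the index of a sample in an array as a stand-in for the port timestamp, which is an illustrative simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the three conditions in the table above. */
typedef enum { COND_NONE, COND_EQ, COND_NEQ } port_cond_t;

/* True when the value on the pins satisfies the condition, given the
 * value held in the port data register. */
bool port_condition_met(port_cond_t cond, uint32_t pins, uint32_t data_reg)
{
    switch (cond) {
    case COND_EQ:
        return pins == data_reg;
    case COND_NEQ:
        return pins != data_reg;
    case COND_NONE:
    default:
        return true;
    }
}

/* Scans successive pin samples and returns the index of the first one
 * that meets the condition (standing in for the timestamp), or -1 if
 * none does. */
int first_ready_sample(port_cond_t cond, const uint32_t *samples,
                       int count, uint32_t data_reg)
{
    for (int i = 0; i < count; i++) {
        if (port_condition_met(cond, samples[i], data_reg)) {
            return i;
        }
    }
    return -1;
}
```

With samples 0, 0, 1, 0 and a data register value of 1, the EQ condition is first met at index 2, which is the sample a subsequent input would return.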

### 15.9 Synchronised Transfers

A port in SYNCHRONISED mode ensures that the signalling operation of the port pins is synchronised with the processor instruction execution.

When a SETPT instruction is used, the movement of data between the pins and the transfer register takes place when the current value of the port timer matches the time specified with the SETPT instruction.

If the port is used for output and the transfer register is full, the SETPT instruction will pause until the transfer register is empty. This ensures that the port time is not changed until the pending output has completed.

If a condition other than NONE is used the port will only be ready for input when the data in the transfer register matches the condition. If an input instruction is executed and the specified condition is not met, the thread executing the input will be paused until the condition is met; the thread then resumes and completes the input. The value of the port timer corresponding to the data in the transfer register when a port condition is met is recorded in the port timestamp register. The timestamp register can be read at any time using the GETTS instruction.

### 15.10 Buffered Transfers

A port in BUFFERED mode buffers the transfer of data between the processor and the pins through the use of a shift register, which is situated between the transfer register and the pins. A buffered port can be used to convert between parallel and serial form using its shift register. The number of bits in the transfer register and the shift register determines the width of the transfers (the transfer width) between the processor and the port; this is a multiple of the port width (the number of pins) and can be set by the SETTW instruction.

\[
\text{SETTW} \quad \text{width}_p \leftarrow s \quad \text{set port transfer width}
\]

For a 32-bit wordlength, the transfer width is normally 32, 8, 4 or 1 bit.

Note that in contrast to a synchronised transfer, where the transfer width and the port width are equal, the transfer width of a buffered transfer can differ from the port width.

On input, the shift register is full when \( n \) values have been taken from the \( p \) pins, where \( n \times p \) is the transfer width; it will then be emptied to the transfer register ready for an input instruction. On output the shift register is filled from the transfer register and will be empty when \( n \) values have been moved to the \( p \) pins, where \( n \times p \) is the transfer width.
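The relationship between port width, transfer width and the number of samples per transfer can be sketched in C. The function name `deserialise` is hypothetical. The sketch assumes that data is shifted in from the most significant end, as INSHR does, so the first sample taken from the pins ends up in the least significant bits of the assembled word.

```c
#include <stdint.h>

/* Hypothetical model of buffered input: assembles
 * n = transfer_width / port_width samples into one transfer word.
 * Assumption: the first sample lands in the least significant bits.
 * transfer_width must be a multiple of port_width and at most 32. */
uint32_t deserialise(const uint32_t *samples, unsigned port_width,
                     unsigned transfer_width)
{
    uint32_t mask = (port_width >= 32u) ? 0xFFFFFFFFu
                                        : ((1u << port_width) - 1u);
    unsigned n = transfer_width / port_width;  /* samples per transfer */
    uint32_t word = 0u;

    for (unsigned i = 0; i < n; i++) {
        word |= (samples[i] & mask) << (i * port_width);
    }
    return word;
}
```

A 4-bit port with a 32-bit transfer width therefore needs eight samples to fill the shift register, and a 1-bit port with an 8-bit transfer width needs eight single-bit samples.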

The port operates as follows:

- **HANDSHAKEN**: A handshaken transfer only shifts data from the pins to the shift register on input when the shift register is not full; on output it only shifts data from the shift register to the pins when the shift register is not empty. On input, the shift register will become full if the processor does not input data to empty the transfer register; when the processor inputs the data, the transfer register is filled from the shift register and the shift register will start to be re-filled from the pins. On output, the shift register will become empty if the processor does not fill the transfer register; when the processor outputs data to fill the transfer register, the shift register will be filled from the transfer register and the shift register will then start to be emptied to the pins.

- **STROBED SLAVE Input**: Data is shifted into the shift register from the pins whenever the ready input is asserted. Provided that the transfer register is empty, when the shift register is full the transfer register is filled from the shift register. When the processor executes an input instruction to take data from the transfer register, the transfer register is emptied.

  If the processor does not take the data from the transfer register by the time the shift register is next full, data will continue to be shifted into the shift register and only the most recent values will be kept; as soon as an input instruction empties the transfer register the transfer register will be filled from the shift register.

- **STROBED SLAVE Output**: Data is shifted out to the pins whenever the ready input is asserted. Provided that the transfer register is full, when the shift register is empty, it is filled from the transfer register. When the processor executes an output instruction it fills the transfer register.

  If the processor has not filled the transfer register by the time the shift register is next empty, the data is held on the pins. As soon as the processor executes an output instruction it fills the transfer register; the shift register is then filled from the transfer register and it will start to be emptied to the pins.

- **STROBED MASTER**: The transfer operates in the same way as a handshaken transfer in which the ready input is always asserted.

The SETPT instruction can be used to delay the movement of data between the shift register and the transfer register until the current value of the port timer matches the time specified.

Note that this can be used to provide synchronisation with a stream of data in a BUFFERED port in NOREADY mode, because exactly one item will be shifted to or from the pins in each clock cycle.

If the port is outputting and the transfer register is full the SETPT instruction will pause until it is empty. This ensures that the port time is not changed until the pending output has completed.

The port condition can be used to locate the first item of data on the pins that matches a condition. If the condition is different from NONE, data will be held in the shift register until the data meets the condition; the data is then moved to the transfer register, the timestamp is set and the port changes the condition to NONE so that data can continue to fill the shift register in the normal way. Only the top port-width bits of the shift register are used for comparison when the condition is checked.
### 15.11 Partial Transfers

Buffered transfers permit data of less than the transfer width to be moved between the shift register and the transfer register. The length of the items in a buffered transfer can be set by a SETPSC instruction, which sets the port shift register count. On input, this will cause the shift register contents to be moved to the transfer register when the specified amount of data has been shifted in; on output it will cause only the specified amount of data to be shifted out before the shift register is ready to be re-loaded. This is useful for handling the first and last items in a long transfer.

\[
\text{SETPSC} \quad \text{shiftcount}_p \leftarrow s \quad \text{set port shift register count}
\]

A buffered input can be terminated by executing an ENDIN instruction which returns the number of items buffered in the port (which will include the shift register and transfer register contents) and also sets the port shift register count to the amount of data remaining in the shift register, enabling a following input to complete.

\[
\text{ENDIN} \quad d \leftarrow \text{buffercount}_p \quad \text{end input}
\]

To optimise the transfer of part words three further instructions are provided:

\[
\begin{array}{lll}
\text{OUTPW} & \text{shiftcount}_p \leftarrow q; & \text{output part word} \\
 & p \leftarrow s & \\
\text{OUTPWI} & \text{shiftcount}_p \leftarrow \text{bitp}; & \text{output part word} \\
 & p \leftarrow s & \\
\text{INPW} & \text{shiftcount}_p \leftarrow \text{bitp}; & \text{input part word} \\
 & d \leftarrow p &
\end{array}
\]

These encode their immediate operand in the same way as the shift instructions.

### 15.12 Changing Direction

A SYNCHRONISED port can change from input to output, or from output to input. The direction changes at the start of the next setup period. For a transfer initiated by a SETPT instruction, the direction will be input unless an output is executed before the time specified by the SETPT instruction.

A BUFFERED port can change direction only after it has completed a transfer. This is done by stopping and re-starting the port using SETC \( p \), STOP and SETC \( p \), START instructions.

## 16 Events, Interrupts and Exceptions

Events and interrupts allow timers, ports and channel ends to automatically transfer control to a pre-defined event handler. The resources generate events by default and must be reconfigured using a SETC instruction in order to generate interrupts. The ability of a thread to accept events or interrupts is controlled by information held in the thread status register (\( sr \)), and may be explicitly controlled using SETSR and CLRSR instructions with appropriate operands.

\[
\begin{array}{lll}
\text{SETSR} & sr \leftarrow sr \lor u_{16} & \text{set thread state} \\
\text{CLRSR} & sr \leftarrow sr \land \neg u_{16} & \text{clear thread state} \\
\text{GETSR} & r_{11} \leftarrow sr \land u_{16} & \text{get thread state}
\end{array}
\]

The operand of these instructions should be one (or more) of

- EEBLE enable events
- IEBLE enable interrupts
- INENB determine if thread is enabling events
- ININT determine if thread is in interrupt mode
- HIPRI set thread to high priority mode
- FAST set thread to fast mode
- KEDI set thread to switch to dual issue on kernel entry
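The three status register operations are plain bitwise updates, as the following C sketch shows. The bit positions given to the flags here are hypothetical placeholders chosen for illustration; the real encodings should be taken from the XS2 register definitions. The function names are likewise hypothetical.

```c
#include <stdint.h>

/* Hypothetical bit masks for some of the flags listed above.
 * These positions are assumptions, not the documented encoding. */
enum {
    SR_EEBLE = 1u << 0,  /* enable events */
    SR_IEBLE = 1u << 1,  /* enable interrupts */
    SR_INENB = 1u << 2,  /* thread is enabling events */
    SR_ININT = 1u << 3   /* thread is in interrupt mode */
};

/* SETSR: sr <- sr OR u16 */
uint32_t setsr(uint32_t sr, uint32_t u16)
{
    return sr | (u16 & 0xFFFFu);
}

/* CLRSR: sr <- sr AND NOT u16 */
uint32_t clrsr(uint32_t sr, uint32_t u16)
{
    return sr & ~(u16 & 0xFFFFu);
}

/* GETSR: r11 <- sr AND u16 */
uint32_t getsr(uint32_t sr, uint32_t u16)
{
    return sr & (u16 & 0xFFFFu);
}
```

Because the operand is a mask, several flags can be set or cleared by one instruction, and GETSR returns a non-zero value if any of the selected flags is set.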

### 16.1 Events

A thread normally enables one or more events and then waits for one of them to occur. Hence, on an event all the thread’s state is valid, allowing the thread to respond rapidly to the event. The thread can perform input and output operations using the port, channel or timer which gave rise to an event whilst leaving some or all of the event information unchanged. This allows the thread to complete handling an event and immediately wait for another similar event.

Timers, ports and channel ends all support events, the only difference being the ready conditions used to trigger the event. The program location of the event handler must be set prior to enabling the event using the SETV instruction. The SETEV instruction can be used to set an environment for the event handler; this will often be a stack address containing data used by the handler. Timers and ports have conditions which determine when they will generate an event; these are set using the SETC and SETD instructions. Channel ends are considered ready as soon as they contain enough data.

Event generation by a specific port, timer or channel can be enabled using an event enable unconditional (EEU) instruction and disabled using an event disable unconditional (EDU) instruction. The event enable true (EET) instruction enables the event if its condition operand is true and disables it otherwise; conversely the event enable false (EEF) instruction enables the event if its condition operand is false, and disables it otherwise. These instructions are used to optimise the implementation of guarded inputs.
Having enabled events on one or more resources, a thread can use a WAITEU, WAITET or WAITEF instruction to wait for at least one event. The WAITEU instruction waits unconditionally; the WAITET instruction waits only if its condition operand is true, and the WAITEF waits only if its condition operand is false.

```plaintext
WAITEU  eeble (tid) ← true  event wait
```

This may result in an event taking place immediately with control being transferred to the event handler specified by the corresponding event vector with events disabled by clearing the thread’s `eeble` flag. Alternatively the thread may be paused until an event takes place with the `eeble` flag enabled; in this case the `eeble` flag will be cleared when the event takes place, and the thread resumes execution.

```plaintext
ed ← ev(res);
pc ← v(res);
sr[bit inenb] ← false;
sr[bit eeble] ← false;
sr[bit waiting] ← false
```

Note that the environment vector is transferred to the event data register, from where it can be accessed by the GETED instruction. This allows it to be used to access data associated with the event, or simply to enable several events to share the same event vector.

To optimise the responsiveness of a thread to high priority resources the SETSR EEBLE instruction can be used to enable events before starting to enable the ports, channels and timers. This may cause an event to be handled immediately, or as soon as it is enabled. An enabling sequence of this kind can be followed either by a WAITEU instruction to wait for one of the events, or it can simply be followed by a CLRSR EEBLE to continue execution when no event takes place. The WAITET and WAITEF instructions can also be used in conjunction with a CLRSR EEBLE to conditionally wait or continue depending on a guarding condition. The WAITET and WAITEF instructions can also be used to optimise the common case of repeatedly handling events from multiple sources until a terminating condition occurs.

All of the events which have been enabled by a thread can be disabled using a single CLRE instruction. This disables event generation in all of the ports, channels or timers which have had events enabled by the thread. The CLRE instruction also clears the thread’s eeble flag.

$$
\begin{array}{lll}
\text{CLRE} & eeble_{tid} \leftarrow false; & \text{disable all events} \\
 & inenb_{tid} \leftarrow false; & \text{for thread} \\
 & \text{forall } res & \\
 & \quad \text{if } (thread_{res} = tid \land event_{res}) \text{ then } enb_{res} \leftarrow false &
\end{array}
$$

Where enabling sequences include calls to input subroutines, the SETSR INENB instruction can be used to record that the processor is in an enabling sequence; the subroutine body can use GETSR INENB to branch to its enabling code (instead of its normal inputting code). INENB is cleared whenever an event occurs, or by the CLRE instruction.

### 16.2 Interrupts

In contrast to events, interrupts can occur at any point during program execution, and so the current pc and sr (and potentially also some or all of the other registers) must be saved prior to execution of the interrupt handler. Interrupts are taken between instructions, which means that in an interrupt handler the previous instruction will have been completed, and the next instruction is yet to be executed on return from the interrupt. This is done using the spc and ssr registers. Any interrupt and exception causes the pc and sr registers to be saved into spc and ssr, and the status register to be modified to indicate that the processor is running in kernel mode:

**kernelentry**

$$
\begin{array}{l}
\text{ssr} \leftarrow \text{sr}; \\
\text{sed} \leftarrow \text{ed}; \\
\text{sr}[\text{bit di}] \leftarrow \text{sr}[\text{bit kedi}]; \\
\text{sr}[\text{bit eeble}] \leftarrow false; \\
\text{sr}[\text{bit ieble}] \leftarrow false;
\end{array}
$$

On an interrupt generated by resource r the following occurs automatically:

**interrupt**

$$
\begin{array}{ll}
\text{spc} \leftarrow \text{pc}; & \\
\text{kernelentry}; & \text{kernelentry is defined above} \\
\text{ed} \leftarrow \text{ev}_{res}; & \\
\text{pc} \leftarrow v_{res}; & \\
\text{sr}[\text{bit inint}] \leftarrow true; & \\
\text{sr}[\text{bit waiting}] \leftarrow false; &
\end{array}
$$

On kernel entry the DI bit is saved in the ssr register, whereupon DI is set according to the KEDI (dual-issue-in-kernel) bit in the status register. This enables exception handlers to be written in either SI or DI code as required. When in kernel mode, the kernel can switch between SI and DI mode as usual using DUAL/ENTSP. On return from the kernel using KRET, the DI bit is restored from ssr.

When the handler has completed, execution of the interrupted thread can be resumed by a KRET instruction, which restores the saved state, including the DI bit held in \( ssr \).

\[
\begin{array}{lll}
\text{KRET} & pc \leftarrow spc \land -1; & \text{return from interrupt} \\
 & sr \leftarrow ssr; & \\
 & ed \leftarrow sed &
\end{array}
\]
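The save and restore sequence can be followed in a small C model. This is a sketch only: the `thread_t` structure, the function names and the bit positions of the status flags are all hypothetical, and the model simply copies \( spc \) back into \( pc \) on return without modelling any masking of the saved value.

```c
#include <stdint.h>

/* Hypothetical status register bit masks (positions are assumptions). */
#define BIT_EEBLE   (1u << 0)
#define BIT_IEBLE   (1u << 1)
#define BIT_ININT   (1u << 3)
#define BIT_WAITING (1u << 6)
#define BIT_DI      (1u << 8)
#define BIT_KEDI    (1u << 9)

typedef struct {
    uint32_t pc, sr, ed;     /* live thread state */
    uint32_t spc, ssr, sed;  /* saved copies used by the handler */
} thread_t;

/* kernelentry: save sr and ed, copy KEDI into DI, disable events
 * and interrupts. */
static void kernel_entry(thread_t *t)
{
    t->ssr = t->sr;
    t->sed = t->ed;
    if (t->sr & BIT_KEDI) {
        t->sr |= BIT_DI;
    } else {
        t->sr &= ~BIT_DI;
    }
    t->sr &= ~(BIT_EEBLE | BIT_IEBLE);
}

/* Interrupt from a resource with the given vector and environment. */
void take_interrupt(thread_t *t, uint32_t vector, uint32_t env)
{
    t->spc = t->pc;
    kernel_entry(t);
    t->ed = env;
    t->pc = vector;
    t->sr |= BIT_ININT;
    t->sr &= ~BIT_WAITING;
}

/* KRET model: restore pc, sr and ed from the saved copies. */
void kret(thread_t *t)
{
    t->pc = t->spc;
    t->sr = t->ssr;
    t->ed = t->sed;
}
```

Because the whole of \( sr \) is saved before it is modified, the DI, EEBLE and IEBLE bits of the interrupted thread all reappear unchanged after the return.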

### 16.3 Exceptions

Exceptions which occur when an error is detected during instruction execution are treated in the same way as interrupts except that they transfer control to a location defined relative to the thread’s kernel entry point \( kep \) register.

\[
\begin{array}{lll}
\text{except} & spc \leftarrow p_{\text{old}}; & \text{any exception} \\
 & \text{kernelentry}; & \text{defined in Section 16.2} \\
 & pc \leftarrow kep; & \\
 & et \leftarrow \text{exceptiontype}; & \\
 & ed \leftarrow \text{exceptiondata}; &
\end{array}
\]

The exception handler resides at the address stored in \( kep \). The handler can run in dual or single issue mode, depending on the \( kedi \) bit in the status register. Exception types are listed below:

<table>
<thead>
<tr>
<th>Exception</th>
<th>( et )</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>ET_LINK_ERROR</td>
<td>1</td>
<td>Incorrect use of channel</td>
</tr>
<tr>
<td>ET_ILLEGAL_PC</td>
<td>2</td>
<td>Unaligned program counter</td>
</tr>
<tr>
<td>ET_ILLEGAL_INSTRUCTION</td>
<td>3</td>
<td>Illegal opcode</td>
</tr>
<tr>
<td>ET_ILLEGAL_RESOURCE</td>
<td>4</td>
<td>Illegal use of resource</td>
</tr>
<tr>
<td>ET_LOAD_STORE</td>
<td>5</td>
<td>Unaligned memory access</td>
</tr>
<tr>
<td>ET_ILLEGAL_PS</td>
<td>6</td>
<td>Undefined PS register</td>
</tr>
<tr>
<td>ET_ARITHMETIC</td>
<td>7</td>
<td>Arithmetic error</td>
</tr>
<tr>
<td>ET_ECALL</td>
<td>8</td>
<td>Assertion failed</td>
</tr>
<tr>
<td>ET_RESOURCE_DEP</td>
<td>9</td>
<td>Illegal resource use</td>
</tr>
<tr>
<td>ET_KCALL</td>
<td>15</td>
<td>KCALL executed</td>
</tr>
</tbody>
</table>

When in dual issue mode, an exception in one lane will abort any instruction in the other lane. If two instructions would both cause an exception, then only one exception is taken. The \( et \) register will hold the data as specified above, but the lane in which the exception occurred is encoded in bit 4 of \( et \). If the thread is in dual-issue mode, and two instructions were issued, and the exception occurred in the resource lane, then this bit is set to 1. In all other cases bit 4 of \( et \) will be set to 0.
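A handler might separate the lane flag from the exception type as in the C sketch below. The function names are hypothetical, and the sketch assumes that the exception type occupies bits 0 to 3 of \( et \), which is consistent with the values in the table (the largest being 15) but is not stated explicitly in the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define ET_LANE_BIT 4u  /* bit 4 of et encodes the lane */

/* Assumption: the exception type is held in bits 0..3 of et. */
uint32_t et_type(uint32_t et)
{
    return et & 0xFu;
}

/* True when the exception was raised in the resource lane of a
 * dual-issued instruction pair. */
bool et_in_resource_lane(uint32_t et)
{
    return ((et >> ET_LANE_BIT) & 1u) != 0u;
}
```

For example, an \( et \) value of 0x15 would decode as ET_LOAD_STORE (5) raised in the resource lane, while 0x0F decodes as ET_KCALL (15) with the lane bit clear.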

A program can force an exception as a result of a software detected error condition using ECALLT, ECALLF, or ELATE:

\[
\begin{array}{lll}
\text{ECALLT} & \text{if } e \text{ then except}(\text{ET\_ECALL}, e); & \text{except is defined above} \\
\text{ECALLF} & \text{if } \neg e \text{ then except}(\text{ET\_ECALL}, e); & \text{except is defined above} \\
\text{ELATE} & \text{if } s \text{ after } t_{current} \text{ then except}(\text{ET\_ECALL}, s); & \text{except is defined above}
\end{array}
\]

These have the same effect as hardware detected exceptions, transferring control to the same location and indicating that an error has occurred in the exception type (\( et \)) register. If in dual issue mode, any instruction in the other lane will be aborted on taking an exception.

A program can explicitly cause entry to a handler using one of the kernel call instructions. These have a similar effect to exceptions, except that they transfer control to the location \( kep + 64 \) rather than to \( kep \).

\[
\begin{array}{lll}
\text{KCALLI} & \text{kernelentry}; & \text{defined in Section 16.2} \\
 & spc \leftarrow pc; & \\
 & et \leftarrow \text{ET\_KCALL}; & \\
 & ed \leftarrow u6; & \\
 & pc \leftarrow kep + 64; &
\end{array}
\]

\[
\begin{array}{lll}
\text{KCALL} & \text{kernelentry}; & \text{defined in Section 16.2} \\
 & spc \leftarrow pc; & \\
 & et \leftarrow \text{ET\_KCALL}; & \\
 & ed \leftarrow s; & \\
 & pc \leftarrow kep + 64; &
\end{array}
\]

In dual issue mode KCALL will complete as normal; it is safe to dual-issue a KCALL instruction with any other instruction. If the instruction in the other lane causes an exception, the thread will continue with the exception and abort the KCALL.

The \( spc, ssr, et \) and \( sed \) registers can be saved and restored directly to the stack.

\[
\begin{array}{lll}
\text{LDSPC} & spc \leftarrow \text{mem}[sp + 1 \times Bpw] & \text{load exception pc} \\
\text{STSPC} & \text{mem}[sp + 1 \times Bpw] \leftarrow spc & \text{store exception pc} \\
\text{LDSSR} & ssr \leftarrow \text{mem}[sp + 2 \times Bpw] & \text{load exception sr} \\
\text{STSSR} & \text{mem}[sp + 2 \times Bpw] \leftarrow ssr & \text{store exception sr} \\
\text{LDSED} & sed \leftarrow \text{mem}[sp + 3 \times Bpw] & \text{load exception data} \\
\text{STSED} & \text{mem}[sp + 3 \times Bpw] \leftarrow sed & \text{store exception data} \\
\text{STET} & \text{mem}[sp + 4 \times Bpw] \leftarrow et & \text{store exception type}
\end{array}
\]

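In addition, the \( et \) and \( ed \) registers can be transferred directly to a register.

\[
\begin{array}{lll}
\text{GETET} & r_{11} \leftarrow et & \text{get exception type} \\
\text{GETED} & r_{11} \leftarrow ed & \text{get exception data}
\end{array}
\]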

A handler can use the KENTSP instruction to save the current stack pointer into word 0 of the thread’s kernel stack (using the kernel stack pointer ksp) and change stack pointer to point at the base of the thread’s kernel stack. KRESTSP can then be used to restore the stack pointer on exit from the handler.

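\[
\begin{array}{lll}
\text{KENTSP } n & \text{mem}[ksp] \leftarrow sp; & \text{switch to kernel stack} \\
 & sp \leftarrow ksp - n \times Bpw & \\
\text{KRESTSP } n & ksp \leftarrow sp + n \times Bpw; & \text{switch from kernel stack} \\
 & sp \leftarrow \text{mem}[ksp] &
\end{array}
\]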
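The stack switch performed by this pair of instructions can be traced with a small C model. The `machine_t` structure and function names are hypothetical, memory is a small word array addressed by byte address, and \( Bpw \) is assumed to be 4 bytes.

```c
#include <stdint.h>

#define BPW_BYTES 4u   /* assumed bytes per word */
#define MEM_WORDS 64u  /* size of the toy memory, in words */

typedef struct {
    uint32_t mem[MEM_WORDS];  /* word-addressed backing store */
    uint32_t sp;              /* stack pointer (byte address) */
    uint32_t ksp;             /* kernel stack pointer (byte address) */
} machine_t;

/* KENTSP n: save sp in word 0 of the kernel stack, then move sp
 * to n words below ksp. */
void kentsp(machine_t *m, uint32_t n)
{
    m->mem[m->ksp / BPW_BYTES] = m->sp;
    m->sp = m->ksp - n * BPW_BYTES;
}

/* KRESTSP n: recompute ksp from sp, then reload the saved sp. */
void krestsp(machine_t *m, uint32_t n)
{
    m->ksp = m->sp + n * BPW_BYTES;
    m->sp = m->mem[m->ksp / BPW_BYTES];
}
```

As long as the handler leaves \( sp \) where KENTSP put it and uses the same \( n \) on exit, KRESTSP recovers both the original \( ksp \) and the interrupted thread's stack pointer.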

The kep can be initialised using the SETKEP instruction; the ksp can be read using the GETKSP instruction.

\[
\begin{array}{lll}
\text{SETKEP} & kep \leftarrow r_{11} & \text{set kernel entry point} \\
\text{GETKSP} & r_{11} \leftarrow ksp & \text{get kernel stack pointer}
\end{array}
\]

The kernel stack pointer is initialised by the boot-ROM to point to a safe location near the last location of RAM - the last few locations are used by the JTAG debugging interface. ksp can be modified by using a sequence of SETSP followed by KRESTSP.

## 17 Initialisation and Debugging

The state of the processor includes additional registers to those used for the threads.

<table>
<thead>
<tr>
<th>register</th>
<th>use</th>
</tr>
</thead>
<tbody>
<tr>
<td>dspc</td>
<td>debug save pc</td>
</tr>
<tr>
<td>dssr</td>
<td>debug save sr</td>
</tr>
<tr>
<td>dssp</td>
<td>debug save sp</td>
</tr>
<tr>
<td>dtype</td>
<td>debug cause</td>
</tr>
<tr>
<td>dtid</td>
<td>thread identifier used to access thread state</td>
</tr>
<tr>
<td>dtreg</td>
<td>register identifier used to access thread state</td>
</tr>
<tr>
<td>DEBUG</td>
<td>flag that indicates that processor is in debug mode</td>
</tr>
</tbody>
</table>

All of the processor state can be accessed using the GETPS and SETPS instructions:

GETPS  d ← state[s]  get processor state
SETPS  state[d] ← s  set processor state

To access the state of a thread, first SETPS is used to set \( dtid \) and \( dtreg \) to the thread identifier and register number within the thread state. The contents of the register can then be accessed by:

\[
\text{DGETREG} \quad d \leftarrow dtreg_{dtid} \quad \text{get thread register}
\]

The debugging state is entered by executing a DCALL instruction, by an instruction that triggers a watchpoint or a breakpoint, or by an external asynchronous DEBUG event (for example caused by asserting a DEBUG pin). During debug, thread 0 executes the debug handler and all other threads are frozen. The debugging state is exited on DRET, which causes thread 0 to resume at its saved PC, and all other threads to start where they were stopped. Entry to a debug handler operates in a manner similar to an interrupt:

\[
\begin{array}{ll}
\text{debugentry} & dspc \leftarrow pc_{t0}; \\
& dssr \leftarrow sr_{t0}; \\
& pc_{t0} \leftarrow \text{debugentrypoint}; \\
& sr_{t0}[\text{bit inint}] \leftarrow \text{true}; \\
& sr_{t0}[\text{bit di}] \leftarrow \text{false}; \\
& sr_{t0}[\text{bit eeble}] \leftarrow \text{false}; \\
& sr_{t0}[\text{bit ieble}] \leftarrow \text{false}; \\
& sr_{t0}[\text{bit waiting}] \leftarrow \text{false}; \\
& \text{DEBUG} \leftarrow 1
\end{array}
\]

On an external, asynchronous, DEBUG event, the processor will always enter the debug state as follows:

\[
\begin{array}{lll}
\text{DEBUG event} & \text{debugentry}; & \text{(“debugentry” is defined above)} \\
& dtype \leftarrow \text{debugcause} &
\end{array}
\]

The DCALL instruction has the same effect:

\[
\begin{array}{lll}
\text{DCALL} & \text{debugentry}; & \text{(“debugentry” is defined above)} \\
& dtype \leftarrow \text{dcallcause} & \text{debug call (breakpoint)}
\end{array}
\]

\[
\begin{array}{lll}
\text{DRET} & pc_{t0} \leftarrow dspc; & \\
& sr_{t0} \leftarrow dssr; & \\
& \text{DEBUG} \leftarrow 0 & \\
\text{DENTSP} & dssp \leftarrow sp; & \text{debug save stack pointer} \\
& sp \leftarrow \text{ramend} & \\
\text{DRESTSP} & sp \leftarrow dssp & \text{debug restore stack pointer}
\end{array}
\]

On entering debug mode the DI bit is saved in the \( dspc \) register, and it is cleared. Debug mode is always entered in single issue mode, but the debugger can switch to dual-issue mode if required using DUALENTSP. On return from the debugger, DI is restored from the dspc.

Watchpoints and instruction breakpoints are supported by means of SETPS and GETPS instructions. An instruction breakpoint is an address that triggers a DCALL when the PC is equal to the value in the instruction breakpoint. A data watchpoint is a pair of addresses $l$ and $h$, and a condition that triggers a DCALL on stores or loads to specific memory addresses. If the condition is set to INRANGE, then a debug is triggered if a thread accesses address $x$ where $l \leq x \leq h$. If the condition is set to NOTINRANGE, then a debug is triggered if a thread accesses address $x$ where $x \leq l \lor x \geq h$.

- When the processor is not in debug-mode, none of the debug information is writable, except for the DEBUG register, which brings the processor into debug mode.
- When the processor is not in debug-mode, none of the debug values can be read except the PC and SR values, in order to support profiling.

### 18 Specialised Instructions

#### 18.1 Long arithmetic

The long arithmetic instructions support signed and unsigned arithmetic on multi-word values. The long subtract instruction (LSUB) enables conversion between long signed and long unsigned values by subtracting from long 0. The long multiply and long divide operate on unsigned values.

The long add instruction is intended for adding multi-word values. It has a carry-in operand and a carry-out operand. Similarly, the long subtract instruction is intended for subtracting multi-word values and has a borrow-in operand and a borrow-out operand.

**LADD**

\[
\begin{align*}
\text{l} & \leftarrow \text{l} + \text{r} + \text{c}[\text{bit } 0]; & \text{add with carry} \\
\text{e} & \leftarrow \text{carry}(\text{l} + \text{r} + \text{c}[\text{bit } 0]) \\
\end{align*}
\]

**LSUB**

\[
\begin{align*}
\text{l} & \leftarrow \text{l} - \text{r} - \text{b}[\text{bit } 0]; & \text{subtract with borrow} \\
\text{e} & \leftarrow \text{borrow}(\text{l} - \text{r} - \text{b}[\text{bit } 0]) \\
\end{align*}
\]
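
The way the carry operands chain across words can be sketched in C. The sketch assumes bpw = 32; the function names `ladd` and `add_2word` are hypothetical and only illustrate the idea of feeding the carry-out of one LADD into the carry-in of the next.

```c
#include <stdint.h>

/* Model of LADD for bpw = 32 (an assumption of this sketch): returns the low
   word of x + y + carry_in[bit 0] and writes the carry-out to *carry_out. */
static uint32_t ladd(uint32_t x, uint32_t y, uint32_t carry_in, uint32_t *carry_out)
{
    uint64_t sum = (uint64_t)x + y + (carry_in & 1u);
    *carry_out = (uint32_t)(sum >> 32);
    return (uint32_t)sum;
}

/* Adds two 2-word values stored least significant word first.
   Returns the carry out of the most significant word. */
static uint32_t add_2word(const uint32_t a[2], const uint32_t b[2], uint32_t out[2])
{
    uint32_t carry;
    out[0] = ladd(a[0], b[0], 0u, &carry);
    out[1] = ladd(a[1], b[1], carry, &carry);
    return carry;
}
```

A multi-word subtraction follows the same pattern with the borrow operands of LSUB.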

The long multiply instruction multiplies two of its source operands, and adds two more source operands to the result, leaving the unsigned double length result in its two destination operands. The result can always be represented within two words because the largest value that can be produced is \((B - 1) \times (B - 1) + (B - 1) + (B - 1) = B^2 - 1\) where \(B = 2^{bpw}\). The two carry-in operands allow the component results of multi-length multiplications to be formed directly without the need for extra addition steps.

**LMUL**

\[
\begin{align*}
\text{d} & \leftarrow ((\text{l} \times \text{r}) + \text{s} + \text{t})[\text{bits } 2 \times \text{bpw} - 1..\text{bpw}]; & \text{long multiply} \\
\text{e} & \leftarrow ((\text{l} \times \text{r}) + \text{s} + \text{t})[\text{bits } \text{bpw} - 1..0] \\
\end{align*}
\]
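
As an illustration of the two carry-in operands, the following C sketch models LMUL for bpw = 32 and uses it to multiply a two-word value by a single word. The names `lmul` and `mul_2x1` are hypothetical; the sketch only relies on the bound shown above, namely that the sum always fits in two words.

```c
#include <stdint.h>

/* Model of LMUL for bpw = 32: computes l*r + s + t, which never exceeds
   2^64 - 1, and splits it into a high and a low word. */
static void lmul(uint32_t l, uint32_t r, uint32_t s, uint32_t t,
                 uint32_t *hi, uint32_t *lo)
{
    uint64_t result = (uint64_t)l * r + s + t;
    *hi = (uint32_t)(result >> 32);
    *lo = (uint32_t)result;
}

/* Multiplies a 2-word value (least significant word first) by one word,
   producing a 3-word result. The high word of each partial product is fed
   into the next step as a carry-in, so no separate additions are needed. */
static void mul_2x1(const uint32_t a[2], uint32_t b, uint32_t out[3])
{
    uint32_t carry;
    lmul(a[0], b, 0u, 0u, &carry, &out[0]);
    lmul(a[1], b, carry, 0u, &out[2], &out[1]);
}
```
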

The long division instruction (LDIV) is very similar to the short unsigned division instruction, except that it returns the remainder as well as the result; it also allows the remainder from a previous step of a multi-length division to be loaded as the high part of the dividend.

\[
\begin{array}{lll}
\text{LDIV} & d \leftarrow (l : m) \div r; & \text{long divide unsigned} \\
& e \leftarrow (l : m) \bmod r &
\end{array}
\]

An ET_ARITHMETIC exception is raised if the result cannot be represented as a single word value; this occurs when \( l \geq r \), because the quotient of \( (l : m) \div r \) fits in one word only if the high part of the dividend is smaller than the divisor. Note that this instruction operates correctly if the most significant bit of the divisor is 1 and the initial high part of the dividend is non-zero. A (fairly) simple algorithm can be used to deal with a double length divisor. One method is to normalise the divisor and divide first by the top 32 bits; this produces a very close approximation to the result which can then be corrected.
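
A multi-length division by a single-word divisor can be sketched in C as follows, assuming bpw = 32. The names `ldiv_model` and `div_2word` are hypothetical. The sketch treats a zero divisor, or a high dividend word that is not smaller than the divisor, as the case in which the quotient does not fit in one word, and reports it with a return code rather than an exception.

```c
#include <stdint.h>

/* Model of LDIV for bpw = 32. Returns 0 on success and -1 where the quotient
   would not fit in one word (r == 0 or l >= r); the outputs are then untouched. */
static int ldiv_model(uint32_t l, uint32_t m, uint32_t r,
                      uint32_t *quot, uint32_t *rem)
{
    if (r == 0u || l >= r) {
        return -1;
    }
    uint64_t dividend = ((uint64_t)l << 32) | m;
    *quot = (uint32_t)(dividend / r);
    *rem = (uint32_t)(dividend % r);
    return 0;
}

/* Divides a 2-word value (least significant word first) by a non-zero word.
   Each step loads the previous remainder as the high part of the dividend;
   since that remainder is always smaller than r, no step can overflow. */
static uint32_t div_2word(const uint32_t n[2], uint32_t r, uint32_t q[2])
{
    uint32_t rem = 0u;
    ldiv_model(rem, n[1], r, &q[1], &rem);
    ldiv_model(rem, n[0], r, &q[0], &rem);
    return rem;
}
```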

The long extract and insert instructions perform long shift and mask operations. LEXTRACT extracts a selection of bits from two words at a given offset; a sequence of LEXTRACT instructions can be used to implement a rotate, long shift, and misaligned loads. An LSATS followed by an LEXTRACT can be used to extract a word from the result of a MACCS (see the next subsection). LINSERT performs the inverse operation of LEXTRACT and inserts a bit pattern into a double word.

\[
\begin{array}{lll}
\text{LEXTRACT} & d \leftarrow (l : r)[\text{bit } bitp + x - 1..x] & \text{extract word} \\
\text{LINSERT} & m \leftarrow ((1 << bitp) - 1) << s; & \text{insert word} \\
& d : e \leftarrow ((d : e) \land_{\text{bit}} \neg_{\text{bit}} m) \lor_{\text{bit}} ((x << s) \land_{\text{bit}} m) &
\end{array}
\]
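
The two operations can be modelled in C on a 64-bit value standing for the double word, assuming bpw = 32. This is a sketch only: the function and parameter names (`lextract`, `linsert`, `rotr32`, `offset`, `width`) are hypothetical and do not correspond to operand names in the instruction set.

```c
#include <stdint.h>

/* LEXTRACT model: take `width` bits (1..32) starting at bit `offset` (0..63)
   from the double word hi:lo. */
static uint32_t lextract(uint32_t hi, uint32_t lo, uint32_t offset, uint32_t width)
{
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    uint64_t mask = (1ull << width) - 1u;
    return (uint32_t)((pair >> offset) & mask);
}

/* LINSERT model: overwrite `width` bits (1..32) of the double word `de`,
   starting at bit `offset` (0..63), with the low bits of `x`. */
static uint64_t linsert(uint64_t de, uint32_t x, uint32_t offset, uint32_t width)
{
    uint64_t m = ((1ull << width) - 1u) << offset;
    return (de & ~m) | (((uint64_t)x << offset) & m);
}

/* A rotate right built from a single extract: both halves hold the same word. */
static uint32_t rotr32(uint32_t v, uint32_t n)
{
    return lextract(v, v, n & 31u, 32u);
}
```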

### 18.2 Multiply accumulate

The multiply-accumulate instructions perform a double length accumulation of products of single length operands:

\[
\begin{array}{lll}
\text{MACCU} & s \leftarrow ((l \times r) + (s : t))[\text{bits } 2 \times bpw - 1..bpw]; & \text{long multiply} \\
& t \leftarrow ((l \times r) + t)[\text{bits } bpw - 1..0]; & \text{acc unsigned} \\
\text{MACCS} & s \leftarrow ((l \times_{sgn} r) + (s : t))[\text{bits } 2 \times bpw - 1..bpw]; & \text{long multiply} \\
& t \leftarrow ((l \times_{sgn} r) + t)[\text{bits } bpw - 1..0]; & \text{acc signed} \\
\text{LSATS} & \text{if } s : t > 2^{l+bpw} - 1 & \\
& \text{then } s : t \leftarrow 2^{l+bpw} - 1; & \\
& \text{elsif } s : t < -2^{l+bpw} & \\
& \text{then } s : t \leftarrow -2^{l+bpw}; &
\end{array}
\]

A Saturate signed instruction is needed.

\[
\begin{array}{ll}
\text{LSATSI} & \text{if } s : t > 2^{bitp+bpw} - 1 \\
& \text{then } s : t \leftarrow 2^{bitp+bpw} - 1; \\
& \text{elsif } s : t < -2^{bitp+bpw} \\
& \text{then } s : t \leftarrow -2^{bitp+bpw};
\end{array}
\]

A Saturate signed immediate instruction is needed.

The MACCU instruction multiplies two unsigned source operands to produce a double length result which it adds to its unsigned double length accumulator operand held in two other operands. Similarly, the MACCS instruction multiplies two signed source operands to produce a double length result which it adds to its signed double length accumulator operand held in two other operands. The LSATS instruction saturates a number that is outside the range \(-2^{l+bpw}..2^{l+bpw} - 1\).
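
A signed accumulate followed by a saturation can be sketched in C as below, assuming bpw = 32 and holding the accumulator pair s:t in one 64-bit value. The names `maccs` and `lsats` mirror the mnemonics but the functions are hypothetical models, and the parameter `shift` stands for the operand written \(l\) in the range above; the sketch assumes `shift` is at most 30 so that the limits fit in 64 bits.

```c
#include <stdint.h>

/* MACCS model: add the signed product l*r to the 64-bit accumulator.
   The addition is done on unsigned values so that it wraps rather than
   overflowing; the conversion back assumes a two's complement target. */
static int64_t maccs(int64_t acc, int32_t l, int32_t r)
{
    int64_t product = (int64_t)l * (int64_t)r;
    return (int64_t)((uint64_t)acc + (uint64_t)product);
}

/* LSATS model: clamp acc to the range -2^(shift+32) .. 2^(shift+32) - 1.
   Assumes shift <= 30 (an assumption of this sketch). */
static int64_t lsats(int64_t acc, uint32_t shift)
{
    int64_t limit = (int64_t)1 << (shift + 32u);
    if (acc > limit - 1) {
        return limit - 1;
    }
    if (acc < -limit) {
        return -limit;
    }
    return acc;
}
```

Values already inside the range pass through unchanged, so the saturation step can be applied unconditionally after a run of accumulations.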

### 18.3 Cyclic redundancy check

Cyclic redundancy check is performed using:

**CRC32**

```plaintext
for step = 0 for bpw
  if (r[bit 0] = 1)
     then r ← (s[bit step] : r[bits (bpw - 1) ... 1]) ⊕ p
     else r ← (s[bit step] : r[bits (bpw - 1) ... 1])
```

**CRC8**

```plaintext
for step = 0 for 8
  if (r[bit 0] = 1)
     then r ← (s[bit step] : r[bits 31 ... 1]) ⊕ p
     else r ← (s[bit step] : r[bits 31 ... 1]);
```

**CRCN**

```plaintext
if n > 32 then cnt ← 32
else cnt ← n
for step = 0 for cnt
  if (r[bit 0] = 1)
     then r ← (s[bit step] : r[bits 31 ... 1]) ⊕ p
     else r ← (s[bit step] : r[bits 31 ... 1]);
```

The CRC8 instruction operates on the least significant 8 bits of its data operand, ignoring the most significant 24 bits. It is useful when operating on a sequence of bytes, especially where these are not word-aligned in memory. The CRCN instruction operates on the least significant bits of its data operand; the fourth operand of CRCN, t (written n in the pseudo-code above), determines the number of bits to fold into the CRC. If t > 32 then 32 bits are processed. This enables CRCN to be passed a bit count in a loop, and to overrun in an unrolled loop.
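
The three pseudo-code fragments above differ only in the number of steps, which the following C sketch makes explicit for bpw = 32. The function names are hypothetical, and the polynomial used in any call is supplied by the caller; nothing here fixes a particular polynomial.

```c
#include <stdint.h>

/* CRCN model: fold the n least significant bits of s (at most 32) into the
   running CRC r, using polynomial p. Each step shifts r right by one bit,
   brings the next data bit in at the top, and XORs in p if the bit shifted
   out of r was 1. */
static uint32_t crcn(uint32_t r, uint32_t s, uint32_t p, uint32_t n)
{
    uint32_t cnt = (n > 32u) ? 32u : n;
    for (uint32_t step = 0; step < cnt; step++) {
        uint32_t shifted = (((s >> step) & 1u) << 31) | (r >> 1);
        r = (r & 1u) ? (shifted ^ p) : shifted;
    }
    return r;
}

/* CRC32 processes a whole word, CRC8 only the least significant byte. */
static uint32_t crc32_word(uint32_t r, uint32_t s, uint32_t p)
{
    return crcn(r, s, p, 32u);
}

static uint32_t crc8_byte(uint32_t r, uint32_t s, uint32_t p)
{
    return crcn(r, s, p, 8u);
}
```

Under this model, folding the four bytes of a word one at a time, least significant byte first, gives the same result as folding the word in one step, which is why CRC8 is convenient for data that is not word-aligned.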

The CRC32_INC instruction performs a CRC32 and a simultaneous increment on the second parameter.

```plaintext
for step = 0 for bpw
  if (r[bit 0] = 1)
     then r ← (s[bit step] : r[bits (bpw - 1) ... 1]) ⊕ p
     else r ← (s[bit step] : r[bits (bpw - 1) ... 1])
  a ← a + bitp
```

19 XCore XS2 Instructions

This section presents the instructions in alphabetical order. For each instruction we present a short textual description, followed by the assembly syntax, its meaning in a more formal notation, its encoding(s) and the exceptions that can be raised by the instruction.

The processor operates on words - registers are one-word wide, data can be transferred to ports and channels in words, and most memory operations operate on words. A word is $bpw$ bits long, or $Bpw$ bytes long.

In the description we use the following notation to describe operands and constants:

- $bitp$: denotes a bit position - one of $bpw$, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, and 32; these are encoded using numbers 0...11.
- $c$: register used as a conditional.
- $d, e$: register used as a destination.
- $r$: register used as a resource identifier.
- $s$: register used as a source.
- $t$: register used as a thread identifier.
- $u_s$: a small unsigned constant in the range 0...11
- $u_x$: an unsigned constant in the range $0 \ldots (2^x - 1)$
- $v, w, x, y$: registers used for two or more sources.

All mathematical operators are assumed to work on Integers ($\mathbb{Z}$) and, unless otherwise stated, bit patterns found in registers are interpreted unsigned. Signed numbers are represented using two's complement, and if an operand is interpreted as a signed number, this is denoted by a subscript $\text{signed}$. In addition to the standard numerical operators we assume the following bitwise operators:

- $\lor_{\text{bit}}$: Bitwise or.
- $\land_{\text{bit}}$: Bitwise and.
- $\oplus_{\text{bit}}$: Bitwise xor.
- $\neg_{\text{bit}}$: Bitwise complement.

Square brackets are used for two purposes. When preceded with the word $\text{mem}$ square brackets address a memory location. Otherwise, they indicate that one or more bits are sliced out of a bit pattern. Bits can be spliced together using a “:” operator. The bit pattern $x : y$ is a pattern where $x$ are the higher order bits and $y$ are the lower order bits.
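
For readers more used to C, the slicing and splicing notation corresponds to shifts and masks. The helpers below are a hypothetical sketch for 32-bit words; `slice` and `splice` are not names used elsewhere in this document.

```c
#include <stdint.h>

/* x[bits hi..lo] for 31 >= hi >= lo >= 0: the selected bits, right-aligned. */
static uint32_t slice(uint32_t x, uint32_t hi, uint32_t lo)
{
    uint32_t width = hi - lo + 1u;
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (x >> lo) & mask;
}

/* x : y, where y is a pattern of ywidth bits (ywidth <= 32) that already fits
   in ywidth bits: x supplies the higher order bits, y the lower order bits. */
static uint64_t splice(uint32_t x, uint32_t y, uint32_t ywidth)
{
    return ((uint64_t)x << ywidth) | y;
}
```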

The notation $\text{mem}[x]$ represents word-based access to memory, and the address $x$ must be word-aligned (that is, the address must be a multiple of $Bpw$). Instructions that read or write data to memory that is not a word in size (such as a byte or a 16-bit value) explicitly specify which bits in memory are accessed.

The instruction encoding specifies the *opcode* bits of the encoding - the way that the operands are encoded is specified by the corresponding page in the chapter on instruction formats (if you access this document electronically there should be a hyperlink). Each operand in the instruction chapter maps positionally on an operand in the format chapter.

ADD

Integer unsigned add

Adds two unsigned integers together. There is no check for overflow. Where it occurs, overflow is ignored.

To add with carry the LADD instruction should be used instead.

The instruction has three operands:

- \( op1 \)  \( d \)  Operand register, one of \( r0...r11 \)
- \( op2 \)  \( x \)  Operand register, one of \( r0...r11 \)
- \( op3 \)  \( y \)  Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[
\text{ADD} \quad d, x, y
\]

Operation:

\[
d \leftarrow (x + y) \mod 2^{bpw}
\]

Encoding:

\[
3r \quad 000010 \ldots \ldots \ldots \quad \text{M+R}
\]
ADDI

Integer unsigned add immediate

Adds two unsigned integers together. There is no check for overflow. Where it occurs, overflow is ignored.

To add with carry the LADD instruction should be used instead.

The instruction has three operands:

- \( op1 \ d \) Operand register, one of \( r0 \ldots r11 \)
- \( op2 \ x \) Operand register, one of \( r0 \ldots r11 \)
- \( op3 \ u_s \) An integer in the range 0...11

Mnemonic and operands:

\[
\text{ADDI} \quad d, x, u_s
\]

Operation:

\[
d \leftarrow (x + u_s) \mod 2^{bpw}
\]

Encoding:

\[
\text{2rus} \quad 0 0 1 0 0 \ldots \ldots \ldots \ldots \ldots \ldots \quad \text{M+R}
\]
AND

Bitwise and

Produces the bitwise AND of two words.

The instruction has three operands:

\[ \begin{align*}
\text{op}1 & \quad d \quad \text{Operand register, one of } r0...r11 \\
\text{op}2 & \quad x \quad \text{Operand register, one of } r0...r11 \\
\text{op}3 & \quad y \quad \text{Operand register, one of } r0...r11
\end{align*} \]

Mnemonic and operands:

\[ \text{AND} \quad d, x, y \]

Operation:

\[ d \leftarrow x \land_{\text{bit}} y \]

Encoding:

\[ 3r \quad 00111\ldots\ldots \quad \text{M+R} \]
ANDNOT

ANDNOT clears bits in a word. For every bit that is set in a bit pattern \( s \), ANDNOT clears the corresponding bit in the destination operand \( d \). ANDNOT is a two operand instruction where the first operand acts as both source and destination.

ANDNOT can be used to efficiently operate on bit patterns that span a non-integral number of bytes.

See MKMSK for how to build masks efficiently.
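
A typical use is to clear a field of arbitrary width. The C sketch below models this for 32-bit words; `andnot` follows the description above, while `mkmsk` is only a hypothetical stand-in that is assumed to produce a mask of `width` low-order ones (the MKMSK instruction page is the authority on its actual behaviour).

```c
#include <stdint.h>

/* Hypothetical helper: a mask with the `width` least significant bits set. */
static uint32_t mkmsk(uint32_t width)
{
    return (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* ANDNOT model: clear in d every bit that is set in s. */
static uint32_t andnot(uint32_t d, uint32_t s)
{
    return d & ~s;
}
```

For example, `andnot(value, mkmsk(12))` clears the twelve least significant bits of `value`, a field that spans one and a half bytes.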

The instruction has two operands:

\[
\begin{align*}
\text{op}1 & \quad d \quad \text{Operand register, one of } r0 \ldots r11 \\
\text{op}2 & \quad s \quad \text{Operand register, one of } r0 \ldots r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{ANDNOT} \quad d, s
\]

Operation:

\[
d \leftarrow d \land_{\text{bit}} \neg_{\text{bit}} s
\]

Encoding:

\[
2r \quad 00101\ldots\ldots\ldots10\ldots\ldots1 \quad \text{M+R}
\]
ASHR

Arithmetic shift right

Right shifts a signed integer and performs sign extension. The shift distance ($y$) is an unsigned integer. If the shift distance is larger than the size of a word, the result will only be the sign extension.

If sign extension is not required, the SHR instruction should be used instead. Note that ASHR is not the same as a DIVS by $2^y$ because ASHR rounds towards minus infinity, whereas DIVS rounds towards zero.
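
The difference in rounding is easiest to see on a negative odd number. The following C sketch models both behaviours for bpw = 32; the function names are hypothetical, and `divs_pow2` simply relies on the fact that signed division in C truncates towards zero.

```c
#include <stdint.h>

/* ASHR model: shift right, filling the vacated bits with the sign bit.
   Written with unsigned shifts so that it does not depend on how the
   compiler treats >> on negative values. */
static int32_t ashr(int32_t x, uint32_t y)
{
    if (y >= 32u) {
        return (x < 0) ? -1 : 0;        /* only the sign extension remains */
    }
    uint32_t bits = (uint32_t)x >> y;
    if (x < 0 && y > 0u) {
        bits |= ~(0xFFFFFFFFu >> y);    /* replicate the sign bit into the top y bits */
    }
    return (int32_t)bits;               /* assumes a two's complement target */
}

/* Division by 2^y for 0 <= y < 31, rounding towards zero. */
static int32_t divs_pow2(int32_t x, uint32_t y)
{
    return x / (int32_t)(1u << y);
}
```

Shifting -7 right by one gives -4 (rounded towards minus infinity), whereas dividing -7 by 2 gives -3 (rounded towards zero); for non-negative values the two agree.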

The instruction has three operands:

$op1 \quad d \quad$ Operand register, one of $r0...r11$

$op2 \quad x \quad$ Operand register, one of $r0...r11$

$op3 \quad y \quad$ Operand register, one of $r0...r11$

Mnemonic and operands:

```
ASHR    d, x, y
```

Operation:

$$d \leftarrow \begin{cases} 
0 < y < bpw, & x[bpw - 1] : \ldots : x[bpw - 1] : x[bpw - 1...y] \\
y = 0, & x \\
y \geq bpw, & x[bpw - 1] : \ldots : x[bpw - 1]
\end{cases}$$

Encoding:

```
1 1 1 1 | · · · | · · · | 0 0 0 1 0 1 1 1 1 1 0 1 0
```

M&R
**ASHRI**

**Arithmetic shift right immediate**

Right shifts a signed integer and performs sign extension. The shift distance \((bitp)\) is an unsigned integer. If the shift distance is larger than the size of a word, the result will only be the sign extension.

If sign extension is not required, the **SHR** instruction should be used instead. Note that ASHR is not the same as a **DIVS** by \(2^{bitp}\) because ASHR rounds towards minus infinity, whereas DIVS rounds towards zero.

The instruction has three operands:

\[
\begin{align*}
    op1 & \quad d & \quad \text{Operand register, one of } r0 \ldots r11 \\
    op2 & \quad x & \quad \text{Operand register, one of } r0 \ldots r11 \\
    op3 & \quad bitp & \quad \text{A bit position; one of } bpw, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32
\end{align*}
\]

Mnemonic and operands:

\[
\text{ASHRI} \quad d, x, bitp
\]

Operation:

\[
d \leftarrow \begin{cases}
0 < bitp < bpw, & x[bpw - 1] : \ldots : x[bpw - 1] : x[bpw - 1 \ldots bitp] \\
bitp = 0, & x \\
bitp \geq bpw, & x[bpw - 1] : \ldots : x[bpw - 1]
\end{cases}
\]

Encoding:

\[
\begin{array}{ll}
\text{l2rus} & 1 \ 1 \ 1 \ 1 \ \cdots \ \cdots \ \cdots \ \cdots \ \cdots \\
& 1 \ 0 \ 0 \ 1 \ 0 \ 1 \ 1 \ 1 \ 1 \ 1 \ 0 \ 1 \ 1 \ 0 \ 0 \quad \text{M\&R}
\end{array}
\]
**BAU**  
Branch absolute unconditional register

Branches to the address given in a general purpose register. The register value must be even, and should point to a valid memory location.

The instruction has one operand:

\[ op1 \quad s \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[
\text{BAU} \quad s
\]

Operation:

\[
pc \leftarrow s
\]

Encoding:

\[
1r \quad 00100111111111 \ldots \quad \text{M}
\]

Conditions that raise an exception:

**ET_ILLEGAL_PC**  
The address specified was not 16-bit aligned or did not point to a memory location.
**BITREV**

Reverses the bits in a word; the most significant bit of the source operand will be produced in the least significant bit of the destination operand, the value of the least significant bit of the source operand will be produced in the most significant bit of the destination operand.

This instruction can be used in conjunction with **BYTESREV** in order to translate between different ordering conventions such as big-endian and little-endian.
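
The effect of combining the two reversals can be checked with a small C model for 32-bit words. The functions `bitrev` and `byterev` below are hypothetical models named after the instructions; the byte reversal is assumed to swap the four bytes of a word end for end.

```c
#include <stdint.h>

/* BITREV model: bit i of the source becomes bit 31 - i of the result. */
static uint32_t bitrev(uint32_t s)
{
    uint32_t d = 0u;
    for (int i = 0; i < 32; i++) {
        d = (d << 1) | (s & 1u);
        s >>= 1;
    }
    return d;
}

/* Byte reversal model: byte k of the source becomes byte 3 - k of the result. */
static uint32_t byterev(uint32_t s)
{
    return (s >> 24) | ((s >> 8) & 0x0000FF00u) |
           ((s << 8) & 0x00FF0000u) | (s << 24);
}
```

Applying both in sequence leaves every byte in place but reverses the bits inside each byte, which is the conversion needed when the byte order of two conventions agrees but the bit order within a byte does not.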

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad d & \text{Operand register, one of } r0...r11 \\
\text{op2} & \quad s & \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{BITREV} \quad d, s
\]

Operation:

\[
d[bpw-1...0] \leftarrow s[0] : s[1] : s[2] : ... : s[bpw-1]
\]

Encoding:

\[
2r \quad 0 \quad 0 \quad 0 \quad 1 \quad 0 \quad \cdots \cdots \quad 0 \quad \cdots \cdots \quad \text{M+R}
\]
BLA  
Branch and link absolute via register

This instruction implements a procedure call to an absolute address. The program counter is saved in the link-register (lr) and the program counter is set to the given address. This address must be even and point to a valid memory address, otherwise an exception is raised. On execution of BLA, the processor will read the target instruction so that the invoked procedure will start without delay.

On entry to the procedure, the Link Register can be saved on the stack using the ENTSP instruction. RETSP performs the opposite of this instruction, returning from a procedure call.

The instruction has one operand:

\[ op1 \quad s \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[
\text{BLA} \quad s
\]

Operation:

\[
\begin{align*}
& lr \leftarrow pc \\
& pc \leftarrow s
\end{align*}
\]

Encoding:

\[
1r \quad 0 0 1 0 0 | 1 1 1 1 | 1 | 0 | \cdots \quad \text{M}
\]

Conditions that raise an exception:

ET_ILLEGAL_PC  The address specified was not 16-bit aligned or did not point to a memory location.
**BLACP**

Branch and link absolute via constant pool

This instruction implements a call to a procedure via the constant pool lookup table. The program counter is saved in the link-register \((lr)\). The program counter is loaded from the constant pool table. The constant pool register \((cp)\) is used as the base address for the table. An offset \((u_{20})\) specifies which word in the table to use. Because the instruction requires access to memory, the execution of the target instruction may be delayed by one instruction in order to fetch the target instruction.

On entry to the procedure, the Link Register can be saved on the stack using the **ENTSP** instruction. **RETSP** performs the opposite of this instruction, returning from a procedure call.

The instruction has one operand:

\[
op_1 \ u_{20} \quad \text{A 20-bit immediate in the range } 0...1048575. \text{ If } u_{20} < 1024, \text{ the instruction requires no prefix}
\]

Mnemonic and operands:

\[
\text{BLACP} \quad u_{20}
\]

Operation:

\[
\begin{align*}
lr & \leftarrow pc \\
pc & \leftarrow \text{mem}[cp + u_{20} \times Bpw]
\end{align*}
\]

Encoding:

\[
\text{u10} \quad 1 \ 1 \ 1 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ \ldots \quad \text{M}
\]

or prefixed for long immediates:

\[
\text{lu10} \quad 1 \ 1 \ 1 \ 1 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ \ldots \quad \text{M\&R}
\]

Conditions that raise an exception:

- **ET_ILLEGAL_PC** Loaded value was not 16-bit aligned or did not point to a memory location (trapped during next cycle).
- **ET_LOAD_STORE** Register \(cp\) points to an unaligned address, or the indexed address does not point to a valid memory address.
BLAT

**Branch and link absolute via table**

This instruction implements a call to a procedure via a lookup table. The program counter is saved in the link-register (lr). The program counter is loaded from the lookup table. The lookup table base address is taken from r11. An offset (u16) specifies which word in the table to use. Because the instruction requires access to memory, the execution of the target instruction may be delayed by one instruction in order to fetch the target instruction.

On entry to the procedure, the Link Register can be saved on the stack using the ENTSP instruction. RETSP performs the opposite of this instruction, returning from a procedure call.

The instruction has one operand:

\[
op1 \quad u_{16} \quad \text{A 16-bit immediate in the range 0...65535. If } u_{16} < 64 \text{, the instruction requires no prefix.}
\]

Mnemonic and operands:

\[
\text{BLAT} \quad u_{16}
\]

Operation:

\[
\begin{align*}
\text{lr} & \leftarrow \text{pc} \\
\text{pc} & \leftarrow \text{mem}[r11 + u_{16} \times Bpw]
\end{align*}
\]

Encoding:

\[
\text{u6} \quad 0 \ 1 \ 1 \ 1 \ 0 \ 1 \ 1 \ 0 \ 1 \ \cdots \ \cdots \ \cdots \ \cdots \quad \text{M}
\]

or prefixed for long immediates:

\[
\text{lu6} \quad 1 \ 1 \ 1 \ 0 \ 0 \ \cdots \ \cdots \ \cdots \ \cdots \ \cdots \ \cdots \ \cdots \ \cdots \quad \text{M\&R}
\]

Conditions that raise an exception:

- **ET_ILLEGAL_PC** Loaded value was not 16-bit aligned or did not point to a memory location (trapped during the next cycle).
- **ET_LOAD_STORE** Register r11 points to an unaligned address, or the indexed address does not point to a valid memory address.
BLRB  

Branch and link relative backwards

This instruction performs a call to a procedure: the address of the next instruction is saved in the link-register (lr). An unsigned offset is subtracted from the program counter. This implements a relative jump.

On entry to the procedure, the Link Register can be saved on the stack using the ENTSP instruction. RETSP performs the opposite of this instruction, returning from a procedure call. The counterpart forward call is called BLRF.

The instruction has one operand:

\[ op1 \quad u_{20} \quad \text{A 20-bit immediate in the range 0...1048575. If } u_{20} < 1024 \text{, the instruction requires no prefix} \]

Mnemonic and operands:

\[
\text{BLRB} \quad u_{20}
\]

Operation:

\[
\begin{align*}
lr & \leftarrow pc \\
pc & \leftarrow pc - u_{20} \times iw
\end{align*}
\]

Encoding:

\[
\text{u10} \quad 1101011 \cdot \cdot \cdot \cdot \cdot \cdot \cdot \cdot \quad \text{M}
\]

or prefixed for long immediates:

\[
\begin{array}{ll}
\text{lu10} & 1111011 \cdot \cdot \cdot \cdot \cdot \cdot \cdot \\
& 1101011 \cdot \cdot \cdot \cdot \cdot \cdot \cdot \quad \text{M\&R}
\end{array}
\]

Conditions that raise an exception:

**ET_ILLEGAL_PC** The new PC is not pointing to a valid memory location.

**BLRF**

Branch and link relative forwards

This instruction performs a call to a procedure: the address of the next instruction is saved in the link-register (lr). An unsigned offset is added to the program counter. This implements a relative jump.

On entry to the procedure, the Link Register can be saved on the stack using the ENTSP instruction. RETSP performs the opposite of this instruction, returning from a procedure call. The counterpart backward call is called BLRB.

The instruction has one operand:

\[ op1 \quad u_{20} \quad \text{A 20-bit immediate in the range 0...1048575. If } u_{20} < 1024 \text{, the instruction requires no prefix} \]

Mnemonic and operands:

BLRF \( \ u_{20} \)

Operation:

\[
\begin{align*}
lr & \leftarrow pc \\
pc & \leftarrow pc + u_{20} \times iw
\end{align*}
\]

Encoding:

\[
\text{u10} \quad 1 \ 1 \ 0 \ 1 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \quad \text{M}
\]

or prefixed for long immediates:

\[
\text{lu10} \quad 1 \ 1 \ 1 \ 1 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \quad \text{M\&R}
\]

Conditions that raise an exception:

**ET_ILLEGAL_PC** The new PC is not pointing to a valid memory location.
BRBF

Branch relative backwards false

This instruction implements a conditional relative jump backwards. The condition \(c\) is tested; if it is 0 (false), the offset \(u_{16}\) is subtracted from the program counter.

This instruction is part of a group of four instructions that conditionally jump forwards or backwards on true or false conditions: BRBF, BRBT, BRFF, and BRFT.
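
The behaviour of one member of the group, BRBF, can be sketched in C. The sketch assumes that iw is 2 bytes, as implied by the 16-bit alignment of instructions, and that `pc` already holds the address of the following instruction when the branch executes; both the function names and that second assumption are illustrative only.

```c
#include <stdint.h>

#define IW 2u   /* instruction width in bytes; assumed from the 16-bit alignment */

/* BRBF model: if c is 0 the offset is subtracted from pc, otherwise execution
   continues at pc unchanged. */
static uint32_t brbf_next_pc(uint32_t pc, uint32_t c, uint32_t u16)
{
    return (c == 0u) ? (pc - u16 * IW) : pc;
}

/* An offset below 64 fits the short encoding; larger offsets need a prefix. */
static int needs_prefix(uint32_t u16)
{
    return u16 >= 64u;
}
```

The other three instructions differ only in the sense of the test (`c != 0` for the "true" variants) and in the sign of the offset (added for the forward variants).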

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad c \quad \text{Operand register, one of } r0 \ldots r11 \\
\text{op2} & \quad u_{16} \quad \text{A 16-bit immediate in the range 0...65535. If } u_{16} < 64, \\
& \quad \text{the instruction requires no prefix}
\end{align*}
\]

Mnemonic and operands:

\[
\text{BRBF} \quad c, u_{16}
\]

Operation:

\[
\text{if } c = 0 \text{ then } pc \leftarrow pc - u_{16} \times iw
\]

Encoding:

\[
\text{ru6} \quad 01111100 \ldots \ldots \ldots \ldots \ldots \ldots \quad \text{M}
\]

or prefixed for long immediates:

\[
\text{lru6} \quad 111100 \ldots \ldots \ldots \ldots \ldots \quad \text{M\&R}
\]

Conditions that raise an exception:

ET_ILLEGAL_PC The new PC is not pointing to a valid memory location.
**BRBT**

Branch relative backwards true

This instruction implements a conditional relative jump backwards. The condition \( c \) is tested; if it is not 0 (true), the offset \( u_{16} \) is subtracted from the program counter.

This instruction is part of a group of four instructions that conditionally jump forwards or backwards on true or false conditions: BRBF, BRBT, BRFF, and BRFT.

The instruction has two operands:

- \( op1 \) \( c \) Operand register, one of \( r0...r11 \)
- \( op2 \) \( u_{16} \) A 16-bit immediate in the range 0...65535. If \( u_{16} < 64 \), the instruction requires no prefix

Mnemonic and operands:

\[
\text{BRBT} \quad \quad c, u_{16}
\]

Operation:

\[
\text{if } c \neq 0 \text{ then } pc \leftarrow pc - u_{16} \times iw
\]

Encoding:

\[
\text{ru6} \quad 011101\ldots\ldots \quad \text{M}
\]

or prefixed for long immediates:

\[
\text{lru6} \quad 111010\ldots\ldots \quad \text{M\&R}
\]

Conditions that raise an exception:

**ET_ILLEGAL_PC** The new PC is not pointing to a valid memory location.
BRBU  

Branch relative backwards unconditional

This instruction implements a relative jump backwards. The operand specifies the offset that should be subtracted from the program counter.

The counterpart forward relative jump is BRFU.

The instruction has one operand:

\[ op1 \quad u_{16} \quad \text{A 16-bit immediate in the range 0...65535. If } u_{16} < 64 \text{, the instruction requires no prefix.} \]

Mnemonic and operands:

\[ \text{BRBU} \quad u_{16} \]

Operation:

\[ pc \leftarrow pc - u_{16} \times iw \]

Encoding:

\[ \text{u6} \quad 0111011100 \ldots \ldots \ldots \ldots \quad \text{M} \]

or prefixed for long immediates:

\[ \text{lu6} \quad 1110111000 \ldots \ldots \ldots \ldots \quad \text{M&R} \]

Conditions that raise an exception:

**ET_ILLEGAL_PC** The new PC is not pointing to a valid memory location.

BRFF

**Branch relative forward false**

This instruction implements a conditional relative jump forwards. The condition \( c \) is tested; if it is 0 (false), the offset \( u_{16} \) is added to the program counter.

This instruction is part of a group of four instructions that conditionally jump forwards or backwards on true or false conditions: BRBF, BRBT, BRFF, and BRFT.

The instruction has two operands:

- \( op1 \) \( c \) Operand register, one of \( r0...r11 \)
- \( op2 \) \( u_{16} \) A 16-bit immediate in the range 0...65535. If \( u_{16} < 64 \), the instruction requires no prefix

Mnemonic and operands:

\[
\text{BRFF} \quad c, u_{16}
\]

Operation:

\[
\text{if } c = 0 \text{ then } pc \leftarrow pc + u_{16} \times iw
\]

Encoding:

\[
\text{ru6} \quad 0 \ 1 \ 1 \ 1 \ 1 \ 0 \ \cdot \ \cdot \ \cdot \ \cdot \ \cdot \quad \text{M}
\]

or prefixed for long immediates:

\[
\begin{array}{ll}
\text{lru6} & 1 \ 1 \ 1 \ 1 \ 0 \ \cdot \ \cdot \ \cdot \ \cdot \ \cdot \ \cdot \\
& 0 \ 1 \ 1 \ 1 \ 1 \ 0 \ \cdot \ \cdot \ \cdot \ \cdot \ \cdot \quad \text{M\&R}
\end{array}
\]

Conditions that raise an exception:

**ET_ILLEGAL_PC**  The new PC is not pointing to a valid memory location.
BRFT

Branch relative forward true

This instruction implements a conditional relative jump forwards. The condition ($c$) is tested; if it is not 0 (true), the offset ($u_{16}$) is added to the program counter.

This instruction is part of a group of four instructions that conditionally jump forwards or backwards on true or false conditions: BRBF, BRBT, BRFF, and BRFT.

The instruction has two operands:

- $op1 \ c$ Operand register, one of $r0...r11$
- $op2 \ u_{16}$ A 16-bit immediate in the range 0...65535. If $u_{16} < 64$, the instruction requires no prefix

Mnemonic and operands:

```
BRFT         c, u_{16}
```

Operation:

if $c \neq 0$ then $pc \leftarrow pc + u_{16} \times iw$

Encoding:

```
ru6   0 1 1 1 0 | 0 . . . . . . . .   M
```

or prefixed for long immediates:

```
lru6  1 1 1 1 0 | 0 . . . . . . . .
      0 1 1 1 0 | 0 . . . . . . . .   M&R
```

Conditions that raise an exception:

- ET_ILLEGAL_PC The new PC is not pointing to a valid memory location.
**BRFU**

Branch relative forward unconditional

This instruction implements a relative jump forwards. The operand specifies the offset that should be added to the program counter.

The counterpart backward relative jump is **BRBU**.

The instruction has one operand:

\[
\text{op} \quad u_{16} \quad \text{A 16-bit immediate in the range 0...65535. If } u_{16} < 64, \text{ the instruction requires no prefix}
\]

Mnemonic and operands:

\[
\text{BRFU} \quad u_{16}
\]

Operation:

\[
pc \leftarrow pc + u_{16} \times iw
\]

Encoding:

\[
\text{u6} \quad 0 \ 1 \ 1 \ 1 \ 0 \ 0 \ 1 \ 1 \ 0 \ 0 \ \ldots \ldots \ldots \quad \text{M}
\]

or prefixed for long immediates:

\[
\text{lu6} \quad 0 \ 1 \ 1 \ 1 \ 0 \ 0 \ 1 \ 1 \ 0 \ 0 \ \ldots \ldots \ldots \quad \text{M\&R}
\]

Conditions that raise an exception:

**ET_ILLEGAL_PC** The new PC is not pointing to a valid memory location.
**BRU**  
*Branch relative unconditional register*

This instruction implements a jump using a signed offset stored in a register. Because instructions are aligned on 16-bit boundaries, the offset in the register is multiplied by 2. Negative values cause backwards jumps.

The instruction has one operand:

\[ op1 \quad s \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[ \text{BRU} \quad s \]

Operation:

\[ pc \leftarrow pc + s_{\text{signed}} \times iw \]

Encoding:

| 1r | 0 0 1 0 1 | 1 1 1 1 | 1 1 1 1 | 1 0 | ... | M |

Conditions that raise an exception:

- **ET_ILLEGAL_PC**  The new PC is not pointing to a valid memory location.
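
The signed interpretation of the register can be sketched in Python. This is a minimal model rather than a description of the hardware: the names `to_signed` and `bru` are hypothetical, a 32-bit word is assumed, `IW = 2` follows from the 16-bit instruction alignment stated above, and wrapping the result to 32 bits is an assumption.

```python
BPW = 32                 # assumed bits per word
IW = 2                   # instruction width in bytes (16-bit alignment)
MASK = (1 << BPW) - 1

def to_signed(word):
    """Interpret a BPW-bit register value as a two's-complement integer."""
    word &= MASK
    return word - (1 << BPW) if word >> (BPW - 1) else word

def bru(pc, s):
    """Model of BRU: add the signed register offset, scaled by IW, to pc."""
    return (pc + to_signed(s) * IW) & MASK

print(hex(bru(0x1000, 4)))            # register holds 4: forward jump
print(hex(bru(0x1000, 0xFFFFFFFE)))   # register holds -2: backward jump
```

A register value with the top bit set is treated as negative, so the same instruction serves both forward and backward jumps.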
BYTEREV

This instruction reverses the bytes of a word.

Together with the BITREV instruction this can be used to resolve requirements of different ordering conventions such as little-endian and big-endian.

The instruction has two operands:

\[ \text{op1} \quad d \quad \text{Operand register, one of } r0...r11 \]
\[ \text{op2} \quad s \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[ \text{BYTEREV} \quad d, s \]

Operation:

\[ d[bpw - 1...0] \quad \leftarrow \quad s[7...0] : s[15...8] : ... : s[bpw - 1...bpw - 8] \]

Encoding:

\[ 2r \quad 0 0 0 0 0 0 \cdot \cdot \cdot \cdot 0 \cdot \cdot \cdot \quad M+R \]
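
As an illustration of the byte reversal, the following Python sketch models the operation for an assumed 32-bit word. The function name `byterev` is hypothetical.

```python
def byterev(s):
    """Reverse the four bytes of a 32-bit word (model of BYTEREV)."""
    s &= 0xFFFFFFFF
    # Serialise least significant byte first, then read the same bytes
    # back most significant byte first.
    return int.from_bytes(s.to_bytes(4, "little"), "big")

print(hex(byterev(0x11223344)))
```

Applying the function twice returns the original word, which is why the same instruction converts in both directions between little-endian and big-endian layouts.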
**CHKCT**

Test for control token

If the next token on a channel is the specified control token, then this token is discarded from the channel. If not, the instruction raises an exception.

This instruction pauses if the channel does not have a token available to be read.

This instruction can be used together with OUTCT in order to implement robust protocols on channels; each OUTCT must have a matching CHKCT or INCT. TESTCT tests for a control token without trapping, and does not discard the control token.

The instruction has two operands:

- \( op1 \)  \( r \)  Operand register, one of \( r0...r11 \)
- \( op2 \)  \( s \)  Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[
\text{CHKCT} \quad r, s
\]

Operation:

\[
\text{if hasctoken}(r) \land (s = \text{token}(r)) \text{ then skiptoken}(r) \text{ else raiseexception}
\]

Encoding:

\[
2r \quad 1 1 0 0 1 \ldots 0 \ldots R
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**  Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**  \( r \) is not pointing to a channel resource, or the resource is not in use.
- **ET_ILLEGAL_RESOURCE**  \( r \) contains a data token.
- **ET_ILLEGAL_RESOURCE**  \( r \) contains a control token different to \( s \).
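
The check-and-discard behaviour can be sketched with a small Python model. This is an illustrative sketch only: the channel is modelled as a queue of `(is_control, value)` pairs, the names `chkct` and `IllegalResource` are hypothetical, and a return value of `False` stands in for the instruction pausing.

```python
from collections import deque

class IllegalResource(Exception):
    """Stands in for the ET_ILLEGAL_RESOURCE exception."""

def chkct(channel, s):
    """Model of CHKCT on a queue of (is_control, value) tokens.

    Returns False if the instruction would pause (no token available),
    True if the expected control token was found and discarded.
    """
    if not channel:
        return False
    is_control, value = channel[0]
    if is_control and value == s:
        channel.popleft()
        return True
    raise IllegalResource("next token is not control token %d" % s)

EXPECTED = 1  # hypothetical control token value, for illustration only
chan = deque([(True, EXPECTED)])
print(chkct(chan, EXPECTED), len(chan))
```

A data token, or a control token with a different value, at the head of the queue raises the exception and leaves the queue unchanged.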
CHKCTI

Test for control token immediate

If the next token on a channel is the specified control token, then this token is discarded from the channel. If not, the instruction raises an exception.

This instruction pauses if the channel does not have a token available to be read.

This instruction can be used together with OUTCT in order to implement robust protocols on channels; each OUTCT must have a matching CHKCT or INCT. TESTCT tests for a control token without trapping, and does not discard the control token.

The instruction has two operands:

\[ \begin{align*} op1 & \quad r \quad \text{Operand register, one of } r0...r11 \\
op2 & \quad u_s \quad \text{An integer in the range } 0...11 \end{align*} \]

Mnemonic and operands:

\[ \text{CHKCTI} \quad r, u_s \]

Operation:

\[
\text{if hasctoken}(r) \land (u_s = \text{token}(r)) \text{ then skiptoken}(r) \text{ else raiseexception}
\]

Encoding:

\[ rus \quad 1 1 0 0 1 \cdots 1 \cdots \quad R \]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not pointing to a channel resource, or the resource is not in use.
- **ET_ILLEGAL_RESOURCE** \( r \) contains a data token.
- **ET_ILLEGAL_RESOURCE** \( r \) contains a control token different to \( u_s \).
CLRE

Clear all events

Clears the thread’s Event-Enable and In-Enabling flags, and disables all individual events for the thread. Any resource (port, channel, timer) that was enabled for this thread will be disabled.

The instruction has no operands.

Mnemonic and operands:

\[
\text{CLRE}
\]

Operation:

\[
\begin{array}{l}
sr[\text{een}] \leftarrow 0 \\
sr[\text{inen}] \leftarrow 0 \\
\text{forall } res \\
\quad \text{if } (\text{thread}_{res} = \text{tid}) \land \text{event}_{res} \text{ then } \text{enb}_{res} \leftarrow 0
\end{array}
\]

Encoding:

\[
0 \ 0 \ 0 \ 0 \ 0 \ 1 \ 1 \ 1 \ 1 \ 1 \ 0 \ 0 \ 1 \ 1 \ 0 \ 1
\]
CLRPT

Clear the port time

Clears the timer that is used to determine when the next output on a port will happen.

The instruction has one operand:

\[ op \quad r \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[ \text{CLRPT} \quad r \]

Operation:

\[ \text{clearporttime}(r) \]

Encoding:

\[
\begin{array}{cccccccc}
1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & \cdots & R
\end{array}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not pointing to a port resource, or the resource is not in use.
CLRSR

Clear bits in the thread’s status register ($sr$). The mask supplied specifies which bits should be cleared. CLRSR can only be used to clear the EEBLE, IEBLE, INENB, ININT and INK bits.

SETSR is used to set bits in the status register. The values of these bits are documented on the SETSR page.

The instruction has one operand:

$$op1 \ u_{16}$$

A 16-bit immediate in the range 0...65535. If $u_{16} < 64$, the instruction requires no prefix.

Mnemonic and operands:

CLRSR $u_{16}$

Operation:

$$sr \leftarrow sr \land \neg bit \ u_{16}$$

Encoding:

$$u_{6} \ \ \ 0111101100\ldots\ldots\ldots\ R$$

or prefixed for long immediates:

$$lu_{6} \ \ \ 0111101100\ldots\ldots\ldots\ M\&R$$
CLZ

Count leading zeros

Counts the number of leading zero bits in its operand. If the operand is zero, then \( bpw \) is produced. If the operand starts with a '1' bit (i.e., a negative signed integer, or a large unsigned integer), then 0 is produced. This instruction can be used to efficiently normalise integers.

The instruction has two operands:

\[
\begin{align*}
op1 & \quad d & \text{Operand register, one of } r0...r11 \\
op2 & \quad s & \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{CLZ} \quad d, s
\]

Operation:

\[
d \leftarrow \begin{cases} 
  s = 0 & \text{bpw} \\
  s[bpw - 1] = 0, & \text{bpw} - 1 - \lfloor \log_2 s \rfloor \\
  s[bpw - 1] = 1, & 0
\end{cases}
\]

Encoding:

\[
2r \quad 00001\cdot\cdot\cdot\cdot|0|\cdot\cdot\cdot \quad \text{M+R}
\]
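
The three cases of the operation and the use of CLZ for normalisation can be illustrated with a short Python sketch. The function names `clz` and `normalise` are hypothetical and a 32-bit word is assumed.

```python
BPW = 32                 # assumed bits per word
MASK = (1 << BPW) - 1

def clz(s):
    """Count leading zero bits of a BPW-bit word; yields BPW for s == 0."""
    s &= MASK
    return BPW - s.bit_length()

def normalise(s):
    """Shift s left until its top bit is set; return (shifted value, shift)."""
    shift = clz(s)
    return (s << shift) & MASK, shift

print(clz(0), clz(1), clz(0x80000000))
```

For a non-zero operand, `s.bit_length()` equals $\lfloor \log_2 s \rfloor + 1$, so the result agrees with the middle case of the operation above.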
CRC8

Incorporates the CRC over 8 bits of a 32-bit word into a Cyclic Redundancy Checksum. Similar to CRC, the first operand \((d)\) is used both as a source to read the initial value of the checksum and as a destination for the updated checksum, and there are operands to specify the polynomial \((p)\) to use when computing the CRC and the data \((e)\) to compute the CRC over. On completion of the instruction, part of the data has not yet been incorporated into the CRC, so the most significant 24 bits of the data are stored in a second destination register \((x)\). This enables repeated execution of CRC8 over a part-word. Executing \(Bpw\) CRC8 instructions in a row is identical to executing a single CRC instruction.

The instruction has four operands:

- \(op1\ d\)  Operand register, one of \(r0 \ldots r11\)
- \(op4\ x\)  Operand register, one of \(r0 \ldots r11\)
- \(op2\ e\)  Operand register, one of \(r0 \ldots r11\)
- \(op3\ p\)  Operand register, one of \(r0 \ldots r11\)

Mnemonic and operands:

\[
\text{CRC8} \quad d, x, e, p
\]

Operation:

\[
\begin{array}{l}
\text{for step} = 0 \text{ for } 8 \\
\quad \text{if } (d[0] = 1) \text{ then} \\
\quad \quad d \leftarrow (e[\text{step}] : d[31 \ldots 1]) \oplus_{\text{bit}} p \\
\quad \text{else} \\
\quad \quad d \leftarrow (e[\text{step}] : d[31 \ldots 1]) \\
x[bpw - 1 \ldots 0] \leftarrow 0:0:0:0:0:0:0:0:e[bpw - 1 \ldots 8]
\end{array}
\]

Encoding:

\[
\begin{array}{l}
\text{l4r} \quad 1 \ 1 \ 1 \ 1 \ 1 \ \cdot \ \cdot \ \cdot \ \cdot \\
\phantom{\text{l4r}} \quad 0 \ 0 \ 0 \ 0 \ 0 \ 1 \ 1 \ 1 \ 0 \ \cdot \ \cdot \ \cdot \quad \text{M\&R}
\end{array}
\]
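
The claim that \(Bpw\) CRC8 instructions are equivalent to one word CRC can be checked with a small Python model. This is a sketch under stated assumptions: 32-bit words, the checksum held in the first operand, data bits consumed from the least significant end, and hypothetical function names `crc_step`, `crc_word` and `crc8`. The polynomial value is only an example input.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit words

def crc_step(crc, bit, poly):
    """One step: shift the data bit in at the top of the checksum and
    XOR in the polynomial if the bit shifted out at the bottom was 1."""
    shifted_out = crc & 1
    crc = ((bit & 1) << 31) | (crc >> 1)
    return (crc ^ poly) & MASK if shifted_out else crc

def crc_word(crc, data, poly):
    """Model of the word CRC: 32 steps over the whole data word."""
    for step in range(32):
        crc = crc_step(crc, (data >> step) & 1, poly)
    return crc

def crc8(crc, data, poly):
    """Model of CRC8: 8 steps; also returns the remaining 24 data bits."""
    for step in range(8):
        crc = crc_step(crc, (data >> step) & 1, poly)
    return crc, (data & MASK) >> 8

poly, init, word = 0xEDB88320, 0xFFFFFFFF, 0x12345678  # example inputs
crc, rest = init, word
for _ in range(4):                 # Bpw = 4 bytes per word
    crc, rest = crc8(crc, rest, poly)
print(crc == crc_word(init, word, poly))
```

Feeding the second result of each `crc8` call back in as the data of the next call mirrors the role of the second destination register.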
CRC

Incorporates a word into a Cyclic Redundancy Checksum. The instruction has three operands. The first operand \((d)\) is used both as a source to read the initial value of the checksum and as a destination for the updated checksum. The other operands are the data to compute the CRC over \((x)\) and the polynomial to use when computing the CRC \((p)\).

The instruction has three operands:

- \(op_1 \ d\) Operand register, one of \(r0...r11\)
- \(op_2 \ x\) Operand register, one of \(r0...r11\)
- \(op_3 \ p\) Operand register, one of \(r0...r11\)

Mnemonic and operands:

\[
\text{CRC} \quad d, x, p
\]

Operation:

\[
\begin{array}{l}
\text{for step} = 0 \text{ for } bpw \\
\quad \text{if } (d[0] = 1) \text{ then} \\
\quad \quad d \leftarrow (x[\text{step}] : d[bpw - 1 \ldots 1]) \oplus_{\text{bit}} p \\
\quad \text{else} \\
\quad \quad d \leftarrow (x[\text{step}] : d[bpw - 1 \ldots 1])
\end{array}
\]

Encoding:

\[
\begin{array}{l}
1 \ 1 \ 1 \ 1 \ . \ . \ . \ . \ . \ . \ . \ . \ . \\
1 \ 0 \ 1 \ 0 \ 1 \ 1 \ 1 \ 1 \ 0 \ 1 \ 0 \ 0 \quad \text{M\&R}
\end{array}
\]
**CRC32_INC**

Word CRC with address increment

Incorporates a word into a Cyclic Redundancy Checksum. The first operand \((d)\) is used both as a source to read the initial value of the checksum and as a destination for the updated checksum. The other CRC operands are the data to compute the CRC over \((x)\) and the polynomial to use when computing the CRC \((p)\).

Simultaneously, the instruction increments the register \((a)\) by the specified value \((bitp)\).

The instruction has five operands:

- \(op1\ d\) Operand register, one of \(r0...r11\)
- \(op4\ a\) Operand register, one of \(r0...r11\)
- \(op2\ x\) Operand register, one of \(r0...r11\)
- \(op3\ p\) Operand register, one of \(r0...r11\)
- \(op5\ bitp\) A bit position; one of \(bpw, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32\)

Mnemonic and operands:

```
CRC32_INC  d, a, x, p, bitp
```

Operation:

```
for step = 0 for bpw
  if (d[0] = 1) then
    d ← (x[step] : d[bpw − 1...1]) ⊕bit p
  else
    d ← (x[step] : d[bpw − 1...1])
a ← a + bitp
```

Encoding:

```
| 1 | 1 | 1 | 1 | · · · | · · · | · · · | l4rus |
| 0 | 0 | 1 | 0 | | | x | x | x | x | x | x | x | x | x | x | x | x | x | x | M&R
```
CRCN

Incorporates the CRC over N bits of a 32-bit word into a Cyclic Redundancy Checksum. Similar to CRC, the first operand \((x)\) is used both as a source to read the initial value of the checksum and as a destination for the updated checksum, and there are operands to specify the data \((d)\) to compute the CRC over, the polynomial \((p)\) to use when computing the CRC, and the number of bits \((n)\).

The CRCN instruction is provided to complete the checksum over messages that have a number of bytes that is not a multiple of \(Bpw\), or for messages where the start is not aligned.

The instruction has four operands:

\[
\begin{align*}
   op_1 & \quad x & \text{Operand register, one of } r0...r11 \\
   op_4 & \quad d & \text{Operand register, one of } r0...r11 \\
   op_2 & \quad p & \text{Operand register, one of } r0...r11 \\
   op_3 & \quad n & \text{Operand register, one of } r0...r11 \\
\end{align*}
\]

Mnemonic and operands:

\[
\text{CRCN} \quad x, d, p, n
\]

Operation:

\[
\begin{align*}
   & \text{for } step = 0 \text{ for } (\text{if } n < bpw \text{ then } n \text{ else } bpw) \\
   & \quad \text{if } (x[0] = 1) \text{ then} \\
   & \quad \quad x \leftarrow (d[step] : x[bpw - 1...1]) \oplus_{\text{bit}} p \\
   & \quad \text{else} \\
   & \quad \quad x \leftarrow (d[step] : x[bpw - 1...1])
\end{align*}
\]

Encoding:

\[
\begin{array}{c}
\text{1 1 1 1 1 1} : \cdot \cdot \cdot \cdot \cdot \cdot \cdot \\
\text{1 1 1 1 1 1} : \text{M\&R}
\end{array}
\]

\[
\begin{array}{c}
\text{1 1 1 1 1 1} : \cdot \cdot \cdot \cdot \cdot \cdot \cdot \\
\text{0 0 0 0 0 0} : \text{M\&R}
\end{array}
\]
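
How the bit count bounds the number of steps can be sketched in Python. This is a minimal model assuming 32-bit words, a non-negative bit count, and data bits consumed from the least significant end; the names `crc_step` and `crcn` are hypothetical and the polynomial is only an example input.

```python
BPW = 32                 # assumed bits per word
MASK = (1 << BPW) - 1

def crc_step(crc, bit, poly):
    """Shift one data bit in at the top; XOR poly if a 1 was shifted out."""
    shifted_out = crc & 1
    crc = ((bit & 1) << (BPW - 1)) | (crc >> 1)
    return (crc ^ poly) & MASK if shifted_out else crc

def crcn(crc, data, poly, n):
    """Model of CRCN: incorporate the low min(n, BPW) bits of data."""
    for step in range(min(n, BPW)):
        crc = crc_step(crc, (data >> step) & 1, poly)
    return crc

poly, init, word = 0xEDB88320, 0xFFFFFFFF, 0x00ABCDEF  # example inputs
# A 24-bit tail of a message is completed with a single call.
print(hex(crcn(init, word, poly, 24)))
```

Splitting a word into two calls, the first over `k` bits and the second over the remaining bits of `data >> k`, gives the same checksum as one call over the whole word, which is what makes the instruction usable for unaligned starts.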
DCALL

Call a debug interrupt

Switches to debug mode, saving the current program counter and status register of thread 0 in debug registers. Thread 0 is deemed to have taken an interrupt and is therefore removed from the multicycle unit and lock resources, and all of its resources are informed such that it is removed from any resources it was inputting/outputting/eventing on.

DRET returns from a debug interrupt. DENTSP and DRESTSP instructions are used to switch to and from the debug SP.

The instruction has no operands.

Mnemonic and operands:

```
DCALL
```

Operation:

```
dspc ← pc_{t0}
dssr ← sr_{t0}
pc_{t0} ← debugentry
dtype ← dcallcause
sr_{t0}[inint] ← 1
sr_{t0}[ink] ← 1
sr_{t0}[eeble] ← 0
sr_{t0}[ieble] ← 0
sr_{t0}[inenb] ← 0
sr_{t0}[waiting] ← 0
dbgint[indbg] ← 1
```

Encoding:

```
0 0 0 0 0 0 1 1 1 1 1 1 1 1 0
```

M+R
DENTSP

Save and modify stack pointer for debug

Causes thread 0 to use the Debug SP rather than the SP in debug mode. Saves the SP in debug saved stack pointer (DSSP), and loads the SP with the top word location in RAM.

**DRESTSP** is used to restore the original SP from the DSSP.

The instruction has no operands.

Mnemonic and operands:

```
DENTSP
```

Operation:

\[
\text{dssp} \leftarrow \text{sp} \\
\text{sp} \leftarrow \text{ramend}
\]

Encoding:

```
1 1 1 1 | 1 1 1 1 | 1 0 1 0 1 0 1 0 0 \\
0 1 0 1 1 1 1 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0
```

Conditions that raise an exception:

*ET_ILLEGAL_INSTRUCTION* not in debug mode.
DGETREG

Debug read of another thread's register

The contents of any thread's register can be accessed for debugging purposes. To access the state of a thread, first use SETPS to set dtid and dtreg to the thread identifier and register number within the thread state.

The instruction has one operand:

\[ op_1 \quad s \]

Operand register, one of \( r0 \ldots r11 \)

Mnemonic and operands:

\[
\text{DGETREG} \quad s
\]

Operation:

\[ s \leftarrow dtreg_{dtid} \]

Encoding:

\[
1r \quad 0 \ 0 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 0 \ \cdot \cdot \cdot \quad M
\]

Conditions that raise an exception:

- **ET_ILLEGAL_INSTRUCTION** not in debug mode.
DIVS

Signed division

Produces the result of dividing two signed words, rounding the result towards zero. For example $5 \div 3$ is 1, $-5 \div 3$ is $-1$, $-5 \div -3$ is 1, and $5 \div -3$ is $-1$.

This instruction does not execute in a single cycle, and multiple threads may share the same division unit. The division may take up to $bpw$ thread-cycles.

The instruction has three operands:

- $op1 \quad d$ Operand register, one of $r0...r11$
- $op2 \quad x$ Operand register, one of $r0...r11$
- $op3 \quad y$ Operand register, one of $r0...r11$

Mnemonic and operands:

```
DIVS  d, x, y
```

Operation:

$$d_{\text{signed}} \leftarrow x_{\text{signed}} \div y_{\text{signed}}$$

Encoding:

```
1 1 1 1 . . . . . . . . . .   0 1 0 0 0 1 1 1 1 0 1 1 0   M&R
```

Conditions that raise an exception:

- **ET_ARITHMETIC** Division by 0.
- **ET_ARITHMETIC** Division of $-2^{bpw-1}$ by $-1$
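
The rounding rule and the two exception cases can be illustrated with a Python sketch. Python's own `//` operator rounds towards negative infinity, so the model divides magnitudes and restores the sign. The names `divs`, `to_signed` and `ArithmeticTrap` are hypothetical, a 32-bit word is assumed, and the Python exception merely stands in for ET_ARITHMETIC.

```python
BPW = 32                 # assumed bits per word
MASK = (1 << BPW) - 1

class ArithmeticTrap(Exception):
    """Stands in for the ET_ARITHMETIC exception."""

def to_signed(word):
    """Interpret a BPW-bit register value as a two's-complement integer."""
    word &= MASK
    return word - (1 << BPW) if word >> (BPW - 1) else word

def divs(x, y):
    """Model of DIVS: signed division of two words, rounding towards zero."""
    xs, ys = to_signed(x), to_signed(y)
    if ys == 0:
        raise ArithmeticTrap("division by 0")
    if xs == -(1 << (BPW - 1)) and ys == -1:
        raise ArithmeticTrap("result does not fit in a word")
    quotient = abs(xs) // abs(ys)          # divide magnitudes
    if (xs < 0) != (ys < 0):               # restore the sign
        quotient = -quotient
    return quotient & MASK

for a, b in [(5, 3), (-5, 3), (-5, -3), (5, -3)]:
    print(a, b, to_signed(divs(a & MASK, b & MASK)))
```

The loop reproduces the four worked examples given in the description.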
DIVU

Unsigned divide

Computes an unsigned integer division, rounding the result towards zero. For example $5 \div 3$ is 1.

This instruction does not execute in a single cycle, and multiple threads may share the same division unit. The division may take up to $bpw$ thread-cycles.

The instruction has three operands:

- $op1$ $d$ Operand register, one of $r0...r11$
- $op2$ $x$ Operand register, one of $r0...r11$
- $op3$ $y$ Operand register, one of $r0...r11$

Mnemonic and operands:

\[ \text{DIVU} \quad d, x, y \]

Operation:

\[ d \leftarrow x \div y \]

Encoding:

\[ 1 \ 1 \ 1 \ 1 \ . \ . \ . \ . \ . \ . \ . \ . \ . \ . \quad \text{M\&R} \]

Conditions that raise an exception:

- **ET_ARITHMETIC** Division by 0.
DRESTSP  

**Restore non debug stack pointer**

Causes thread 0 to use the original SP rather than the debug SP. Restores the SP from the debug saved stack pointer (DSSP).

**DENTSP** is used to save the original SP to the DSSP.

The instruction has no operands.

Mnemonic and operands:

```
DRESTSP
```

Operation:

```
sp ← dssp
```

Encoding:

```
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
```

M&R

```
| 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
```

Conditions that raise an exception:

**ET_ILLEGAL_INSTRUCTION** not in debug mode.
DRET

Return from debug interrupt

Exits debug mode, restoring thread 0’s program counter and status register to their values at the start of the debug interrupt.

DCALL calls a debug interrupt. DENTSP and DRESTSP instructions are used to switch to and from the debug SP.

The instruction has no operands.

Mnemonic and operands:

DRET

Operation:

\[ \begin{align*}
\text{pc}_{t0} & \leftarrow \text{dspc} \\
\text{sr}_{t0} & \leftarrow \text{dssr}
\end{align*} \]

Encoding:

<table>
<thead>
<tr>
<th></th>
<th>1</th>
<th>1</th>
<th>1</th>
<th>1</th>
<th>1</th>
<th>1</th>
<th>1</th>
<th>0</th>
<th>1</th>
<th>0</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>M&amp;R</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

Conditions that raise an exception:

- **ET_ILLEGAL_INSTRUCTION** not in debug mode.
- **ET_ILLEGAL_PC** The return address is invalid.
DUALENTSP  

Adjust stack and save link register

Stores the link register on the stack then adjusts the stack pointer creating enough space for the procedure call that has just been entered.

See RETSP for the operation that restores the link-register.

The instruction has one operand:

\[ op1 \ u_{16} \]

A 16-bit immediate in the range 0...65535. If \( u_{16} < 64 \), the instruction requires no prefix.

Mnemonic and operands:

\[
\text{DUALENTSP} \quad u_{16}
\]

Operation:

\[
\begin{align*}
& \text{if } u_{16} > 0 \text{ then} \\
& \quad \text{mem}[sp] \leftarrow lr \\
& \quad sp \leftarrow sp - u_{16} \times Bpw \\
& \quad sr[\text{bit } di] \leftarrow \text{true}
\end{align*}
\]

Encoding:

\[
\begin{array}{c}
u6 \quad 0 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 0 \\
M \end{array}
\]

or prefixed for long immediates:

\[
\begin{array}{c}
lu6 \quad 1 \ 1 \ 1 \ 1 \ 0 \ | \ 1 \ 1 \ 1 \ 0 \\
M\&R \end{array}
\]

Conditions that raise an exception:

- **ET_LOAD_STORE** The indexed address is unaligned, or does not point to a valid memory address.
ECALLF

Throw exception if zero

This instruction checks whether the operand is 0 (false) and raises an exception if it is the case. It can be used to implement assertions, and to implement array bound checks together with the LSU instruction.

The instruction has one operand:

\[ op \quad c \quad \text{Operand register, one of } r0 \ldots r11 \]

Mnemonic and operands:

\[ \text{ECALLF} \quad c \]

Operation:

\[ \text{nop} \]

Encoding:

\[ \begin{array}{c|c|c|c|c|c|c|c|c|c|c|c|c|c|c} \hline 1r \quad & 0 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & \ldots \quad \text{M+R} \\
\hline \end{array} \]

Conditions that raise an exception:

- **ET_ECALL** \( c = 0 \).
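
The array bound check mentioned above can be sketched in Python. This is an illustrative model only: `lsu` assumes that LSU produces 1 when its first source is less than its second as unsigned words, and the names `ecallf`, `checked_load` and `ECallTrap` are hypothetical.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit words

class ECallTrap(Exception):
    """Stands in for the ET_ECALL exception."""

def lsu(x, y):
    """Assumed LSU semantics: 1 if x < y as unsigned words, else 0."""
    return 1 if (x & MASK) < (y & MASK) else 0

def ecallf(c):
    """Model of ECALLF: raise the exception if c is 0, otherwise do nothing."""
    if c == 0:
        raise ECallTrap("ET_ECALL")

def checked_load(array, index):
    """Read array[index] only after the bound check has passed."""
    ecallf(lsu(index, len(array)))
    return array[index]

print(checked_load([10, 20, 30], 1))
```

Because the comparison is unsigned, a negative index appears as a very large value and fails the same single check as an index that is too large.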
ECALLT  

Throw exception if non-zero

This instruction checks whether a condition is not 0, and raises an exception if it is the case. It can be used to implement assertions.

The instruction has one operand:

\[ op1 \quad c \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[ \text{ECALLT} \quad c \]

Operation:

\[ \text{nop} \]

Encoding:

\[ \begin{array}{cccccccc}
1r & 0 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & \ldots \\
& M & + & R
\end{array} \]

Conditions that raise an exception:

- **ET_ECALL** \( c \neq 0 \).
EDU

Unconditionally disable event

Clears the event enabled status of a resource, disabling events and interrupts from that resource.

The instruction has one operand:

\[ op1 \quad r \] Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[ \text{EDU} \quad r \]

Operation:

\[ enb_r \leftarrow 0 \]
\[ \text{thread}_r \leftarrow \text{tid} \]

Encoding:

\[ 1r \quad 0 \quad 0 \quad 0 \quad 0 \quad 1 \quad 1 \quad 1 \quad 1 \quad 0 \quad \ldots \quad R \]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not referring to a legal resource, or the resource is not in use.
EEF

Enable events conditionally

Sets or clears the enabled event status of a resource. If the condition is 0 (false), events and interrupts are enabled; if the condition is not 0, events and interrupts are disabled.

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad d \quad \text{Operand register, one of r0...r11} \\
\text{op2} & \quad r \quad \text{Operand register, one of r0...r11}
\end{align*}
\]

Mnemonic and operands:

\[
\text{EEF} \quad d, r
\]

Operation:

\[
\begin{align*}
\text{enb}_r & \quad \leftarrow \quad d = 0 \\
\text{thread}_r & \quad \leftarrow \quad \text{tid}
\end{align*}
\]

Encoding:

\[
\begin{array}{cccccccccc}
2r & 0 & 0 & 0 & 0 & 1 & 1 & 1 & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & R
\end{array}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \(r\) is not referring to a legal resource, or the resource is not in use.
EET

Enable events conditionally

Sets or clears the enabled event status of a resource. If the condition is 0 (false), events and interrupts are disabled; if the condition is not 0, events and interrupts are enabled.

The instruction has two operands:

\[ op1 \quad d \quad \text{Operand register, one of } r0...r11 \]
\[ op2 \quad r \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[
\text{EET} \quad d, r
\]

Operation:

\[
enb_r \quad \leftarrow \quad d \neq 0
\]
\[
\text{thread}_r \quad \leftarrow \quad \text{tid}
\]

Encoding:

\[
2r \quad 0 0 1 0 0 \ldots \ldots \ldots 1 \ldots \ldots R
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not referring to a legal resource, or the resource is not in use.
EEU

Unconditionally enable event

Sets the event enabled status of a resource, enabling events and interrupts from that resource.

The instruction has one operand:

\[ op1 \quad r \]  
Operand register, one of r0... r11

Mnemonic and operands:

\[ \text{EEU} \quad r \]

Operation:

\[ enb_r \leftarrow 1 \]
\[ thread_r \leftarrow tid \]

Encoding:

\[ 1r \quad 0 0 0 0 | 1 1 1 1 | 1 1 | \ldots \quad \mathbf{R} \]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not referring to a legal resource, or the resource is not in use.
ELATE

Throw exception if too late

This instruction checks whether the operand is in the past, and raises an exception if it is the case. It can be used to implement timing assertions.

The instruction has one operand:

\[ op \quad s \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[
\text{ELATE} \quad s
\]

Operation:

\[
\text{nop}
\]

Encoding:

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & \cdots & M+R
\end{bmatrix}
\]

Conditions that raise an exception:

- **ET_ECALL** \( s \) is in the past.
**ENDIN**

End a current input

Allows any remaining input bits to be read off a port, and produces an integer stating how much data is left. The produced integer is the number of bits of data remaining. This assumes that the port is buffering and shifting data.

The port-shift-count is set to the number of bits present, so an ENDIN instruction can be followed directly by an IN instruction without having to perform a SETPSC.

The instruction has two operands:

- \( op_1 \)  \( d \)  Operand register, one of \( r0...r11 \)
- \( op_2 \)  \( r \)  Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[
\text{ENDIN} \quad d, r
\]

Operation:

\[
d \leftarrow \text{buffercount}_r
\]

Encoding:

\[
2r \quad 1\ 0\ 0\ 1\ 0\ |\ \ldots\ |\ 1\ \ldots\ \ \ R
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**  Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**  \( r \) is not referring to a legal resource, or the resource is not in use.
- **ET_ILLEGAL_RESOURCE**  \( r \) is referring to a port which is not in BUFFERS mode.
- **ET_ILLEGAL_RESOURCE**  \( r \) is referring to a port which is not in INPUT mode.
ENTSP

Adjust stack and save link register

Stores the link register on the stack then adjusts the stack pointer creating enough space for the procedure call that has just been entered.

See RETSP for the operation that restores the link-register.

The instruction has one operand:

\[ op1 \ u_{16} \]

A 16-bit immediate in the range 0...65535. If \( u_{16} < 64 \), the instruction requires no prefix.

Mnemonic and operands:

\[
\text{ENTSP} \quad u_{16}
\]

Operation:

\[
\begin{align*}
& \text{if } u_{16} > 0 \text{ then} \\
& \quad \text{mem}[sp] \leftarrow lr \\
& \quad sp \leftarrow sp - u_{16} \times Bpw \\
& \quad sr[\text{bit } di] \leftarrow \text{false}
\end{align*}
\]

Encoding:

\[
\begin{array}{c|c}
\text{u6} & 01110|1101|\cdot\cdot\cdot \ \text{M} \\
\text{lu6} & 11110|\cdot\cdot\cdot|1101|\cdot\cdot\cdot \ \text{M\&R}
\end{array}
\]

Conditions that raise an exception:

**ET_LOAD_STORE** The indexed address is unaligned, or does not point to a valid memory address.
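
The effect on memory and on the stack pointer can be sketched in Python. This is a simplified model under stated assumptions: memory is a dictionary keyed by byte address, \(Bpw\) is taken to be 4, the update of the status register bit is omitted, and the name `entsp` is hypothetical.

```python
BYTES_PER_WORD = 4  # Bpw, assuming 32-bit words

def entsp(mem, sp, lr, u16):
    """Model of ENTSP: save lr at the current sp, then move sp down by
    u16 words. Nothing happens when u16 is 0. Returns the new sp."""
    if u16 > 0:
        mem[sp] = lr
        sp -= u16 * BYTES_PER_WORD
    return sp

memory = {}
new_sp = entsp(memory, 0x1000, 0x80040, 3)   # example addresses only
print(hex(new_sp), hex(memory[0x1000]))
```

The link register is written at the old stack pointer before the adjustment, so after the instruction it sits \(u_{16}\) words above the new stack pointer.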
**EQ**

**Equal**

Performs a test on whether two words are equal. If the two operands are equal, 1 is produced in the destination register, otherwise 0 is produced.

The instruction has three operands:

- $op1$  $c$  Operand register, one of r0... r11
- $op2$  $x$  Operand register, one of r0... r11
- $op3$  $y$  Operand register, one of r0... r11

Mnemonic and operands:

```
EQ     c,x,y
```

Operation:

\[
c \leftarrow \begin{cases}
x = y, & 1 \\
x \neq y, & 0
\end{cases}
\]

Encoding:

\[
3r \quad 0 0 1 1 \cdot \cdot \cdot \cdot \cdot \cdot \quad M+R
\]
EQI  

Equal immediate

Performs a test on whether two words are equal. If the two operands are equal, 1 is produced in the destination register, otherwise 0 is produced.

The instruction has three operands:

\[
\begin{align*}
op1 & \quad c & \text{Operand register, one of } r0...r11 \\
op2 & \quad x & \text{Operand register, one of } r0...r11 \\
op3 & \quad u_s & \text{An integer in the range } 0...11
\end{align*}
\]

Mnemonic and operands:

\[
\text{EQI} \quad c, x, u_s
\]

Operation:

\[
c \leftarrow \begin{cases} 
x = u_s, & 1 \\
x \neq u_s, & 0 
\end{cases}
\]

Encoding:

\[
2rus \quad 1010 \ldots \ldots \quad M+R
\]
EXTDP

Extend data

Extends the data area by moving the data pointer to a lower address.

The instruction has one operand:

\[ op1 \quad u_{16} \]

A 16-bit immediate in the range 0...65535. If \( u_{16} < 64 \), the instruction requires no prefix.

Mnemonic and operands:

\[
\text{EXTDP} \quad u_{16}
\]

Operation:

\[
dp \leftarrow dp - u_{16} \times Bpw
\]

Encoding:

<table>
<thead>
<tr>
<th>u6</th>
<th>0</th>
<th>1</th>
<th>1</th>
<th>1</th>
<th>0</th>
<th>0</th>
<th>1</th>
<th>1</th>
<th>0</th>
<th>. . . . .</th>
</tr>
</thead>
<tbody>
<tr>
<td>M+R</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

or prefixed for long immediates:

<table>
<thead>
<tr>
<th>lu6</th>
<th>1</th>
<th>1</th>
<th>1</th>
<th>1</th>
<th>0</th>
<th>. . . . . . . .</th>
</tr>
</thead>
<tbody>
<tr>
<td>M&amp;R</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>lu6</th>
<th>0</th>
<th>1</th>
<th>1</th>
<th>1</th>
<th>0</th>
<th>0</th>
<th>1</th>
<th>1</th>
<th>0</th>
<th>. . . . .</th>
</tr>
</thead>
<tbody>
<tr>
<td>M&amp;R</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
EXTSP

Extend stack

Extends the stack by moving the stack pointer to a lower address.

The instruction has one operand:

\[ op \quad u_{16} \quad \text{A 16-bit immediate in the range 0...65535. If } u_{16} < 64, \text{ the instruction requires no prefix} \]

Mnemonic and operands:

\[
\text{EXTSP} \quad u_{16}
\]

Operation:

\[
sp \leftarrow sp - u_{16} \times Bpw
\]

Encoding:

\[
\text{u6} \quad 0111011110 \cdots \cdots \cdots \quad \text{M+R}
\]

or prefixed for long immediates:

\[
\text{lu6} \quad 0111011110 \cdots \cdots \cdots \quad \text{M\&R}
\]
FREER  

Free a resource

Frees a resource so that it can be reused. Only resources that have been previously allocated with GETR can be freed; in particular, ports and clock-blocks cannot be freed since they are not allocated.

FREER pauses when freeing a channel end that has outstanding transmit data.

The instruction has one operand:

\[ \text{op} \quad r \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[ \text{FREER} \quad r \]

Operation:

\[ \text{inuse}_r \leftarrow 0 \]

Encoding:

\[ 1r \quad 0001011111110 \ldots \quad R \]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not referring to a legal resource
- **ET_ILLEGAL_RESOURCE** \( r \) is referring to a resource that cannot be freed
- **ET_ILLEGAL_RESOURCE** \( r \) is referring to a running thread
- **ET_ILLEGAL_RESOURCE** \( r \) is referring to a channel end on which no terminating CT_END token has been input and/or output, or which has data pending for input, or which has a thread waiting for input or output.
FREET  

Free unsynchronised thread

Stops the thread that executes this instruction, and frees it. This must not be used by synchronised threads, which should terminate by using a combination of an SSYNC on the slave and an MJOIN on the master.

The instruction has no operands.

Mnemonic and operands:

FREET

Operation:

\[ sr[inuse] \leftarrow 0 \]

Encoding:

\[
0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 0 \quad \text{R}
\]

REV 1.0
GETD

Get resource data

Gets the contents of the data/dest/divide register of a resource. This data register is set using SETD. How a resource uses its data register depends on the resource type and is described under SETD.

The instruction has two operands:

\[
\begin{align*}
op_1 & \quad d & \text{Operand register, one of } r0 \ldots r11 \\
op_2 & \quad r & \text{Operand register, one of } r0 \ldots r11 \\
\end{align*}
\]

Mnemonic and operands:

\[
\text{GETD } d, r
\]

Operation:

\[
d \leftarrow data_r
\]

Encoding:

\[
\text{l2r} \quad 0 \ 0 \ 0 \ 1 \quad 1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 0 \ 1 \ 1 \ 0 \quad \text{M\&R}
\]

Conditions that raise an exception:

- ET_RESOURCE_DEP Resource illegally shared between threads
- ET_ILLEGAL_RESOURCE \(r\) is not referring to a legal resource, or refers to a resource which doesn't have a DATA register.
GETED

Obtains the value of $ed$, exception data, into $r11$. In the case of an event, $ed$ is set to the environment vector stored in the resource by SETEV. The data that is stored in $ed$ in the case of an exception is given in Chapter 21.

The instruction has no operands.

Mnemonic and operands:

GETED

Operation:

$$r11 \leftarrow ed$$

Encoding:

| 0 0 0 0 1 | 1 1 1 1 | 1 | 1 | 1 0 | M+R |
GETET

Obtains the value of ET (exception type) into r11.

The instruction has no operands.

Mnemonic and operands:

GETET

Operation:

\[ r11 \leftarrow et \]

Encoding:

\[
\begin{array}{ccccccccccc}
0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & M+R
\end{array}
\]
GETID

Get the thread's ID

Get the thread ID of this thread into \( r11 \).

The instruction has no operands.

Mnemonic and operands:

\[
\text{GETID}
\]

Operation:

\[
r11 \leftarrow \text{tid}
\]

Encoding:

\[
0 \ 0 \ 0 \ 1 \ 0 \ 1 \ 1 \ 1 \ 1 \ 1 \ 0 \ 1 \ 1 \ 0 \quad \text{M+R}
\]
GETKEP

Get the kernel entry point of this thread into $r11$.

The instruction has no operands.

Mnemonic and operands:

\[
\text{GETKEP}
\]

Operation:

\[
r11 \leftarrow kep
\]

Encoding:

\[
0 \ 0 \ 0 \ 1 \ 0 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1 \quad \text{M+R}
\]
GETKSP

Gets the thread’s Kernel Stack Pointer \( ksp \) into \( r11 \). There is no instruction to set \( ksp \) directly since it is normally not moved. SETSP followed by KRESTSP will set both \( sp \) and \( ksp \). By saving \( sp \) beforehand, \( ksp \) can be set to the value found in \( r0 \) by using the following code sequence:

\[
\begin{align*}
\text{LDAWSP} & \quad r1, \ sp[0] \quad // \text{Save SP into R1} \\
\text{SETSP} & \quad r0 \quad // \text{Set SP, and place old SP...} \\
\text{STW} & \quad r1, \ sp[0] \quad // \quad \ldots \text{where KRESTSP expects it} \\
\text{KRESTSP} & \quad 0 \quad // \text{Set KSP, restore SP}
\end{align*}
\]

The kernel stack pointer is initialised by the boot-ROM to point to a safe location near the last location of RAM - the last few locations are used by the JTAG debugging interface. If debugging is not required, then the KSP can safely be moved to the top of RAM.
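
The effect of the four-instruction sequence above can be traced with a small model. The Python sketch below is illustrative only and is not part of the architecture definition: it assumes a 4-byte word, models memory as a dictionary keyed by address, assumes that SETSP simply copies its operand into \( sp \), and applies the KRESTSP register transfers given on the KRESTSP page. The function name `set_ksp` and the example addresses are hypothetical.

```python
BPW = 4  # assumed number of bytes per word


def set_ksp(state, mem, new_ksp):
    """Trace LDAWSP r1, sp[0]; SETSP r0; STW r1, sp[0]; KRESTSP 0."""
    r0 = new_ksp
    r1 = state["sp"] + 0 * BPW               # LDAWSP r1, sp[0]: save SP into r1
    state["sp"] = r0                         # SETSP r0 (assumed: sp <- r0)
    mem[state["sp"] + 0 * BPW] = r1          # STW r1, sp[0]: old SP where KRESTSP reads it
    state["ksp"] = state["sp"] + 0 * BPW     # KRESTSP 0: ksp <- sp + 0 * Bpw
    state["sp"] = mem[state["ksp"]]          #            sp  <- mem[ksp]
    return state


state = {"sp": 0x7F000, "ksp": 0x7FF00}      # hypothetical starting values
mem = {}
set_ksp(state, mem, 0x7FE00)
```

After the sequence, \( ksp \) holds the value that was in \( r0 \) and \( sp \) is back at its original value, which is the behaviour the text describes.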

The instruction has no operands.

Mnemonic and operands:

GETKSP

Operation:

\[
r11 \leftarrow ksp
\]

Encoding:

\[
0 0 0 1 0 1 1 1 1 1 1 1 1 0 0 \quad \text{M+R}
\]
GETN

Gets the network identifier that this channel-end belongs to.

The network identifier is set using SETN.

The instruction has two operands:

\[ op_1 \quad d \quad \text{Operand register, one of } r0...r11 \]
\[ op_2 \quad r \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[
\text{GETN} \quad d, r
\]

Operation:

\[
d \leftarrow \text{net}_r
\]

Encoding:

\[
\begin{array}{cccccccccc}
1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 &
\end{array}
\]

M&R

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not referring to a legal channel end, or the channel end is not in use.
GETPS

Obtains internal processor state; used for low level debugging. The operand is a processor state resource; the register to be read is encoded in bits 15...8, and bits 7...0 should contain the resource type associated with processor state.

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad d \quad \text{Operand register, one of } r0... r11 \\
\text{op2} & \quad r \quad \text{Operand register, one of } r0... r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{GETPS} \quad d, r
\]

Operation:

\[
d \leftarrow PS[r]
\]

Encoding:

\[
\begin{array}{cccccccc}
1 & 1 & 1 & 1 & \cdots & \cdots & 1 & \cdot \\
0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0 \\
\end{array}
\]

M&R

Conditions that raise an exception:

- **ET_ILLEGAL_PS** \( r \) is not referring to a legal processor state register
**GETR**

Gets a resource of a specific type. This instruction dynamically allocates a resource from the pools of available resources. Not all resources are dynamically allocated; resources that refer to physical objects (IO pins, clock blocks) are used without allocating. The resource types are:

| Resource type | Description | Value | Notes |
|---|---|---|---|
| **RES_TYPE_PORT** | Ports | 0 | cannot be allocated |
| **RES_TYPE_TIMER** | Timers | 1 | |
| **RES_TYPE_CHANEND** | Channel ends | 2 | |
| **RES_TYPE_SYNC** | Synchronisers | 3 | |
| **RES_TYPE_THREAD** | Threads | 4 | |
| **RES_TYPE_LOCK** | Lock | 5 | |
| **RES_TYPE_CLKBLK** | Clock source | 6 | cannot be allocated |
| **RES_TYPE_PS** | Processor state | 11 | cannot be allocated |
| **RES_TYPE_CONFIG** | Configuration messages | 12 | cannot be allocated |

The returned identifier is a 32-bit word: the most significant 16 bits hold resource-specific data, followed by an 8-bit resource counter and an 8-bit resource type. The resource-specific 16 bits have the following meaning:

- **Port**
  The width of the port.

- **Timer**
  Reserved, returned as 0.

- **Channel end**
  The node id (8-bits) and the core id (8-bits).

- **Synchroniser**
  Reserved, returned as 0.

- **Thread**
  Reserved, returned as 0.

- **Lock**
  Reserved, returned as 0.

- **Clock source**
  Reserved, should be set to 0.

- **Processor state**
  Reserved, should be set to 0.

- **Configuration**
  Reserved, should be set to 0.

If no resource of the requested type is available, then the destination operand is set to zero; otherwise the destination operand is set to a valid resource id.

If a channel end is allocated, a local channel end is returned. In order to connect to a remote channel end, a program normally receives a channel-end over an already connected channel, which is stored using SETD. To connect the first remote channel, a channel-end identifier can be constructed (by concatenating a node id, core id, channel-end and the value '2').
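
The concatenation described above can be sketched as follows. This is a minimal illustration that assumes the field layout stated on this page (8-bit node id and 8-bit core id in the upper half-word, then an 8-bit resource counter, then the 8-bit resource type, which is 2 for a channel end). The helper names `make_chanend_id` and `split_resource_id` are hypothetical and are not part of any XMOS tool or library.

```python
RES_TYPE_CHANEND = 2  # resource type value for channel ends (from the table above)


def make_chanend_id(node_id, core_id, chanend):
    """Concatenate node id, core id, channel-end number and the value 2."""
    for name, value in (("node_id", node_id), ("core_id", core_id), ("chanend", chanend)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    return (node_id << 24) | (core_id << 16) | (chanend << 8) | RES_TYPE_CHANEND


def split_resource_id(res_id):
    """Split a 32-bit resource identifier into its three documented fields."""
    return {
        "specific": (res_id >> 16) & 0xFFFF,  # resource-specific data
        "counter": (res_id >> 8) & 0xFF,      # resource counter
        "type": res_id & 0xFF,                # resource type
    }


remote = make_chanend_id(node_id=1, core_id=0, chanend=3)
```

The resulting word is what a program would pass to SETD on its local channel end to name the remote destination.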

Once allocated, a resource is freed using FREER so that it becomes available for reallocation.

The instruction has two operands:

\[
\begin{align*}
op1 & \quad d & \text{Operand register, one of } r0...r11 \\
op2 & \quad u_s & \text{An integer in the range } 0...11
\end{align*}
\]

Mnemonic and operands:

\[
\text{GETR} \quad d, u_s
\]

Operation:

\[
d \leftarrow \text{first } res \in setof(u_s) : \neg inuse_{res}
\]

\[
inuse_d \leftarrow 1
\]
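
The allocation rule can be modelled in a few lines. The sketch below is a simplified illustration, not the hardware algorithm: it assumes a single pool per resource type, builds the identifier from the resource counter and resource type only (the resource-specific upper 16 bits are left at 0, as documented for timers, synchronisers, threads and locks), and returns 0 when the pool is exhausted. The class name `ResourcePool` and the pool size are hypothetical.

```python
class ResourcePool:
    """Toy model of one pool of dynamically allocated resources."""

    def __init__(self, res_type, count):
        self.res_type = res_type
        self.inuse = [False] * count

    def getr(self):
        """Return the id of the first free resource and mark it in use, or 0."""
        for counter, used in enumerate(self.inuse):
            if not used:
                self.inuse[counter] = True
                return (counter << 8) | self.res_type
        return 0  # no resource of this type is available

    def freer(self, res_id):
        """Clear the in-use flag so the resource can be allocated again."""
        self.inuse[(res_id >> 8) & 0xFF] = False


timers = ResourcePool(res_type=1, count=2)   # RES_TYPE_TIMER, hypothetical pool of two
first = timers.getr()
second = timers.getr()
third = timers.getr()                        # pool exhausted
timers.freer(first)
again = timers.getr()                        # the freed timer is handed out again
```

Because a result of zero signals failure, a caller is expected to test the destination register before using the identifier.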

Encoding:

\[
\text{rus} \quad 1 \; 0 \; 0 \; 0 \; 0 \; \ldots \; \ldots \; 0 \; \ldots \; R
\]
GETSR

Get bits from the thread’s Status Register. The mask supplied specifies which bits should be extracted.

SETSR is used to set bits in the status register. The values of these bits are documented on the SETSR page.

The instruction has one operand:

\[ op1 \quad u_{16} \quad \text{A 16-bit immediate in the range 0...65535. If } u_{16} < 64 \text{, the instruction requires no prefix} \]

Mnemonic and operands:

\[
\text{GETSR} \quad u_{16}
\]

Operation:

\[
r11 \leftarrow sr \land bit \, u_{16}
\]

Encoding:

\[
\begin{array}{cccccc}
0 & 1 & 1 & 1 & 1 & \cdot \cdot \cdot \cdot \cdot \cdot \\
M+R
\end{array}
\]

or prefixed for long immediates:

\[
\begin{array}{cccccc}
1 & 1 & 1 & 1 & 0 & \cdot \cdot \cdot \cdot \cdot \cdot \\
M&R
\end{array}
\]
**GETST**

Get a synchronised thread

Gets a new thread and binds it to a synchroniser. The synchroniser ID is passed as an operand to this instruction, and the destination register is set to the resulting thread ID. If no threads are available then the destination register is set to 0.

The thread is started on execution of **MSYNC** by the master thread.

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad d & \text{Operand register, one of } \{r0 \ldots r11\} \\
\text{op2} & \quad r & \text{Operand register, one of } \{r0 \ldots r11\}
\end{align*}
\]

Mnemonic and operands:

\[
\text{GETST} \quad d, r
\]

Operation:

\[
\begin{align*}
d & \leftarrow \text{first thread} \in \text{threads} : \neg \text{inuse}_{\text{thread}} \\
\text{inuse}_d & \leftarrow 1 \\
\text{spausd} & \leftarrow \text{spausd} \cup \{d\} \\
\text{slaves}_r & \leftarrow \text{slaves}_r \cup \{d\} \\
\text{mstr}_r & \leftarrow \text{tid}
\end{align*}
\]

Encoding:

\[
\begin{array}{cccccccc}
2r & 0 & 0 & 0 & 0 & 0 & \ldots & \ldots & \text{R}
\end{array}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not referring to a synchroniser that is in use
GETTIME

Get the reference time

Gets the current value of the reference time and loads it into the specified register.

The instruction has one operand:

\[ op \quad d \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[ \text{GETTIME} \quad d \]

Operation:

\[ d \leftarrow \text{reference-time} \]

Encoding:

\[ 1r \quad \begin{array}{cccccccccccc} 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \end{array} \ldots \quad \text{M+R} \]
GETTS

Get the time stamp

Gets the time stamp of a port. This is the value of the port timer at which the previous transfer between the Shift and Transfer registers for input or output occurred. The port timer counts ticks of the clock associated with this port, and returns a 16-bit value. In the case of a conditional input, this instruction should be executed between a WAITEU and its associated IN instruction; the value returned by GETTS will be the timestamp of the data that will be input using the IN instruction.

The instruction has two operands:

\begin{equation}
\begin{align*}
op_1 & \quad d \quad \text{Operand register, one of } r0...r11 \\
op_2 & \quad r \quad \text{Operand register, one of } r0...r11
\end{align*}
\end{equation}

Mnemonic and operands:

\[ \text{GETTS} \quad d, r \]

Operation:

\begin{equation}
d \leftarrow \text{timestamp}_r
\end{equation}

Encoding:

\begin{equation}
2r \quad \begin{array}{c} 
0 \ 0 \ 1 \ 1 \ 1 \quad \cdots \cdots \quad 0 \quad \cdots \cdots
\end{array} \quad R
\end{equation}

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not referring to a port, or the port is not in use.
IN

Input data

Inputs data from a resource \((r)\) into a destination register \((d)\). The precise effect depends on the resource type:

**Port**
Reads data from the port. If the port is buffered, a whole word of data is returned. If the port is unbuffered, the most significant bits of the data will be set to 0. The thread pauses if the data is not available.

**Timer**
Reads the current time from the timer, or pauses until after a specific time and then returns that time.

**Channel end**
Reads \(Bpw\) data tokens from the channel and concatenates them into a single word of data. The bytes are assumed to be transmitted most significant byte first. The thread pauses if there are not enough data tokens available.

**Lock**
Locks the resource. The instruction pauses if the lock has been taken by another thread, and resumes when the lock is released by an OUT.

This instruction may pause.
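
For the channel-end case, the way data tokens are combined into a word can be illustrated as follows. This is a sketch under the assumption of a 4-byte word (so \(Bpw = 4\)) and 8-bit data tokens; the function name `word_from_tokens` is hypothetical.

```python
BPW = 4  # assumed number of bytes (data tokens) per word


def word_from_tokens(tokens):
    """Concatenate BPW data tokens into one word, most significant byte first."""
    if len(tokens) != BPW:
        raise ValueError("IN on a channel end consumes exactly BPW data tokens")
    word = 0
    for token in tokens:
        if not 0 <= token <= 0xFF:
            raise ValueError("a data token is 8 bits wide")
        word = (word << 8) | token   # earlier tokens end up in higher bytes
    return word


value = word_from_tokens([0x12, 0x34, 0x56, 0x78])
```

The first token received becomes the most significant byte of the destination register.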

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad d & \text{Operand register, one of } r0 \ldots r11 \\
\text{op2} & \quad r & \text{Operand register, one of } r0 \ldots r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{IN} \quad d, r
\]

Operation:

\[
r \rightarrow d
\]

Encoding:

\[
2r \quad 1010\ldots0\ldots0\ldots0 \quad \text{R}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \(r\) is not a valid resource, not in use, or it does not support \text{IN}.
- **ET_ILLEGAL_RESOURCE** \(r\) is a channel end which contains a Control Token in the first \(Bpw\) tokens in its input buffer.
INCT

Input control tokens

If the next token on a channel is a control token, then this token is input to the destination register. If not, the instruction raises an exception.

This instruction pauses if the channel does not have a token of data available to input.

This instruction can be used together with OUTCT in order to implement robust protocols on channels.

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad d & \text{Operand register, one of r0... r11} \\
\text{op2} & \quad r & \text{Operand register, one of r0... r11}
\end{align*}
\]

Mnemonic and operands:

INCT \quad d, r

Operation:

\[
\text{if hasctoken}(r) \text{ then} \\
\quad r \triangleright d \\
\text{else} \\
\quad \text{raiseexception}
\]
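
The check performed by INCT can be modelled as below. This is an illustrative sketch only: the input buffer is represented as a queue of `(is_control, value)` pairs, a paused thread is modelled by returning `None`, and the exception class name `IllegalResource` is a hypothetical stand-in for ET_ILLEGAL_RESOURCE.

```python
from collections import deque


class IllegalResource(Exception):
    """Stand-in for the ET_ILLEGAL_RESOURCE exception."""


def inct(buffer):
    """Input one control token from the head of a channel-end input buffer."""
    if not buffer:
        return None                      # no token available: the thread would pause
    is_control, value = buffer[0]
    if not is_control:
        raise IllegalResource("data token at head of input buffer")
    buffer.popleft()                     # the control token is consumed
    return value


buf = deque([(True, 1), (False, 0x55)])  # one control token followed by one data token
token = inct(buf)
```

A second call on the same buffer would raise, because the next token is a data token; this is what lets a receiver detect a protocol violation.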

Encoding:

\[
2r \quad 1\ 0\ 0\ 0\ 0\ \ldots\ \ldots\ \ldots\ 1\ \ldots\ \ldots\ \ R
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not pointing to a channel resource, or the resource is not in use.
- **ET_ILLEGAL_RESOURCE** \( r \) is a channel end which contains a data token in the first entry in its input buffer.
INPW

Input a part word

Inputs an incomplete word that is stored in the input buffer of a port. Used in conjunction with ENDIN. ENDIN is used to determine how many bits are left on the port, and this number is passed to INPW in order to read those remaining bits.

The instruction has three operands:

- \( op_1 \ d \): Operand register, one of \( r_0 \ldots r_{11} \)
- \( op_2 \ r \): Operand register, one of \( r_0 \ldots r_{11} \)
- \( op_3 \ bitp \): A bit position; one of \( bpw, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32 \)

Mnemonic and operands:

\[ \text{INPW} \quad d, r, bitp \]

Operation:

\[ \text{shiftcount}_r \leftarrow bitp \]
\[ r \uparrow d \]

Encoding:

<table>
<thead>
<tr>
<th>( \text{l2rus} )</th>
<th>1 1 1 1 1</th>
<th>\ldots</th>
<th>\ldots</th>
<th>\ldots</th>
</tr>
</thead>
<tbody>
<tr>
<td>( \text{M&amp;R} )</td>
<td>1 1 1 1 1</td>
<td>1 1 1 1 1</td>
<td>0 1 1 1 0</td>
<td></td>
</tr>
</tbody>
</table>

Conditions that raise an exception:

- ET_RESOURCE_DEP: Resource illegally shared between threads
- ET_ILLEGAL_RESOURCE: \( r \) is not pointing to a port resource, or the resource is not in use, or \( bitp \) is an unsupported width, or the port is not in BUFFERS mode.
INSHR

Input and shift right

Inputs a value from a port, and shifts the data read into the most significant bits of the destination register. The bottom port-width bits of the destination register are lost.

The instruction has two operands:

\[
\begin{align*}
\text{op}_1 & \quad d \quad \text{Operand register, one of } r_0 \ldots r_{11} \\
\text{op}_2 & \quad r \quad \text{Operand register, one of } r_0 \ldots r_{11}
\end{align*}
\]

Mnemonic and operands:

\[
\text{INSHR} \quad d, r
\]

Operation:

\[
\begin{align*}
\quad & \quad r \xrightarrow{\text{INSHR}} x \\
\quad & \quad d \leftarrow x : d[bpw - 1 \ldots portwidth_r]
\end{align*}
\]
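
The shift-and-merge step can be checked with a short model. The sketch assumes a 32-bit word and takes the value read from the port as an argument; the function name `inshr` and the example values are illustrative.

```python
BPW_BITS = 32  # assumed number of bits per word


def inshr(d, x, portwidth):
    """Return the new value of d after shifting port data x in at the top."""
    word_mask = (1 << BPW_BITS) - 1
    x &= (1 << portwidth) - 1                    # only portwidth bits come from the port
    kept = (d & word_mask) >> portwidth          # bottom portwidth bits of d are lost
    return ((x << (BPW_BITS - portwidth)) | kept) & word_mask


merged = inshr(0x12345678, 0xAB, 8)

# Four successive 8-bit inputs assemble a word, first input in the lowest byte.
d = 0
for byte in (0x78, 0x56, 0x34, 0x12):
    d = inshr(d, byte, 8)
```

Repeating the instruction in this way is how a narrow port can be deserialised into a full word.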

Encoding:

\[
2r \quad \begin{array}{c}
0110 \ldots 1 \ldots \ldots \ldots 1 \ldots \\
\end{array}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not pointing to a port resource, or the resource is not in use.
INT

Input a token of data

If the next token on a channel is a data token, then this token is input into the destination register. If not, the instruction raises an exception.

This instruction pauses if the channel does not have a token of data available to input.

The instruction has two operands:

\[ \text{op1} \; d \quad \text{Operand register, one of } r0...r11 \]
\[ \text{op2} \; r \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[ \text{INT} \quad d, r \]

Operation:

\[
\text{if hasctoken}(r) \text{ then} \\
\quad \text{raiseexception} \\
\text{else} \\
\quad r \triangleright d
\]

Encoding:

\[ 2r \quad \begin{array}{c} 1 \; 0 \; 0 \; 0 \; 1 \end{array} \quad R \]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( r \) is not pointing to a channel resource, or the resource is not in use.
- **ET_ILLEGAL_RESOURCE**: \( r \) contains a control token in the first entry in its input buffer.
KCALL

Performs a kernel call. The program counter, status register and exception data are stored in save-registers \( spc, ssr, \) and \( sed \) and the program continues at the kernel entry point. Similar to exceptions, the program counter that is saved on KCALL is the program counter of this instruction - hence a kernel call handler using KRET has to adjust \( spc \) prior to returning.

The instruction has one operand:

\[
op\_1 \quad s \quad \text{Operand register, one of r0... r11}
\]

Mnemonic and operands:

\[
\text{KCALL} \quad s
\]

Operation:

\[
\begin{align*}
spc & \leftarrow pc \\
ssr & \leftarrow sr \\
et & \leftarrow ET\_KCALL \\
sed & \leftarrow ed \\
ed & \leftarrow s \\
pc & \leftarrow kep + 64 \\
sr[ink] & \leftarrow 1 \\
sr[ieble] & \leftarrow 0 \\
sr[eeble] & \leftarrow 0
\end{align*}
\]
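
The state changes listed above can be written out as a small model. This is a sketch for illustration: thread state is a dictionary, the status register is a dictionary of named bits rather than a bit field, the exception type is kept as the symbolic name `"ET_KCALL"` because its numeric value is not given on this page, and the save register for exception data is called `sed` as in the description. All starting values are hypothetical.

```python
def kcall(state, s):
    """Apply the KCALL register transfers to a thread-state dictionary."""
    state["spc"] = state["pc"]            # saved pc is the pc of the KCALL itself
    state["ssr"] = dict(state["sr"])      # snapshot of the status register
    state["et"] = "ET_KCALL"
    state["sed"] = state["ed"]
    state["ed"] = s                       # operand becomes the exception data
    state["pc"] = state["kep"] + 64       # continue at kernel entry point + 64
    state["sr"]["ink"] = 1                # now in kernel mode
    state["sr"]["ieble"] = 0              # interrupts disabled
    state["sr"]["eeble"] = 0              # events disabled
    return state


thread = {
    "pc": 0x1000, "kep": 0x8000, "ed": 0, "et": None,
    "sr": {"ink": 0, "ieble": 1, "eeble": 1},
}
kcall(thread, s=42)
```

Because `spc` holds the address of the KCALL instruction, a handler that returns with KRET without first advancing `spc` would execute the kernel call again.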

Encoding:

\[
1r \quad 0\ 1\ 0\ 0\ 0|1\ 1\ 1\ 1|1\ 0|\ \cdots\ \ M
\]

Conditions that raise an exception:

\[
\text{ET\_KCALL} \quad \text{Kernel call.}
\]
**KCALLI**

Performs a kernel call. The program counter, status register and exception data are stored in save-registers \textit{spc}, \textit{ssr}, and \textit{sed} and the program continues at the kernel entry point. Similar to exceptions, the program counter that is saved on KCALL is the program counter of this instruction - hence any kernel call handler using KRET has to adjust \textit{spc} prior to returning.

The instruction has one operand:

\[ op1 \quad u_{16} \quad \text{A 16-bit immediate in the range 0...65535. If } u_{16} < 64 \text{, the instruction requires no prefix.} \]

**Mnemonic and operands:**

\[
\text{KCALLI} \quad u_{16}
\]

**Operation:**

\[
\begin{align*}
spc & \leftarrow pc \\
ssr & \leftarrow sr \\
et & \leftarrow ET\_KCALL \\
sed & \leftarrow ed \\
ed & \leftarrow u_{16} \\
pc & \leftarrow kep + 64 \\
sr[\text{ink}] & \leftarrow 1 \\
sr[\text{ieble}] & \leftarrow 0 \\
sr[\text{eeble}] & \leftarrow 0
\end{align*}
\]

**Encoding:**

| \u6 | 0 1 1 1 0 | 0 | 1 1 1 1 | \ldots | M |

or prefixed for long immediates:

\[
\begin{align*}
\text{lu6} & \\
0 1 1 1 0 0 & | \ldots | \ldots | \ldots | M & \& R
\end{align*}
\]

**Conditions that raise an exception:**

- **ET_KCALL** Kernel call.
**KENTSP**

Switch to kernel stack

Saves the stack pointer on the kernel stack, then sets the stack pointer to the kernel stack.

*KRESTSP* is used to restore the original stack pointer from the kernel stack.

The instruction has one operand:

\[ op1 \quad u_{16} \quad \text{A 16-bit immediate in the range 0...65535. If } u_{16} < 64 \text{, the instruction requires no prefix.} \]

Mnemonic and operands:

\[
\begin{align*}
\text{KENTSP} & \quad u_{16} \\
\end{align*}
\]

Operation:

\[
\begin{align*}
\text{mem}[ksp] & \leftarrow sp \\
sp & \leftarrow ksp - u_{16} \times Bpw
\end{align*}
\]

Encoding:

\[
\begin{align*}
u6 & \quad 0 1 1 1 1 1 0 1 1 1 0 \cdots \cdots \cdots \quad M \\
\end{align*}
\]

or prefixed for long immediates:

\[
\begin{align*}
u_{6} & \quad 1 1 1 1 0 0 \cdots \cdots \cdots \cdots \\
\end{align*}
\]

\[
\begin{align*}
l_{6} & \quad 0 1 1 1 1 1 0 1 1 1 0 \cdots \cdots \cdots \quad M\&R \\
\end{align*}
\]

Conditions that raise an exception:

*ET_LOAD_STORE* Register \( ksp \) points to an unaligned address, or does not point to a valid memory location.
KRESTSP

Restore stack pointer from kernel stack

Restores the stack pointer from the address saved on entry to the kernel by KENTSP. This instruction is also used to initialise the kernel-stack-pointer.

KENTSP is used to save the stack pointer on entry to the kernel.

The instruction has one operand:

\[ op1 \quad u_{16} \quad A \text{ 16-bit mask} \]

Mnemonic and operands:

\[
\text{KRESTSP} \quad u_{16}
\]

Operation:

\[
ksp \leftarrow sp + u_{16} \times Bpw \\
sp \leftarrow \text{mem}[ksp]
\]
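
KENTSP and KRESTSP are designed to be used as a pair with the same immediate operand. The sketch below models both operations to show that the pair restores the original stack pointer and leaves the kernel stack pointer unchanged. It assumes a 4-byte word and a dictionary as memory; the function names, the parameter name `n_words` (the immediate operand) and the addresses are hypothetical.

```python
BPW = 4  # assumed number of bytes per word


def kentsp(state, mem, n_words):
    """mem[ksp] <- sp; sp <- ksp - n_words * Bpw."""
    mem[state["ksp"]] = state["sp"]
    state["sp"] = state["ksp"] - n_words * BPW


def krestsp(state, mem, n_words):
    """ksp <- sp + n_words * Bpw; sp <- mem[ksp]."""
    state["ksp"] = state["sp"] + n_words * BPW
    state["sp"] = mem[state["ksp"]]


state = {"sp": 0x1000, "ksp": 0x2000}
mem = {}
kentsp(state, mem, 4)      # enter the kernel, reserving 4 words on the kernel stack
sp_in_kernel = state["sp"]
krestsp(state, mem, 4)     # leave the kernel with the matching operand
```

Using a different operand on exit than on entry would move `ksp`, which is why the same instruction also serves to initialise the kernel stack pointer.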

Encoding:

\[
\begin{array}{cccccccccccc}
1 & 1 & 1 & 1 & 0 & 0 & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\
0 & 1 & 1 & 1 & 0 & 0 & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\
\end{array}
\]

Conditions that raise an exception:

ET_LOAD_STORE The indexed address points to an unaligned address, or the indexed address does not point to a valid memory location.
KRET

Kernel Return

Returns from the kernel after an interrupt, kernel call, or exception.

The instruction has no operands.

Mnemonic and operands:

```
KRET
```

Operation:

```
pc  ←  spc
sr  ←  ssr
ed  ←  sed
```

Encoding:

```
1 1 1 1 | 1 1 1 1 | 1 0 | 1 1 0 0
```

Conditions that raise an exception:

- **ET_ILLEGAL_PC** The register `spc` was not 16-bit aligned or did not point to a valid memory location.
**LADD**

**Long unsigned add with carry**

Adds two unsigned integers and a carry, and produces both the unsigned result and the possible carry. The instruction has five operands: two registers that contain the numbers to be added (\(x\) and \(y\)); a third source operand (\(v\)) whose least significant bit holds the carry in; a destination register which is used to store the carry out (\(e\)); and a destination register for the sum (\(d\)).

The instruction has five operands:

- \(op1\) \(d\) Operand register, one of \(r0\) ... \(r11\)
- \(op4\) \(e\) Operand register, one of \(r0\) ... \(r11\)
- \(op2\) \(x\) Operand register, one of \(r0\) ... \(r11\)
- \(op3\) \(y\) Operand register, one of \(r0\) ... \(r11\)
- \(op5\) \(v\) Operand register, one of \(r0\) ... \(r11\)

Mnemonic and operands:

```
LADD   d, e, x, y, v
```

Operation:

```
d ← r[bpw - 1...0]
e ← r[bpw]
where r ← x + y + v[0]
```
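
A model of the operation, together with the usual way of chaining it for multi-word arithmetic, is sketched below. The sketch assumes a 32-bit word; the function names `ladd` and `add64` are illustrative and not taken from any XMOS library.

```python
BPW_BITS = 32                      # assumed number of bits per word
MASK = (1 << BPW_BITS) - 1


def ladd(x, y, v):
    """Return (d, e): the sum word and the carry out of x + y + v[0]."""
    r = (x & MASK) + (y & MASK) + (v & 1)   # only bit 0 of v is used as carry in
    return r & MASK, (r >> BPW_BITS) & 1


def add64(a, b):
    """Add two 64-bit values using two chained LADD steps (modulo 2**64)."""
    lo, carry = ladd(a & MASK, b & MASK, 0)
    hi, _ = ladd((a >> 32) & MASK, (b >> 32) & MASK, carry)
    return (hi << 32) | lo


d, e = ladd(0xFFFFFFFF, 1, 0)
wide = add64(0x00000001FFFFFFFF, 1)
```

The carry produced by the low-word addition is fed directly into the high-word addition through the fifth operand.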

Encoding:

```plaintext
| 1111 | . . . . . . . |
| 0000 | . . . . . . . |
```

M&R
LD8U

Load unsigned 8 bits

Loads an unsigned 8-bit value from memory. The address is computed using a base address \((b)\) and index \((i)\).

The instruction has three operands:

\[
\begin{align*}
\text{op1} & \quad d \quad \text{Operand register, one of } r0...r11 \\
\text{op2} & \quad b \quad \text{Operand register, one of } r0...r11 \\
\text{op3} & \quad i \quad \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{LD8U} \quad d, b, i
\]

Operation:

\[
d \gets 0 : ... : 0 : \text{word}[\text{bnum} + 7...\text{bnum}]
\]
where \(ea \gets b + i\)

\[
\begin{align*}
\text{bytenum} & \gets ea \mod Bpw \\
\text{bnum} & \gets 8 \times \text{bytenum} \\
\text{word} & \gets \text{mem}[ea - \text{bytenum}]
\end{align*}
\]
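
The byte selection can be followed step by step in a short model. The sketch assumes a 4-byte word and represents memory as a dictionary from word-aligned addresses to 32-bit words; the function name `ld8u` and the memory contents are illustrative.

```python
BPW = 4  # assumed number of bytes per word


def ld8u(mem, b, i):
    """Load the unsigned byte at address b + i from word-organised memory."""
    ea = b + i
    bytenum = ea % BPW             # position of the byte within its word
    bnum = 8 * bytenum             # bit offset of that byte
    word = mem[ea - bytenum]       # fetch the containing word
    return (word >> bnum) & 0xFF   # upper bits of the result are zero


mem = {0x100: 0x44332211}          # hypothetical memory contents
first = ld8u(mem, 0x100, 0)
last = ld8u(mem, 0x100, 3)
```

Byte 0 of a word is its least significant byte in this model, which follows directly from `bnum = 8 × bytenum`.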

Encoding:

\[
\begin{array}{ccccccccc}
\text{3r} & 1 & 0 & 0 & 0 & 1 & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot
\end{array}
\]

Conditions that raise an exception:

**ET_LOAD_STORE** The indexed address does not point to a valid memory location.
**LD16S**

Load signed 16 bits

Loads a signed 16-bit integer from memory extending the sign into the whole word. The address is computed using a base address \( b \) and index \( i \). The base address should be word-aligned.

The instruction has three operands:

\[
\begin{align*}
op_1 & \quad d \quad \text{Operand register, one of } r0...r11 \\
op_2 & \quad b \quad \text{Operand register, one of } r0...r11 \\
op_3 & \quad i \quad \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{LD16S} \quad d, b, i
\]

Operation:

\[
d \leftarrow \text{word}[bnum + 15] : \ldots : \text{word}[bnum + 15] : \text{word}[bnum + 15...bnum]
\]

where \( ea \leftarrow b + i \times 2 \)

\[
\begin{align*}
bytenum & \leftarrow ea \mod Bpw \\
bnum & \leftarrow 16 \times (bytenum \div 2) \\
word & \leftarrow \text{mem}[ea - bytenum]
\end{align*}
\]
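
The half-word selection and sign extension can be sketched as follows. The model assumes a 4-byte, 32-bit word and a dictionary from word-aligned addresses to words, and raises `ValueError` as a hypothetical stand-in for ET_LOAD_STORE; the function name `ld16s` is illustrative.

```python
BPW = 4  # assumed number of bytes per word


def ld16s(mem, b, i):
    """Load the signed 16-bit value at b + 2*i, sign-extended to 32 bits."""
    ea = b + i * 2
    if ea % 2 != 0:
        raise ValueError("unaligned 16-bit load")   # stand-in for ET_LOAD_STORE
    bytenum = ea % BPW
    bnum = 16 * (bytenum // 2)         # bit offset of the selected half-word
    word = mem[ea - bytenum]
    half = (word >> bnum) & 0xFFFF
    if half & 0x8000:                  # bit 15 set: replicate it into bits 31..16
        return half | 0xFFFF0000
    return half


mem = {0x100: 0x80017FFF}              # hypothetical memory contents
low = ld16s(mem, 0x100, 0)
high = ld16s(mem, 0x100, 1)
```

The result is returned as an unsigned 32-bit pattern, so a negative half-word appears with its upper sixteen bits set.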

Encoding:

\[
3r \quad \begin{array}{cccccccc}
0 & 0 & 0 & 0 & 0 & \ldots & \ldots & \ldots & \ldots & \ldots & M
\end{array}
\]

Conditions that raise an exception:

**ET_LOAD_STORE** \( b \) is not 16-bit aligned (unaligned load), or does not point to a valid memory location.
LDA16B

Load effective address for a 16-bit value based on a base-address \( b \) and an index \( i \).

The instruction has three operands:

- \( op1 \ d \): Operand register, one of \( r0...r11 \)
- \( op2 \ b \): Operand register, one of \( r0...r11 \)
- \( op3 \ i \): Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[
\text{LDA16B} \quad d, b, i
\]

Operation:

\[
d \leftarrow b - i \times 2
\]

Encoding:

\[
\begin{array}{c}
|1|1|1|1|\ldots|\ldots|\ldots|\ldots|\ldots|\ldots|1|1|1|1|1|0|1|1|0|0|
\end{array}
\]

M&R
LDA16F

Add to a 16-bit address

Load effective address for a 16-bit value based on a base-address ($b$) and an index ($i$)

The instruction has three operands:

- $op1 \ d$ Operand register, one of $r0...r11$
- $op2 \ b$ Operand register, one of $r0...r11$
- $op3 \ i$ Operand register, one of $r0...r11$

Mnemonic and operands:

LDA16F $d, b, i$

Operation:

$$d \leftarrow b + i \times 2$$

Encoding:

<table>
<thead>
<tr>
<th>$l3r$</th>
<th>0 0 1 0</th>
<th>1 1 1 1</th>
<th>1 0 1 1 0</th>
<th>M&amp;R</th>
</tr>
</thead>
<tbody>
<tr>
<td>$op1$</td>
<td>1 1 1 1</td>
<td>... ...</td>
<td>... ...</td>
<td></td>
</tr>
<tr>
<td>$op2$</td>
<td></td>
<td></td>
<td>1 1 1 1</td>
<td></td>
</tr>
<tr>
<td>$op3$</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
LDAPB

Load effective address relative to the program counter. This operation scales the index ($u_{20}$) so that it counts 16-bit entities.

The instruction has one operand:

$$op1 \quad u_{20} \quad \text{A 20-bit immediate in the range 0...1048575. If } u_{20} < 1024, \text{ the instruction requires no prefix}$$

Mnemonic and operands:

$$\text{LDAPB} \quad u_{20}$$

Operation:

$$r11 \leftarrow pc - u_{20} \times iw$$

Encoding:

$$u10 \quad 1\,1\,0\,1\,1\,1|1|\cdots\cdots\cdots\cdots\cdots\cdots\quad \text{M+R}$$

or prefixed for long immediates:

$$lu10 \quad 1\,1\,0\,1\,1\,1|1|\cdots\cdots\cdots\cdots\cdots\quad \text{M&R}$$
LDAPF  

Load effective address relative to the program counter. This operation scales the index \( u_{20} \) so that it counts 16-bit entities.

The instruction has one operand:

\[
\text{op1 } u_{20} \quad \text{A 20-bit immediate in the range 0...1048575. If } u_{20} < 1024, \text{ the instruction requires no prefix}
\]

Mnemonic and operands:

\[
\text{LDAPF} \quad u_{20}
\]

Operation:

\[
r_{11} \leftarrow pc + u_{20} \times iw
\]

Encoding:

\[
\text{u10} \quad 1 \ 1 \ 0 \ 1 \ 1 \ 0 \ 1 \ 1 \ 1 \ 0 \ 1 \ 1 \ 1 \ 0 \ \ldots \ldots \ldots \ldots \quad \text{M+R}
\]

or prefixed for long immediates:

\[
\text{lu10} \quad 1 \ 1 \ 1 \ 0 \ 1 \ 1 \ 1 \ 0 \ 1 \ 1 \ 1 \ 0 \ \ldots \ldots \ldots \ldots \quad \text{M&R}
\]
LDAWB

Subtract from word address

Load effective address for word given a base-address (b) and an index (i)

The instruction has three operands:

- op1  d  Operand register, one of r0... r11
- op2  b  Operand register, one of r0... r11
- op3  i  Operand register, one of r0... r11

Mnemonic and operands:

LDAWB  d, b, i

Operation:

\[ d \leftarrow b - i \times Bpw \]

Encoding:

<table>
<thead>
<tr>
<th>op1</th>
<th>op2</th>
<th>op3</th>
</tr>
</thead>
<tbody>
<tr>
<td>1111</td>
<td>00</td>
<td>00</td>
</tr>
</tbody>
</table>

l3r  0 1 0 0 1 1 1 1 1 0 1 0 0  M&R
LDAWBI  

Subtract from word address immediate

Load effective address for word given a base-address \(b\) and an index \(u_s\).

The instruction has three operands:

- \(op1\) \(d\)  Operand register, one of r0... r11
- \(op2\) \(b\)  Operand register, one of r0... r11
- \(op3\) \(u_s\)  An integer in the range 0...11

Mnemonic and operands:

\[
\text{LDAWBI} \quad d, b, u_s 
\]

Operation:

\[
d \leftarrow b - u_s \times Bpw
\]

Encoding:

\[
\begin{array}{cccccccccccc}
\text{l2rus} & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \text{ M&R}
\end{array}
\]
LDAWCP

Load address of word in constant pool

Loads the address of a word relative to the constant pointer.

The instruction has one operand:

\[ \text{op} \quad u_{16} \quad \text{A 16-bit immediate in the range 0...65535. If } u_{16} < 64 \text{, the instruction requires no prefix} \]

Mnemonic and operands:

\[ \text{LDAWCP} \quad u_{16} \]

Operation:

\[ r_{11} \leftarrow cp + u_{16} \times Bpw \]

Encoding:

\[
\begin{array}{cccccccccc}
0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\
\end{array}
\]

or prefixed for long immediates:

\[
\begin{array}{cccccccccc}
1 & 1 & 1 & 1 & 0 & 0 & \cdots & \cdots & \cdots & \cdots \\
\end{array}
\]

\[
\begin{array}{cccccccccc}
0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & \cdots & \cdots \\
\end{array}
\]

M+R
**LDAWDP**

**Load address of word in data pool**

Loads the address of a word relative to the data pointer.

The instruction has two operands:

- \( op1 \quad D \quad \text{Any of } r0...r11, \ cp, \ dp, \ sp, \ lr \)
- \( op2 \quad u_{16} \quad \text{A 16-bit immediate in the range } 0...65535. \text{ If } u_{16} < 64, \text{ the instruction requires no prefix} \)

Mnemonic and operands:

\[
\text{LDAWDP} \quad D, u_{16}
\]

Operation:

\[
D \leftarrow dp + u_{16} \times Bpw
\]

Encoding:

- \( \text{ru6} \quad 01100|0 \cdots \cdots \)
LDAWF  

Add to a word address

Load effective address for word given a base-address ($b$) and an index ($i$).

The instruction has three operands:

- $op_1$  $d$  Operand register, one of $r0$... $r11$
- $op_2$  $b$  Operand register, one of $r0$... $r11$
- $op_3$  $i$  Operand register, one of $r0$... $r11$

Mnemonic and operands:

```
LDAWF      d, b, i
```

Operation:

```
d  ←  b + i × Bpw
```

Encoding:

```
\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
3r & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 \\
\hline
\end{array}
```

M&R
**LDAWFI**  
*Add to a word address immediate*

Load effective address for word given a base-address \((b)\) and an index \((i)\).

The instruction has three operands:

- \(op_1\) \(d\)  Operand register, one of \(r0...r11\)
- \(op_2\) \(b\)  Operand register, one of \(r0...r11\)
- \(op_3\) \(i\)  An integer in the range \(0...11\)

Mnemonic and operands:

\[
\text{LDAWFI} \quad d, b, i
\]

Operation:

\[
d \leftarrow b + i \times Bpw
\]

Encoding:

<table>
<thead>
<tr>
<th>L2rus</th>
<th>M&amp;R</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 0 0 1 1 1 1 1 1 1 0 1 1 0</td>
<td></td>
</tr>
</tbody>
</table>
**LDAWSP**

Load address of word on stack

Loads the address of a word relative to the stack pointer.

The instruction has two operands:

- $op_1\ D$ Any of r0... r11, cp, dp, sp, lr
- $op_2\ u_{16}$ A 16-bit immediate in the range 0...65535. If $u_{16} < 64$, the instruction requires no prefix

Mnemonic and operands:

$$\text{LDAWSP} \quad D, u_{16}$$

Operation:

$$D \leftarrow sp + u_{16} \times Bpw$$

Encoding:

- ru6 $\begin{array}{l} 0 1 1 0 \text{01}\ldots\ldots\ldots\ldots \text{M+R} \\
\end{array}$

or prefixed for long immediates:

- lru6 $\begin{array}{l} 1 1 1 1 \text{01}\ldots\ldots\ldots\ldots \\
0 1 1 0 \text{01}\ldots\ldots\ldots\ldots \text{M&R} \\
\end{array}$
**LDC**

Load a constant into a register

The instruction has two operands:

- $op1$ $D$ Any of r0...r11, cp, dp, sp, lr
- $op2$ $u_{16}$ A 16-bit immediate in the range 0...65535. If $u_{16} < 64$, the instruction requires no prefix

Mnemonic and operands:

```
LDC        D, u_{16}
```

Operation:

```
D ← u_{16}
```

Encoding:

- $ru6$ 011010... M+R

or prefixed for long immediates:

- $lru6$ 011010... M&R
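The prefix rule for the 16-bit immediate can be sketched as a small predicate. This is an illustrative Python model rather than XMOS tooling: the function name `needs_prefix` is hypothetical, and only the threshold of 64 is taken from the operand description above.

```python
def needs_prefix(u16: int) -> bool:
    """Return True when the immediate needs the prefixed (lru6) encoding."""
    if not 0 <= u16 <= 65535:
        raise ValueError("immediate must be in the range 0...65535")
    # Immediates below 64 fit the short ru6 form; larger ones need a prefix.
    return u16 >= 64


short_form = needs_prefix(63)   # fits the unprefixed ru6 encoding
long_form = needs_prefix(64)    # requires the prefixed lru6 encoding
```

The same threshold applies to the other `ru6`/`lru6` instructions in this section whose operand description carries the same note.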
**LDD**

**Load double word**

Loads two words from memory, using a base and an index. The index is scaled in order to translate the double-word-index into a byte-index. The base address must be double-word-aligned. The immediate version, **LDDI**, implements a load from a structured data type; the version with registers only, **LDD**, implements a load from an array.

The instruction has four operands:

- \( op1 \) \( d \)  Operand register, one of \( r0...r11 \)
- \( op4 \) \( e \)  Operand register, one of \( r0...r11 \)
- \( op2 \) \( b \)  Operand register, one of \( r0...r11 \)
- \( op3 \) \( i \)  Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[
\text{LDD} \quad d, e, b, i
\]

Operation:

\[
\begin{align*}
  d & \leftarrow \text{mem} [ b + i \times Bpw \times 2 ] \\
  e & \leftarrow \text{mem} [ b + i \times Bpw \times 2 + Bpw ]
\end{align*}
\]

Encoding:

\[
\begin{array}{ll}
 & 1\ 1\ 1\ 1\ \cdots \\
\text{l4r} & 0\ 0\ 0\ 0\ 1\ 1\ 1\ 1\ 1\ 0\ \cdots
\end{array}
\]

M&R

Conditions that raise an exception:

- **ET_LOAD_STORE**  \( b \) is not double word aligned, or the indexed address does not point to a valid memory location.
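The address arithmetic and the alignment check of LDD can be sketched in Python. This is a minimal model under stated assumptions: $Bpw$ is taken to be 4 bytes, memory is a hypothetical dictionary from byte address to word value, and `LoadStoreException` is an illustrative stand-in for ET_LOAD_STORE.

```python
BYTES_PER_WORD = 4  # assumption: Bpw = 4 on a 32-bit core


class LoadStoreException(Exception):
    """Illustrative stand-in for ET_LOAD_STORE."""


def ldd(mem: dict, b: int, i: int) -> tuple:
    """Return (d, e), the two words of double-word i relative to base b."""
    if b % (2 * BYTES_PER_WORD) != 0:
        raise LoadStoreException("base is not double-word aligned")
    address = b + i * BYTES_PER_WORD * 2
    try:
        return mem[address], mem[address + BYTES_PER_WORD]
    except KeyError:
        raise LoadStoreException("not a valid memory location") from None


# Hypothetical memory image: eight consecutive words starting at 0x100.
memory = {0x100 + 4 * k: 10 * k for k in range(8)}
d, e = ldd(memory, 0x100, 1)  # reads the words at 0x108 and 0x10C
```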
LDDI

Load double word immediate

Loads two words from memory, using a base and an index. The index is scaled in order to translate the double-word-index into a byte-index. The base address must be double-word-aligned. The immediate version, LDDI, implements a load from a structured data type; the version with registers only, LDD, implements a load from an array.

The instruction has four operands:

- \( op_1 \) \( d \)  Operand register, one of \( r0 \ldots r11 \)
- \( op_4 \) \( e \)  Operand register, one of \( r0 \ldots r11 \)
- \( op_2 \) \( b \)  Operand register, one of \( r0 \ldots r11 \)
- \( op_3 \) \( i \)  An integer in the range \( 0 \ldots 11 \)

Mnemonic and operands:

\[
\text{LDDI} \quad d, e, b, i
\]

Operation:

\[
d \leftarrow \text{mem}\left[b + i \times Bpw \times 2\right]
\]

\[
e \leftarrow \text{mem}\left[b + i \times Bpw \times 2 + Bpw\right]
\]

Encoding:

\[
\begin{array}{cccccccc}
1 & 1 & 1 & 1 & . & . & . & . \\
0 & 0 & 1 & 0 & 0 & 1 & 1 & 1
\end{array}
\]

M&R

Conditions that raise an exception:

ET_LOAD_STORE  \( b \) is not double word aligned, or the indexed address does not point to a valid memory location.
LDDSP

Load double word from stack

Loads two words relative to the stack pointer. The stack pointer must be double-word aligned.

The instruction has three operands:

- \( op_1 \) \( d \)  Operand register, one of \( r0 \ldots r11 \)
- \( op_2 \) \( e \)  Operand register, one of \( r0 \ldots r11 \)
- \( op_3 \) \( u_s \)  An integer in the range \( 0 \ldots 11 \)

Mnemonic and operands:

\[
\text{LDDSP} \quad d, e, u_s
\]

Operation:

\[
d \leftarrow \text{mem}[sp + u_s \times Bpw \times 2]
\]
\[
e \leftarrow \text{mem}[sp + u_s \times Bpw \times 2 + Bpw]
\]

Encoding:

\[
\begin{array}{cccccc}
1 & 1 & 1 & 1 & . & . . . . . . . . . . . . \end{array}
\]

\[
\begin{array}{cccccccc}
1 & 1 & 1 & 0 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 0 \end{array}
\]

M&R

Conditions that raise an exception:

**ET_LOAD_STORE** \( sp \) is not double-word aligned, or the indexed address does not point to a valid memory location.
LDET

Load ET from the stack

Restores the value of ET from the stack from offset 4.

The value was typically saved using STET. Together with LDSPC, LDSSR, and LDSED all or part of the state can be restored.

The instruction has no operands.

Mnemonic and operands:

LDET

Operation:

\[ et \leftarrow \text{mem}[sp + 4 \times Bpw] \]

Encoding:

\[
\begin{array}{cccccccccccc}
0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
\end{array}
\]

M

Conditions that raise an exception:

ET_LOAD_STORE  The indexed address does not point to a valid memory location.
LDIVU

Long unsigned divide

Divides a double word operand by a single word operand. This will result in a single word quotient and a single word remainder. This instruction has three source operands and two destination operands. The LDIVU instruction can take up to \( bpw \) thread-cycles to complete; the divide unit is shared between threads.

The operation only works if the division fits in a 32-bit word, that is, if the higher word of the double word input is less than the divisor. This operation is intended to be used for the implementation of long division.

The instruction has five operands:

- \( op1 \) \( d \) Operand register, one of \( r0...r11 \)
- \( op4 \) \( e \) Operand register, one of \( r0...r11 \)
- \( op2 \) \( x \) Operand register, one of \( r0...r11 \)
- \( op3 \) \( y \) Operand register, one of \( r0...r11 \)
- \( op5 \) \( v \) Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[
\text{LDIVU} \quad d, e, x, y, v
\]

Operation:

\[
d \leftarrow (v : x) \div y \\
e \leftarrow (v : x) \mod y
\]

Encoding:

\[
\begin{array}{ll}
\text{l5r} & 1111 \cdots \\
 & 000000 \cdots
\end{array}
\]

M&R

Conditions that raise an exception:

\[
\text{ET\_ARITHMETIC} \quad y = 0 \lor v \geq y.
\]
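The following Python sketch models the LDIVU operation and shows how chaining it gives a long division, with the remainder of each step becoming the high word ($v$) of the next. It assumes $bpw = 32$; `ArithmeticException` and `divide_words` are hypothetical names introduced for illustration.

```python
WORD_BITS = 32  # assumption: bpw = 32


class ArithmeticException(Exception):
    """Illustrative stand-in for ET_ARITHMETIC."""


def ldivu(x: int, y: int, v: int) -> tuple:
    """Return (d, e): quotient and remainder of the double word (v:x) by y."""
    if y == 0 or v >= y:
        raise ArithmeticException("divide by zero or quotient too large")
    dividend = (v << WORD_BITS) | x
    return dividend // y, dividend % y


def divide_words(words: list, y: int) -> tuple:
    """Divide a multi-word number (most significant word first) by y."""
    quotient = []
    remainder = 0
    for word in words:
        # The previous remainder is always less than y, so v < y holds.
        q, remainder = ldivu(word, y, remainder)
        quotient.append(q)
    return quotient, remainder


q_words, rem = divide_words([5, 3], 7)  # divides (5 << 32) | 3 by 7
```

Because each remainder is smaller than the divisor, the precondition $v < y$ is preserved from one step to the next, which is why no step in the chain can raise the exception once $y \neq 0$.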
**LDSED**

Load SED from stack

Restores the value of SED from the stack from offset 3.

The value was typically saved using **STSED**. Together with **LDSPC**, **LDSSR**, and **LDET** all or part of the state can be restored.

The instruction has no operands.

Mnemonic and operands:

LDSED

Operation:

\[ sed \leftarrow \text{mem}[sp + 3 \times Bpw] \]

Encoding:

```
0 0 0 1 0 | 1 1 1 | 1 | 1 | 1 0 1
```

Conditions that raise an exception:

- **ET_LOAD_STORE** The indexed address does not point to a valid memory location.
LDSPC

Load the SPC from the stack

Restores the value of SPC from the stack from offset 1.

The value was typically saved using STSPC. Together with LDSED, LDSSR, and LDET all or part of the state can be restored.

The instruction has no operands.

Mnemonic and operands:

```
LDSPC
```

Operation:

```
spc ← mem[sp + 1 × Bpw]
```

Encoding:

```
0 0 0 0 1 1 1 1 | 1 0 1 1 0 0
```

Conditions that raise an exception:

- **ET_LOAD_STORE** The indexed address does not point to a valid memory location.
LDSSR

Load SSR from stack

Restores the value of SSR from the stack from offset 2.

The value was typically saved using STSSR. Together with LDSED, LDSPC, and LDET all or part of the state can be restored.

The instruction has no operands.

Mnemonic and operands:

LDSSR

Operation:

\[ ssr \leftarrow \text{mem}[sp + 2 \times Bpw] \]

Encoding:

0 0 0 0 1 1 1 1 1 1 0 1 1 0

Conditions that raise an exception:

ET_LOAD_STORE The indexed address does not point to a valid memory location.
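LDSPC, LDSSR, LDSED and LDET each read a fixed word offset from the stack pointer (1, 2, 3 and 4 respectively). A minimal Python sketch of that layout follows, assuming 4 bytes per word and a hypothetical dictionary-based memory model; `restore_state` is an illustrative helper, not an instruction.

```python
BYTES_PER_WORD = 4  # assumption: Bpw = 4

# Word offsets from sp, as stated in the LDSPC, LDSSR, LDSED and LDET entries.
STACK_OFFSETS = {"spc": 1, "ssr": 2, "sed": 3, "et": 4}


def restore_state(mem: dict, sp: int) -> dict:
    """Model the combined effect of LDSPC, LDSSR, LDSED and LDET."""
    return {
        register: mem[sp + offset * BYTES_PER_WORD]
        for register, offset in STACK_OFFSETS.items()
    }


# Hypothetical stack contents: six words starting at sp = 0x2000.
stack = {0x2000 + 4 * k: 0xA0 + k for k in range(6)}
state = restore_state(stack, 0x2000)
```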
LDW

Load word

Loads a word from memory, using two registers as a base register and an index register. The index register is scaled in order to translate the word-index into a byte-index. The base address must be word-aligned. The immediate version, LDWI, implements a load from a structured data type; the version with registers only, LDW, implements a load from an array.

The instruction has three operands:

\[
\begin{align*}
\text{op1} & \quad d \quad \text{Operand register, one of r0... r11} \\
\text{op2} & \quad b \quad \text{Operand register, one of r0... r11} \\
\text{op3} & \quad i \quad \text{Operand register, one of r0... r11}
\end{align*}
\]

Mnemonic and operands:

\[
\text{LDW} \quad d, b, i
\]

Operation:

\[
d \leftarrow \text{mem}[b + i \times Bpw]
\]

Encoding:

\[
\begin{array}{c|c}
3r & 01001\ldots\ldots1\ldots\ldots1 \quad M
\end{array}
\]

Conditions that raise an exception:

**ET_LOAD_STORE** \( b \) is not word aligned, or the indexed address does not point to a valid memory location.
LDWI

Load word immediate

Loads a word from memory, using a register as the base and an immediate as the index. The index is scaled in order to translate the word-index into a byte-index. The base address must be word-aligned. The immediate version, LDWI, implements a load from a structured data type; the version with registers only, LDW, implements a load from an array.

The instruction has three operands:

- $op1\ d$ Operand register, one of $r0...r11$
- $op2\ b$ Operand register, one of $r0...r11$
- $op3\ i$ An integer in the range $0...11$

Mnemonic and operands:

$$LDWI\quad d, b, i$$

Operation:

$$d \leftarrow \text{mem}[b + i \times Bpw]$$

Encoding:

```
2rus  0 0 0 0 1 | · · · · · · · · · · M
```

Conditions that raise an exception:

**ET_LOAD_STORE** $b$ is not word aligned, or the indexed address does not point to a valid memory location.
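The difference between the two forms can be sketched in Python: LDWI suits a field at a small fixed offset, while LDW takes an index computed at run time. The sketch assumes 4 bytes per word and a hypothetical dictionary-based memory; `LoadStoreException` is an illustrative stand-in for ET_LOAD_STORE.

```python
BYTES_PER_WORD = 4  # assumption: Bpw = 4


class LoadStoreException(Exception):
    """Illustrative stand-in for ET_LOAD_STORE."""


def ldw(mem: dict, b: int, i: int) -> int:
    """Load the word at b + i * Bpw (index held in a register)."""
    if b % BYTES_PER_WORD != 0:
        raise LoadStoreException("base is not word aligned")
    address = b + i * BYTES_PER_WORD
    if address not in mem:
        raise LoadStoreException("not a valid memory location")
    return mem[address]


def ldwi(mem: dict, b: int, i: int) -> int:
    """Same load, but the index is an immediate limited to 0...11."""
    if not 0 <= i <= 11:
        raise ValueError("immediate index must be in the range 0...11")
    return ldw(mem, b, i)


memory = {0x400 + 4 * k: 100 + k for k in range(16)}
field = ldwi(memory, 0x400, 2)     # fixed offset, as for a structure field
element = ldw(memory, 0x400, 15)   # run-time index, as for an array element
```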
LDWCP

Load word from constant pool

Loads a word relative to the constant pool pointer.

The instruction has two operands:

\[
\begin{align*}
    op_1 & \quad D & \text{Any of } r0 \ldots r11, \text{cp, dp, sp, lr} \\
    op_2 & \quad u_{16} & \text{A 16-bit immediate in the range 0...65535. If } u_{16} < 64, \\
          & & \text{the instruction requires no prefix}
\end{align*}
\]

Mnemonic and operands:

\[
\text{LDWCP} \quad D, u_{16}
\]

Operation:

\[
D \leftarrow \text{mem}\left[\text{cp} + u_{16} \times Bpw\right]
\]

Encoding:

\[
\begin{array}{ll}
    \text{ru6} & 011011\ldots\ldots\ldots\ldots \quad \text{M}
\end{array}
\]

or prefixed for long immediates:

\[
\begin{array}{ll}
    \text{lru6} & 1111010\ldots\ldots\ldots\ldots \\
          & 011011\ldots\ldots\ldots\ldots \quad \text{M\&R}
\end{array}
\]

Conditions that raise an exception:

**ET_LOAD_STORE** \( cp \) is not word aligned, or the indexed address does not point to a valid memory location.
LDWCPL  

Load word from large constant pool

Loads a word relative to the constant pool pointer into $r11$. The offset can be larger than the offset specified in LDWCP.

The instruction has one operand:

- $op1$ $u_{20}$ A 20-bit immediate in the range $0...1048575$. If $u_{20} < 1024$, the instruction requires no prefix

Mnemonic and operands:

LDWCPL  $u_{20}$

Operation:

$$r11 \leftarrow mem[cp + u_{20} \times Bpw]$$

Encoding:

<table>
<thead>
<tr>
<th>$u10$</th>
<th>1 1 1 0 0</th>
<th>. . . . . . . . .</th>
<th>M</th>
</tr>
</thead>
</table>

or prefixed for long immediates:

<table>
<thead>
<tr>
<th>$lu10$</th>
<th>1 1 1 0 0</th>
<th>. . . . . . . . .</th>
<th>M&amp;R</th>
</tr>
</thead>
</table>

Conditions that raise an exception:

$ET\_LOAD\_STORE$ $cp$ is not word aligned, or the indexed address does not point to a valid memory location.
LDWDP

Load word from data pool

Loads a word relative to the data pointer.

The instruction has two operands:

\[
\begin{align*}
op_1 & \quad D & \text{Any of } r0...r11, \text{ cp, dp, sp, lr} \\
op_2 & \quad u_{16} & \text{A 16-bit immediate in the range } 0...65535. \text{ If } u_{16} < 64, \\
& & \text{the instruction requires no prefix}
\end{align*}
\]

Mnemonic and operands:

\[
\text{LDWDP} \quad D, u_{16}
\]

Operation:

\[
D \leftarrow \text{mem}[dp + u_{16} \times Bpw]
\]

Encoding:

\[
\begin{array}{ll}
\text{ru6} & 0\ 1\ 0\ 1\ 1\ 0\ \cdots \quad \text{M}
\end{array}
\]

or prefixed for long immediates:

\[
\begin{array}{ll}
\text{lru6} & 1\ 1\ 1\ 1\ 0\ 0\ \cdots \\
 & 0\ 1\ 0\ 1\ 1\ 0\ \cdots \quad \text{M\&R}
\end{array}
\]

Conditions that raise an exception:

**ET_LOAD_STORE** \( dp \) is not word aligned, or the indexed address does not point to a valid memory location.
LDWSP  

Load word from stack

Loads a word relative to the stack pointer.

The instruction has two operands:

- **op1**: \( D \)  
  - Any of \( r0 \ldots r11, \text{cp}, \text{dp}, \text{sp}, \text{lr} \)
- **op2**: \( u_{16} \)  
  - A 16-bit immediate in the range 0...65535. If \( u_{16} < 64 \), the instruction requires no prefix

Mnemonic and operands:

\[
\text{LDWSP} \quad D, u_{16}
\]

Operation:

\[
D \leftarrow \text{mem}[\text{sp} + u_{16} \times \text{Bpw}]
\]

Encoding:

\[
\begin{array}{lll}
\text{ru6} & 0\ 1\ 0\ 1\ 1\ \cdots & \text{M}
\end{array}
\]

or prefixed for long immediates:

\[
\begin{array}{lll}
\text{lru6} & 1\ 1\ 1\ 1\ 0\ \cdots & \text{M\&R} \\
\text{ru6} & 0\ 1\ 0\ 1\ 1\ \cdots & \text{M}
\end{array}
\]

Conditions that raise an exception:

- **ET_LOAD_STORE**: \( \text{sp} \) is not word aligned, or the indexed address does not point to a valid memory location.
LEXTRACT

Bitfield extraction from register pair

Extracts a bitfield at position $x$ in a pair of registers $l$ and $r$ into $d$. A mask $bitp$ is applied allowing a bitfield of less than $bpw$ bits to be extracted.

The instruction has five operands:

- $op1 \; d$ Operand register, one of $r0 \ldots r11$
- $op4 \; l$  Operand register, one of $r0 \ldots r11$
- $op2 \; r$  Operand register, one of $r0 \ldots r11$
- $op3 \; x$  Operand register, one of $r0 \ldots r11$
- $op5 \; bitp$ A bit position; one of $bpw$, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32

Mnemonic and operands:

LEXTRACT  $d, l, r, x, bitp$

Operation:

$$d \leftarrow ((l : r)[bit \; bpw + x - 1..x]) \land (2^{bitp} - 1);$$

Encoding:

<table>
<thead>
<tr>
<th>l4rus</th>
<th>0 0 0 1 1</th>
<th>×××××</th>
<th>0</th>
<th>...</th>
<th>M&amp;R</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>1 1 1 1</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
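The LEXTRACT operation can be modelled in a few lines of Python. The sketch assumes $bpw = 32$ and treats $l$ as the high word of the pair $(l : r)$; the function name is illustrative only.

```python
WORD_BITS = 32                      # assumption: bpw = 32
WORD_MASK = (1 << WORD_BITS) - 1


def lextract(l: int, r: int, x: int, bitp: int) -> int:
    """Return bits x .. x + bpw - 1 of (l:r), masked to bitp bits."""
    pair = (l << WORD_BITS) | r
    window = (pair >> x) & WORD_MASK     # a bpw-bit window starting at bit x
    return window & ((1 << bitp) - 1)    # keep only the low bitp bits


# An 8-bit field that straddles the boundary between the two registers.
field = lextract(0x12345678, 0x9ABCDEF0, 28, 8)
```

Here the field takes its low nibble from the top of $r$ (`0x9`) and its high nibble from the bottom of $l$ (`0x8`).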

LINSERT

 Inserts a bitfield into a pair of registers

Inserts a bitfield into a pair of registers \( d \) and \( e \). The bitfield is stored in register \( x \), the location of the bitfield is stored in register \( s \) (which must be between 0 and \( bpw - 1 \) inclusive), and the length of the bitfield is a short immediate operand \( bitp \).

The instruction has five operands:

- \( op1 \) \( d \) Operand register, one of \( r0...r11 \)
- \( op4 \) \( e \) Operand register, one of \( r0...r11 \)
- \( op2 \) \( x \) Operand register, one of \( r0...r11 \)
- \( op3 \) \( s \) Operand register, one of \( r0...r11 \)
- \( op5 \) \( bitp \) A bit position; one of \( bpw, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32 \)

Mnemonic and operands:

\[
\text{LINSERT} \quad d, e, x, s, bitp
\]

Operation:

\[
\begin{align*}
    m &= ((1 << bitp) - 1) << s \\
    d : e &= ((d : e) \land_{\text{bit}} \neg m) \lor ((x << s) \land_{\text{bit}} m)
\end{align*}
\]

Encoding:

```
1 1 1 1 | ... | ... | ... | ... \\
0 0 0 1 | x x x x x | 1 | ... | M&R
```

M&R
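A Python sketch of the LINSERT operation follows. It assumes $bpw = 32$, treats $d$ as the high word of the pair, and reads the operation with the usual precedence (AND binding tighter than OR); these readings are assumptions, as is the function name.

```python
WORD_BITS = 32                      # assumption: bpw = 32
WORD_MASK = (1 << WORD_BITS) - 1
BITP_VALUES = {1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32}


def linsert(d: int, e: int, x: int, s: int, bitp: int) -> tuple:
    """Insert the low bitp bits of x at bit position s of the pair (d:e)."""
    if not 0 <= s < WORD_BITS:
        raise ValueError("s must be between 0 and bpw - 1 inclusive")
    if bitp not in BITP_VALUES:
        raise ValueError("bitp is not an encodable bit position")
    mask = ((1 << bitp) - 1) << s
    pair = (d << WORD_BITS) | e
    pair = (pair & ~mask) | ((x << s) & mask)
    return (pair >> WORD_BITS) & WORD_MASK, pair & WORD_MASK


# Insert the byte 0xAB at bit 28, so it straddles both registers.
new_d, new_e = linsert(0x00000000, 0xFFFFFFFF, 0xAB, 28, 8)
```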
**LMUL**

Long multiply

Multiplies two words to produce a double-word, and adds two single words. Both the high word and the low word of the result are produced. This multiplication is unsigned and cannot overflow.

The instruction has six operands:

- $op_1 \quad d$  Operand register, one of $r0...r11$
- $op_4 \quad e$  Operand register, one of $r0...r11$
- $op_2 \quad x$  Operand register, one of $r0...r11$
- $op_3 \quad y$  Operand register, one of $r0...r11$
- $op_5 \quad v$  Operand register, one of $r0...r11$
- $op_6 \quad w$  Operand register, one of $r0...r11$

Mnemonic and operands:

```
LMUL    d, e, x, y, v, w
```

Operation:

```
e ← r[bpw-1...0]
d ← r[2bpw-1...bpw]
where r ← x × y + v + w
```

Encoding:

```
<table>
<thead>
<tr>
<th>1 1 1 1</th>
<th>...</th>
<th>...</th>
<th>...</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 0 0 0</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
</tbody>
</table>
```

M&R
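The claim that LMUL cannot overflow follows from $(2^{bpw} - 1)^2 + 2(2^{bpw} - 1) = 2^{2bpw} - 1$, which still fits in a double word. The Python sketch below models the operation and checks that bound; it assumes $bpw = 32$ and uses illustrative names.

```python
WORD_BITS = 32                      # assumption: bpw = 32
WORD_MASK = (1 << WORD_BITS) - 1


def lmul(x: int, y: int, v: int, w: int) -> tuple:
    """Return (d, e): the high and low words of x * y + v + w."""
    r = x * y + v + w
    return (r >> WORD_BITS) & WORD_MASK, r & WORD_MASK


# Largest possible result: every source operand is all ones.
largest = WORD_MASK * WORD_MASK + WORD_MASK + WORD_MASK
high, low = lmul(WORD_MASK, WORD_MASK, WORD_MASK, WORD_MASK)
```

The two addends make LMUL a convenient inner step for multi-word multiplication, where one addend carries the partial result and the other the carry word from the previous step.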
LSS

Less than signed

Tests whether one signed value is less than another signed value. The test result is produced in the destination register (c) as 1 (true) or 0 (false).

The instruction has three operands:

- \( op1 \ c \) Operand register, one of \( r0...r11 \)
- \( op2 \ x \) Operand register, one of \( r0...r11 \)
- \( op3 \ y \) Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[
LSS \quad c, x, y
\]

Operation:

\[
c \leftarrow \begin{cases} 
  x_{\text{signed}} < y_{\text{signed}}, & 1 \\
  x_{\text{signed}} \geq y_{\text{signed}}, & 0
\end{cases}
\]

Encoding:

\[
3r \quad 110000\cdots\cdots\cdots\cdots\cdots\cdots\quad M+R
\]
LSU

Less than unsigned

Tests whether one unsigned value is less than another unsigned value. The result is produced in the destination register (c) as 1 (true) or 0 (false). It can be used to perform efficient bounds checks against values in the range $0...(y - 1)$.

The instruction has three operands:

- $op1$ c: Operand register, one of r0...r11
- $op2$ x: Operand register, one of r0...r11
- $op3$ y: Operand register, one of r0...r11

Mnemonic and operands:

```
LSU c, x, y
```

Operation:

```
c ← 1 if x < y
    0 if x ≥ y
```

Encoding:

```
3r 11001· · · · · · · · · · · M+R
```
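The bounds-check use of LSU relies on a negative index looking like a very large unsigned value. A Python sketch comparing LSS and LSU, assuming $bpw = 32$ and using illustrative helper names:

```python
WORD_MASK = 0xFFFFFFFF  # assumption: bpw = 32


def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= WORD_MASK
    return value - (1 << 32) if value & 0x80000000 else value


def lss(x: int, y: int) -> int:
    """Signed less-than: 1 if x < y, else 0."""
    return 1 if to_signed(x) < to_signed(y) else 0


def lsu(x: int, y: int) -> int:
    """Unsigned less-than: 1 if x < y, else 0."""
    return 1 if (x & WORD_MASK) < (y & WORD_MASK) else 0


def index_in_bounds(index: int, length: int) -> bool:
    """A single unsigned compare checks both 0 <= index and index < length."""
    return lsu(index, length) == 1


negative_index = -1 & WORD_MASK  # bit pattern 0xFFFFFFFF
```

With this pattern, `lss(negative_index, 10)` yields 1 while `lsu(negative_index, 10)` yields 0, so the unsigned test rejects the negative index without a separate comparison against zero.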

LSUB

Long unsigned subtract

Subtracts an unsigned integer and a borrow from another unsigned integer, producing both the unsigned result and the possible borrow. The source operands are the two numbers to be subtracted (x and y) and the borrow input, which is taken from the least significant bit of a third source operand (v). One destination register stores the borrow-out (e) and the other stores the difference (d).

The instruction has five operands:

- $op1$ d: Operand register, one of r0...r11
- $op4$ e: Operand register, one of r0...r11
- $op2$ x: Operand register, one of r0...r11
- $op3$ y: Operand register, one of r0...r11
- $op5$ v: Operand register, one of r0...r11

Mnemonic and operands:

```
LSUB d, e, x, y, v
```

Operation:

\[
\begin{align*}
  d & \leftarrow r[bpw - 1...0] \\
  e & \leftarrow r[bpw] \\
  & \text{where } r = x - y - v[0]
\end{align*}
\]

Encoding:

```
   1 1 1 1
   |   |   |   |   |   |   |   |
l5r 0 0 0 0 1 0 0 0 0 0 0 0 0 0
```

M&R
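Chaining the borrow-out of one LSUB into the borrow-in of the next gives a multi-word subtraction. The Python sketch below models the operation under the assumption $bpw = 32$; `sub_words` is a hypothetical helper that takes the least significant word first.

```python
WORD_BITS = 32                      # assumption: bpw = 32
WORD_MASK = (1 << WORD_BITS) - 1


def lsub(x: int, y: int, v: int) -> tuple:
    """Return (d, e): the difference x - y - v[0] and the borrow-out."""
    r = x - y - (v & 1)
    # For a negative r, Python's arithmetic shift makes bit bpw read as 1.
    return r & WORD_MASK, (r >> WORD_BITS) & 1


def sub_words(a: list, b: list) -> tuple:
    """Subtract two equal-length multi-word numbers, low word first."""
    result = []
    borrow = 0
    for x, y in zip(a, b):
        d, borrow = lsub(x, y, borrow)
        result.append(d)
    return result, borrow


# 0x1_00000000 - 1 = 0x0_FFFFFFFF, with no final borrow.
difference, final_borrow = sub_words([0x00000000, 0x1], [0x00000001, 0x0])
```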
MACCS

Multiply and accumulate signed

Multiplies two signed words, and adds the double word result into a signed double word accumulator. The double word accumulator comprises two registers that are used both as a source and destination. Two other operands are the values that are to be multiplied.

The instruction has four operands:

- \( op1 \ d \) Operand register, one of \( r0 \ldots r11 \)
- \( op4 \ e \) Operand register, one of \( r0 \ldots r11 \)
- \( op2 \ x \) Operand register, one of \( r0 \ldots r11 \)
- \( op3 \ y \) Operand register, one of \( r0 \ldots r11 \)

Mnemonic and operands:

\[
\text{MACCS} \quad d, e, x, y
\]

Operation:

\[
\begin{align*}
  e & \leftarrow \text{tmp}[\text{bpw} - 1 \ldots 0] \\
  d & \leftarrow \text{tmp}[2 \times \text{bpw} - 1 \ldots \text{bpw}]
\end{align*}
\]

where \( \text{tmp} \leftarrow (d_{\text{signed}} : e) + x_{\text{signed}} \times y_{\text{signed}} \)

Encoding:

<table>
<thead>
<tr>
<th>1 1 1 1</th>
<th>...</th>
<th>...</th>
<th>...</th>
<th>...</th>
<th>...</th>
<th>...</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 0 0 0</td>
<td>1 1 1 1 1</td>
<td>0</td>
<td>...</td>
<td>M&amp;R</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
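A Python sketch of the signed multiply-accumulate follows. It assumes $bpw = 32$ and that the accumulator wraps modulo $2^{64}$, which the text above does not state; the helper names are illustrative.

```python
WORD_BITS = 32                          # assumption: bpw = 32
WORD_MASK = (1 << WORD_BITS) - 1
DWORD_MASK = (1 << (2 * WORD_BITS)) - 1


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def maccs(d: int, e: int, x: int, y: int) -> tuple:
    """Return the updated (d, e) after adding the signed product x * y."""
    accumulator = to_signed((d << WORD_BITS) | e, 2 * WORD_BITS)
    tmp = accumulator + to_signed(x, WORD_BITS) * to_signed(y, WORD_BITS)
    tmp &= DWORD_MASK                   # assumed wrap-around on overflow
    return (tmp >> WORD_BITS) & WORD_MASK, tmp & WORD_MASK


# Accumulate (-1 * 2) and then (3 * 1): the running total is -2, then 1.
d1, e1 = maccs(0, 0, 0xFFFFFFFF, 2)
d2, e2 = maccs(d1, e1, 3, 1)
```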
MACCU

Multiply and accumulate unsigned

Multiplies two unsigned words, and adds the double word result into an unsigned double word accumulator. The double word accumulator comprises two registers that are used both as a source and destination. Two other operands are the values that are to be multiplied.

MACCU can be used to correct word alignment issues by repeatedly operating on the words of a stream. For example, multiplying each word by 0x00010000 makes the high word of the accumulator produce the same stream of words offset by half a word.

The instruction has four operands:

\[ \text{op1} \quad \text{d} \quad \text{Operand register, one of } r0...r11 \]
\[ \text{op4} \quad \text{e} \quad \text{Operand register, one of } r0...r11 \]
\[ \text{op2} \quad \text{x} \quad \text{Operand register, one of } r0...r11 \]
\[ \text{op3} \quad \text{y} \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[
\text{MACCU} \quad d, e, x, y
\]

Operation:

\[ e \leftarrow \text{tmp}[bpw - 1...0] \]
\[ d \leftarrow \text{tmp}[2 \times bpw - 1...bpw] \]

where \( \text{tmp} \leftarrow (d : e) + x \times y \)

Encoding:

| l4r | 1 1 1 1 | ... | ... | ... | ... | ... | ... | ... |
|   | 0 0 0 0 | 1 1 1 1 | 1 | 1 | 1 | 1 | ... | M&R
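The half-word realignment mentioned above can be sketched in Python. The `maccu` model assumes $bpw = 32$ and a 64-bit wrap-around; `realign_half_word` is one possible way to chain the instruction (moving the previous low word into the high word before each step) and is an assumption of this sketch, not a sequence given in the text.

```python
WORD_BITS = 32                          # assumption: bpw = 32
WORD_MASK = (1 << WORD_BITS) - 1
DWORD_MASK = (1 << (2 * WORD_BITS)) - 1


def maccu(d: int, e: int, x: int, y: int) -> tuple:
    """Return the updated (d, e) after adding the unsigned product x * y."""
    tmp = (((d << WORD_BITS) | e) + x * y) & DWORD_MASK
    return tmp >> WORD_BITS, tmp & WORD_MASK


def realign_half_word(words: list) -> list:
    """Produce the input stream shifted by half a word (hypothetical usage)."""
    output = []
    low = 0
    for word in words:
        # The previous low word becomes the high word; the product adds
        # the top half of the new word beneath it.
        high, low = maccu(low, 0, word, 0x00010000)
        output.append(high)
    return output


split = maccu(0, 0, 0xAABBCCDD, 0x00010000)
shifted = realign_half_word([0x11112222, 0x33334444])
```

A single step splits `0xAABBCCDD` into a high word of `0x0000AABB` and a low word of `0xCCDD0000`; the chained version pairs the low half of each word with the high half of the next.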
MJOIN

Synchronise and join

Synchronises the master thread that executes this instruction with all the slave threads associated with its synchroniser operand \((r)\), and frees those slave threads when the synchronisation completes. This is used to end a group of parallel threads. Note this clears the EEBLE bit. If the ININT bit is set, then MJOIN will not block; MJOIN should not be used inside an interrupt handler.

The slaves execute an SSYNC instruction to synchronise. The master can execute an MSYNC instruction to synchronise without freeing the slave threads.

The instruction has one operand:

- \( op1 \) \( r \) Operand register, one of \(r0...r11\)

Mnemonic and operands:

\[
\text{MJOIN} \quad r
\]

Operation:

\[
\begin{array}{l}
sr[\text{eeble}] \leftarrow 0 \\
\text{if } (\text{slaves}_r \setminus \text{spaused} = \emptyset) \\
\text{then} \\
\quad \text{forall thread} \in \text{slaves}_r : \text{inuse}_{\text{thread}} \leftarrow 0 \\
\quad \text{mjoin}_{\text{syn}(\text{tid})} \leftarrow 0 \\
\text{else} \\
\quad \text{mpaused} \leftarrow \text{mpaused} \cup \{\text{tid}\} \\
\quad \text{mjoin}_r \leftarrow 1 \\
\quad \text{msyn}_r \leftarrow 1
\end{array}
\]

Encoding:

\[
1r \quad 00010111111111...\quad R
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not a synchroniser resource, or the resource is not in use.
MKMSK

Make n-bit mask

Makes an n-bit mask that can be used to extract a bit field from a word. The resulting mask consists of $s$ one-bits aligned to the right.

The instruction has two operands:

- $op_1$ $d$  Operand register, one of $r_0$...$r_{11}$
- $op_2$ $s$  Operand register, one of $r_0$...$r_{11}$

Mnemonic and operands:

\[
\text{MKMSK} \quad d, s
\]

Operation:

\[
d \leftarrow \begin{cases}
  s < \text{bpw}, & 2^s - 1 \\
  s \geq \text{bpw}, & 1 : 1 : \ldots : 1
\end{cases}
\]

Encoding:

\[
2r \quad 1 0 1 0 0 \ldots \cdot 0 \cdot \ldots \quad \text{M+R}
\]
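The two cases of the MKMSK operation can be modelled in Python as follows, assuming $bpw = 32$. The `extract_field` helper is hypothetical and only illustrates how such a mask is typically combined with a shift.

```python
WORD_BITS = 32                      # assumption: bpw = 32
WORD_MASK = (1 << WORD_BITS) - 1


def mkmsk(s: int) -> int:
    """Return a mask of s one-bits aligned to the right."""
    if s >= WORD_BITS:
        return WORD_MASK            # all ones when s is bpw or larger
    return (1 << s) - 1


def extract_field(word: int, position: int, width: int) -> int:
    """Shift the field down to bit 0, then mask it to `width` bits."""
    return (word >> position) & mkmsk(width)


nibble = extract_field(0x12345678, 8, 4)  # bits 8..11 of the word
```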
MKMSKI

Make n-bit mask immediate

Makes an n-bit mask that can be used to extract a bit field from a word. The resulting mask consists of $bitp$ one-bits aligned to the right.

The instruction has two operands:

\[ \text{op1} \quad d \quad \text{Operand register, one of r0...r11} \]
\[ \text{op2} \quad \text{bitp} \quad \text{A bit position; one of bpw, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32} \]

Mnemonic and operands:

\[
\text{MKMSKI} \quad d, \text{bitp}
\]

Operation:

\[
d \leftarrow \begin{cases}
  \text{bitp} < \text{bpw}, & 2^{\text{bitp}} - 1 \\
  \text{bitp} \geq \text{bpw}, & 1 : 1 : \ldots : 1
\end{cases}
\]

Encoding:

\[
\text{rus} \quad 10100\cdots\cdots\cdots1\cdots\cdots M+R
\]
MSYNC

Master synchronise

Synchronise a master thread with the slave threads associated with its synchroniser \((r)\). If the slave threads have just been created (with GETST), then MSYNC starts all slaves. This clears the EEBLE bit. If the ININT bit is set, then MSYNC will not block; MSYNC should not be used inside an interrupt handler.

The slaves execute an SSYNC instruction to synchronise. The master can execute an MJOIN instruction to free the slave threads after synchronisation.

The instruction has one operand:

\[
\text{op1} \quad r \quad \text{Operand register, one of } r0...r11
\]

Mnemonic and operands:

\[
\text{MSYNC} \quad r
\]

Operation:

\[
\begin{array}{l}
sr[\text{eeble}] \leftarrow 0 \\
\text{if } (\text{slaves}_{r} \setminus \text{spaused} = \emptyset) \text{ then} \\
\quad \text{spaused} \leftarrow \text{spaused} \setminus \text{slaves}_{r} \\
\text{else} \\
\quad \text{mpaused} \leftarrow \text{mpaused} \cup \{\text{tid}\} \\
\quad msyn_{r} \leftarrow 1
\end{array}
\]

Encoding:

\[
1r \quad 0 \quad 0 \quad 0 \quad 1 \quad 1 \quad 1 \quad 1 \quad 1 \quad \ldots \quad \text{R}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not a synchroniser resource, or the resource is not in use.
- **ET_ILLEGAL_PC** One or more of the slave threads do not have a legal program counter.
MUL

Unsigned multiply

Performs a single word unsigned multiply. Any overflow is discarded, and only the least significant $bpw$ bits of the result are produced.

If overflow is important, one of the LMUL, MACCU or MACCS instructions should be used.

The instruction has three operands:

\[
\begin{align*}
op_1 & \quad d & \text{Operand register, one of } r0 \ldots r11 \\
op_2 & \quad x & \text{Operand register, one of } r0 \ldots r11 \\
op_3 & \quad y & \text{Operand register, one of } r0 \ldots r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{MUL } d, x, y
\]

Operation:

\[
d \leftarrow (x \times y) \mod 2^{bpw}
\]

Encoding:

\[
\begin{array}{cccccccccccc}
1 & 1 & 1 & 1 & . & . & . & . & . & . & . & . & . \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0
\end{array}
\]

M&R
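The discarded overflow can be seen in a short Python model of the operation, assuming $bpw = 32$; the function name is illustrative.

```python
WORD_BITS = 32  # assumption: bpw = 32


def mul(x: int, y: int) -> int:
    """Return (x * y) mod 2**bpw, discarding any overflow."""
    return (x * y) % (1 << WORD_BITS)


small = mul(3, 5)               # fits in one word
wrapped = mul(0x10000, 0x10000) # true product is 2**32, so the result wraps
```

The second product needs 33 bits, so only the low word survives; LMUL or MACCU would be needed to recover the high word.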
NEG

Two’s complement negate

Performs a signed negation in two’s complement, i.e. it computes $0 - s$. Overflow is ignored, i.e. negating $-2^{bpw-1}$ produces $-2^{bpw-1}$.

The instruction has two operands:

$op1 \quad d$ Operand register, one of r0... r11

$op2 \quad s$ Operand register, one of r0... r11

Mnemonic and operands:

\[
\text{NEG} \quad d, s
\]

Operation:

$\quad d_{signed} \quad \leftarrow \quad 2^{bpw} - s$

Encoding:

\[
2r \quad 1 \ 0 \ 0 \ 1 \ 0 | \cdot \cdot \cdot | 0 | \cdot \cdot \cdot \quad \text{M+R}
\]
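The overflow case can be checked with a small Python model, assuming $bpw = 32$. Reducing $2^{bpw} - s$ modulo $2^{bpw}$ keeps the result within one word, including for $s = 0$; the function name is illustrative.

```python
WORD_BITS = 32  # assumption: bpw = 32


def neg(s: int) -> int:
    """Return the two's complement negation of a 32-bit pattern."""
    return ((1 << WORD_BITS) - s) % (1 << WORD_BITS)


minus_one = neg(1)               # bit pattern of -1
most_negative = neg(0x80000000)  # -2**31 negates to itself
```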
NOP

No operation.

The instruction has no operands.

Mnemonic and operands:

NOP

Operation:

No operation

Encoding:

0 0 0 0 | 0 1 1 1 | 1 1 1 1 | M+R
NOT

Bitwise not

Produces the bitwise not of its source operand.

The instruction has two operands:

\[ op1 \quad d \quad \text{Operand register, one of } r0...\ r11 \]
\[ op2 \quad s \quad \text{Operand register, one of } r0...\ r11 \]

Mnemonic and operands:

\[ \text{NOT} \quad d, s \]

Operation:

\[ d \leftarrow \neg_{\text{bit}} s; \]

Encoding:

\[ 2r \quad 1\ 0\ 0\ 0\ 1\ \cdots \]
OR

Bitwise or

Produces the bitwise or of its two source operands.

The instruction has three operands:

\[ \begin{align*}
  op1 & \quad d & \quad \text{Operand register, one of } r0...r11 \\
  op2 & \quad x & \quad \text{Operand register, one of } r0...r11 \\
  op3 & \quad y & \quad \text{Operand register, one of } r0...r11
\end{align*} \]

Mnemonic and operands:

\[
\text{OR} \quad d, x, y
\]

Operation:

\[
d \leftarrow x \lor \text{bit } y
\]

Encoding:

\[
3r \quad \begin{array}{c}
010000...\end{array} \quad \begin{array}{c}
\ldots\ldots\end{array} \quad \text{M+R}
\]
OUT

Output data to a resource. The precise effect of this instruction depends on the resource:

**Port**
Output a word to the port. If the port is buffered, the data is shifted out piecemeal; if the port is unbuffered, the most significant bits of the output data are ignored. The instruction pauses if the output data cannot be accepted.

**Channel end**
Output $Bpw$ data tokens to the destination associated with this channel end (see SETD); the most significant byte of the word is output first. The instruction pauses if the output data cannot be accepted.

**Lock**
Releases the lock.

The instruction has two operands:

$$\begin{align*}
op1 & \quad r \\
op2 & \quad s
\end{align*}$$

Operands:

- $r$: Operand register, one of $r0...r11$
- $s$: Operand register, one of $r0...r11$

Mnemonic and operands:

```
OUT r, s
```

Operation:

```
r ◁ s
```

Encoding:

```
1 0 1 0 | 0 0 0 0 | 0 0 0 0 0 0 0 0 0 0 0 0
```

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: $r$ is not a valid resource, not in use, or it does not support OUT.
- **ET_LINK_ERROR**: $r$ is a channel end, and the destination has not been set.
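
The byte ordering described for a channel end can be sketched in C. This is only an illustrative model, not part of the architecture: `word_to_tokens` and `BPW_BYTES` are hypothetical names, and the sketch assumes a 32-bit word, so $Bpw = 4$.

```c
#include <stdint.h>

/* Assumption: Bpw (bytes per word) is 4, i.e. a 32-bit word. */
#define BPW_BYTES 4

/* Hypothetical helper: split a word into the data tokens that OUT
 * would send on a channel end, most significant byte first. */
void word_to_tokens(uint32_t word, uint8_t tokens[BPW_BYTES])
{
    for (int i = 0; i < BPW_BYTES; i++) {
        /* i = 0 selects bits 31..24, i = 3 selects bits 7..0 */
        tokens[i] = (uint8_t)(word >> (8 * (BPW_BYTES - 1 - i)));
    }
}
```

Under these assumptions the word `0x12345678` is sent as the tokens `0x12`, `0x34`, `0x56`, `0x78`, in that order.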
OUTCT

Outputs a control token to a channel.

The instruction pauses if the control token cannot be accepted by the channel.

Each OUTCT must have a matching CHKCT or INCT.

The instruction has two operands:

\[\begin{align*}
  op_1 & \quad r & \text{Operand register, one of } r0 \ldots r11 \\
  op_2 & \quad s & \text{Operand register, one of } r0 \ldots r11
\end{align*}\]

Mnemonic and operands:

\[\text{OUTCT} \quad r, s\]

Operation:

\[r \triangleleft \text{ctoken}(s)\]

Encoding:

\[2r \quad 01001\cdots\cdots\cdot0\cdots\cdot\quad R\]

Conditions that raise an exception:

- ET_RESOURCE_DEP: Resource illegally shared between threads
- ET_ILLEGAL_RESOURCE: \(r\) is not a channel end, or not in use.
- ET_LINK_ERROR: \(r\) is a channel end, and the destination has not been set.
- ET_LINK_ERROR: \(r\) is a channel end, and the control token is a reserved hardware token.
OUTCTI  

Output a control token immediate

Outputs a control token to a channel.

The instruction pauses if the control token cannot be accepted by the channel.

Each OUTCTI must have a matching CHKCT or INCT.

The instruction has two operands:

\[ \begin{align*}
  \text{op1} & \quad r & \text{Operand register, one of } r0...r11 \\
  \text{op2} & \quad u_s & \text{An integer in the range } 0...11 \\
\end{align*} \]

Mnemonic and operands:

\[
\text{OUTCTI} \quad r, u_s
\]

Operation:

\[ r \triangleleft \text{ctoken}(u_s) \]

Encoding:

\[
\text{rus} \quad 01001\cdots\cdots1\cdots \\
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( r \) is not a channel end, or not in use.
- **ET_LINK_ERROR**: \( r \) is a channel end, and the destination has not been set.
- **ET_LINK_ERROR**: \( r \) is a channel end, and the control token is a reserved hardware token.
OUTPW

Output a part word

Outputs a partial word to a port. This is useful to send the last few port-widths of data.

The instruction pauses if the out data cannot be accepted.

The instruction has three operands:

- \( op_1 \ s \)  Operand register, one of \( r0...r11 \)
- \( op_2 \ r \)  Operand register, one of \( r0...r11 \)
- \( op_3 \ w \)  Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[
\text{OUTPW} \quad s,r,w
\]

Operation:

\[
\begin{align*}
\text{shiftcount}_r & \leftarrow w \\
r & \triangleleft s
\end{align*}
\]

Encoding:

\[
\begin{array}{c}
\text{1111} \ldots \ldots \ldots \\
1 \text{l3r}110011111101101
\end{array}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**  Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**  \( r \) is not pointing to a port resource, or the resource is not in use, or \( w \) is an unsupported width, or the port is not in BUFFERS mode.
OUTPWI

Output a part word immediate

Outputs a partial word to a port. This is useful to send the last few port-widths of data.

The instruction pauses if the out data cannot be accepted.

The instruction has three operands:

- \( op1 \) \( s \)  Operand register, one of \( r0 \ldots r11 \)
- \( op2 \) \( r \)  Operand register, one of \( r0 \ldots r11 \)
- \( op3 \) \( bitp \)  A bit position; one of \( bpw, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32 \)

Mnemonic and operands:

\[ \text{OUTPWI} \quad s, r, bitp \]

Operation:

\[ \text{shiftcount}_r \leftarrow bitp \]
\[ r \triangleleft s \]

Encoding:

|M&R| 1 1 1 1 | 1 1 1 1 | 0 1 0 1 |
---|---|---|---|
| 0 0 0 1 0 1 1 1 1 | 0 0 0 1 0 1 1 1 1 |

Conditions that raise an exception:

- \( \text{ET\_RESOURCE\_DEP} \)  Resource illegally shared between threads
- \( \text{ET\_ILLEGAL\_RESOURCE} \)  \( r \) is not pointing to a port resource, or the resource is not in use, or \( bitp \) is an unsupported width, or the port is not in BUFFERS mode.
OUTSHR

Output data and shift

Outputs the least significant *port-width* bits of a register to a port, shifting the register contents to the right by that number of bits.

The instruction pauses if the out data cannot be accepted.

The instruction has two operands:

\begin{align*}
op_1 & \ r & \text{Operand register, one of } r0 \ldots r11 \\
op_2 & \ d & \text{Operand register, one of } r0 \ldots r11
\end{align*}

Mnemonic and operands:

\begin{align*}
\text{OUTSHR} & \quad r, d
\end{align*}

Operation:

\begin{align*}
r & \triangleleft d[\text{portwidth}_r - 1 \ldots 0] \\
d & \leftarrow 0 : \ldots : 0 : d[\text{bpw} - 1 \ldots \text{portwidth}_r]
\end{align*}

Encoding:

\begin{align*}
r2r & \quad \begin{array}{cccccccc}
1 & 0 & 1 & 0 & 1 & \cdots & \cdots & \cdots
\end{array} & \quad R
\end{align*}

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not pointing to a port resource, or the resource is not in use.
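
The register-side effect of OUTSHR can be modelled in C as a rough sketch. It assumes a 32-bit word (bpw = 32) and ignores pausing and exceptions. `outshr_model` is a hypothetical name, and the port width is passed in explicitly rather than read from a port resource.

```c
#include <stdint.h>

/* Model of OUTSHR for a 32-bit word (assumption: bpw = 32).
 * port_width is the width of the port in bits, expected in 1..32.
 * Returns the bits that would be presented to the port and updates
 * *d with the right-shifted register value. */
uint32_t outshr_model(uint32_t *d, unsigned port_width)
{
    /* Shifting a 32-bit value by 32 is undefined in C, so the
     * full-width case is handled separately. */
    uint32_t mask = (port_width >= 32) ? 0xFFFFFFFFu
                                       : ((1u << port_width) - 1u);
    uint32_t out = *d & mask;                       /* d[portwidth-1 .. 0] */
    *d = (port_width >= 32) ? 0u : (*d >> port_width); /* zero fill on the left */
    return out;
}
```

Calling the model repeatedly on the same register yields successive port-width slices of the word, least significant slice first.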
OUTT

Output a data token to a channel.

The instruction pauses if the output token cannot be accepted.

The instruction has two operands:

- \( op1 \ r \) Operand register, one of \( r0 \ldots r11 \)
- \( op2 \ s \) Operand register, one of \( r0 \ldots r11 \)

Mnemonic and operands:

\[
\text{OUTT} \quad r, s
\]

Operation:

\[
r \triangleleft \text{dtoken}(s)
\]

Encoding:

\[
\begin{array}{cccccccccc}
\text{r2r} & 0 & 0 & 0 & 0 & 1 & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot
\end{array}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not a channel end or not in use.
- **ET_LINK_ERROR** \( r \) is a channel end, and the destination has not been set.
PEEK  

Looks at the value of the port pins, bypassing all input logic. PEEK will not pause, and will not take ownership of the port.

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad d \quad \text{Operand register, one of } r0...r11 \\
\text{op2} & \quad r \quad \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{PEEK} \quad d, r
\]

Operation:

\[
d \leftarrow \text{pins}(r)
\]

Encoding:

\[
2r \quad 10111\cdots\cdots0\cdots\cdots \quad \text{R}
\]

Conditions that raise an exception:

- **ET_ILLEGAL_RESOURCE** \( r \) is not a port resource, or the resource is not in use.
REMS

Computes a signed integer remainder. The remainder is negative if the dividend is negative. For example 5 rem 3 is 2, -5 rem 3 is -2, -5 rem -3 is -2, and 5 rem -3 is 2.

This instruction does not execute in a single cycle, and multiple threads may share the same division unit. The remainder may take up to \( bpw \) thread-cycles.

The instruction has three operands:

- \( op1 \) \( d \) Operand register, one of \( r0...r11 \)
- \( op2 \) \( x \) Operand register, one of \( r0...r11 \)
- \( op3 \) \( y \) Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[ \text{REMS} \quad d, x, y \]

Operation:

\[ d_{\text{signed}} \leftarrow x_{\text{signed}} \text{ mod } y_{\text{signed}} \]

Encoding:

<table>
<thead>
<tr>
<th>L3r</th>
<th>M&amp;R</th>
</tr>
</thead>
<tbody>
<tr>
<td>1111</td>
<td>1100</td>
</tr>
</tbody>
</table>

Conditions that raise an exception:

- **ET_ARITHMETIC** Remainder of \( x \) by 0.
- **ET_ARITHMETIC** Remainder of \( -2^{bpw-1} \) by \(-1\).
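
The sign rule and the two exception cases can be checked with a small C model. This is a sketch under the assumption of a 32-bit word; `rems_model` is a hypothetical name, and the exception is represented by a `false` return rather than a trap. C99 and later define `%` to truncate toward zero, so the result takes the sign of the dividend, matching the examples in the description.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of REMS on 32-bit words (assumption: bpw = 32).
 * Returns false where the instruction would raise ET_ARITHMETIC;
 * otherwise stores the remainder in *d and returns true. */
bool rems_model(int32_t x, int32_t y, int32_t *d)
{
    if (y == 0) {
        return false;               /* remainder of x by 0 */
    }
    if (x == INT32_MIN && y == -1) {
        return false;               /* remainder of -2^(bpw-1) by -1 */
    }
    *d = x % y;                     /* sign of the result follows x */
    return true;
}
```

The two guards also keep the C code itself well defined, since both cases are undefined behaviour for the `%` operator.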
REMU

Unsigned remainder

Computes an unsigned integer remainder.

This instruction does not execute in a single cycle, and multiple threads may share
the same division unit. The division may take up to $bpw$ thread-cycles.

The instruction has three operands:

- $op1$ $d$ Operand register, one of $r0...r11$
- $op2$ $x$ Operand register, one of $r0...r11$
- $op3$ $y$ Operand register, one of $r0...r11$

Mnemonic and operands:

```
REMU d, x, y
```

Operation:

```
d ← x mod y
```

Encoding:

```
l3r  1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0
```

Conditions that raise an exception:

- ET_ARITHMETIC Remainder of $x$ by 0.
RETSP

Returns to the caller of this procedure, and (optionally) adjusts the stack. This instruction assumes that the return address is stored in LR (where call instructions leave the return address).

This instruction is used with ENTSP. The BLA, BLACP, BLAT, BLRB and BLRF instructions perform the opposite of this instruction, calling a procedure.

The instruction has one operand:

\[ op1 \quad u_{16} \quad \text{A 16-bit immediate in the range } 0...65535. \text{ If } u_{16} < 64, \text{ the instruction requires no prefix} \]

Mnemonic and operands:

\[ \text{RETSP} \quad u_{16} \]

Operation:

\[
\begin{array}{l}
\text{if } u_{16} > 0 \text{ then} \\
\quad sp \leftarrow sp + u_{16} \times Bpw \\
\quad lr \leftarrow \text{mem}[sp] \\
pc \leftarrow lr \\
sr[di] \leftarrow lr \land \text{bit} 1
\end{array}
\]

Encoding:

\[
\text{u6 } 0 1 1 1 0 1 1 1 1 1 1 \ldots \ldots \ldots \quad \text{M}
\]

or prefixed for long immediates:

\[
\text{lu6 } 1 1 1 1 0 0 \ldots \ldots \ldots \ldots \ldots \quad \text{M&R}
\]

Conditions that raise an exception:

- **ET_LOAD_STORE** Register \( sp \) points to an unaligned address, or the indexed address does not point to a valid memory address.
**LSATS**

Saturate signed

Perform saturation on a double word value. Given a bit index this operation will check if any arithmetic has overflowed beyond this bit. If an overflow has occurred, then the double word will be set to MININT or MAXINT (shifted by the given bit location). Performing this instruction between a series of MACCS instructions and a LEXTRACT instruction will cause the extracted word to be either the correct answer or MAXINT/MININT if the result had overflowed positively or negatively.

The instruction has three operands:

- \(op_1\) \(d\) Operand register, one of \(r0\) ... \(r11\)
- \(op_2\) \(x\) Operand register, one of \(r0\) ... \(r11\)
- \(op_3\) \(y\) Operand register, one of \(r0\) ... \(r11\)

Mnemonic and operands:

\[
\text{LSATS} \quad d, x, y
\]

Operation:

\[
\begin{align*}
\text{if} \quad d : x & > 2^{y+bpw} - 1 \\
\text{then} \quad d : x & ← 2^{y+bpw} - 1 \\
\text{elsif} \quad d : x & < -2^{y+bpw} \\
\text{then} \quad d : x & ← -2^{y+bpw}
\end{align*}
\]

Encoding:

\[
\begin{array}{cccccc}
1 & 1 & 1 & 1 & 1 & \ldots \\
0 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0
\end{array}
\]
SETC

Set resource control bits

Sets the resource control bits. The control bits that can be set with SETC are the following:

- CTRL_INUSE_OFF 0x0000
- CTRL_INUSE_ON 0x0008
- CTRL_COND_NONE 0x0001
- CTRL_COND_FULL 0x0001
- CTRL_COND_AFTER 0x0009
- CTRL_COND_EQ 0x0011
- CTRL_COND_NEQ 0x0019
- CTRL_COND_GREATER 0x0021
- CTRL_COND_LESS 0x0029
- CTRL_IE_MODE_EVENT 0x0002
- CTRL_IE_MODE_INTERRUPT 0x000a
- CTRL_DRIVE_DRIVE 0x0003
- CTRL_DRIVE_PULL_DOWN 0x000b
- CTRL_DRIVE_PULL_UP 0x0013
- CTRL_RUN_STOPR 0x0007
- CTRL_RUN_STARTR 0x000f
- CTRL_RUN_CLRBUF 0x0017
- CTRL_MS_MASTER 0x1007
- CTRL_MS_SLAVE 0x100f
- CTRL_BUF_NOBUFFERS 0x2007
- CTRL_BUF_BUFFERS 0x200f
- CTRL_RDY_NOREADY 0x3007
- CTRL_RDY_STROBED 0x300f
- CTRL_RDY_HANDSHAKE 0x3017
- CTRL_SDELAY_NOSDELAY 0x4007
- CTRL_SDELAY_SDELAY 0x400f
- CTRL_PORT_DATAPORT 0x5007
- CTRL_PORT_CLOCKPORT 0x500f
- CTRL_PORT_READYPORT 0x5017
- CTRL_INV_NOINVERT 0x6007
- CTRL_INV_INVERT 0x600f

The precise effect depends on the resource type:

**Port**

See the chapter on Ports in the architecture manual for a description of the port modes.

**Timer**

Only two of the modes, COND_AFTER and COND_NONE, can be used. When COND_AFTER is set, the next IN operation on this resource will block until the timer has reached the value set with SETD. Note that any value between the set time and the set time - 2^{bpw-1} is accepted for the after condition.

**Clock source**

Only the modes INUSE_ON and INUSE_OFF can be used: the resource must be switched on before it is used, and switched off when the program is finished with it.

The instruction has two operands:

- op1 r Operand register, one of r0... r11
- op2 s Operand register, one of r0... r11

Mnemonic and operands:
SETC \( r, s \)

Operation:

\[ control_r \leftarrow s \]

Encoding:

\[
\begin{array}{l}
1\ 1\ 1\ 1\ \cdots \\
0\ 0\ 1\ 0\ 1\ 1\ 1\ 1\ 1\ 0\ 1\ 1\ 0
\end{array}
\quad \text{M\&R}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( r \) is not a valid resource, or the resource is not in use, or not a resource on which SETC can be used
- **ET_ILLEGAL_RESOURCE**: \( s \) is not a valid mode, or not a mode that can be used on \( r \).
SETCI

Set resource control bits immediate

Sets the resource control bits. The control bits that can be set with SETCI are the following:

```
CTRL_INUSE_OFF   0x0000
CTRL_INUSE_ON    0x0008
CTRL_COND_NONE   0x0001
CTRL_COND_FULL   0x0001
CTRL_COND_AFTER  0x0009
CTRL_COND_EQ     0x0011
CTRL_COND_NEQ    0x0019
CTRL_COND_GREATER 0x0021
CTRL_COND_LESS   0x0029
CTRL_IE_MODE_EVENT 0x0002
CTRL_IE_MODE_INTERRUPT 0x000a
CTRL_DRIVE_DRIVE  0x0003
CTRL_DRIVE_PULL_DOWN 0x000b
CTRL_DRIVE_PULL_UP  0x0013
CTRL_RUN_STOPR    0x0007
CTRL_RUN_STARTR   0x000f
CTRL_RUN_CLRBUF   0x0017
CTRL_MS_MASTER    0x1007
CTRL_MS_SLAVE     0x100f
CTRL_BUF_NOBUFFERS 0x2007
CTRL_BUF_BUFFERS  0x200f
CTRL_RDY_NOREADY  0x3007
CTRL_RDY_STROBED  0x300f
CTRL_RDY_HANDSHAKE 0x3017
CTRL_SDELAY_NOSDELAY 0x4007
CTRL_SDELAY_SDELAY 0x400f
CTRL_PORT_DATAPORT 0x5007
CTRL_PORT_CLOCKPORT 0x500f
CTRL_PORT_READYPORT 0x5017
CTRL_INV_NOINVERT  0x6007
CTRL_INV_INVERT   0x600f
```

The precise effect depends on the resource type:

**Port**

See the chapter on Ports in the architecture manual for a description of the port modes.

**Timer**

Only two of the modes, COND_AFTER and COND_NONE, can be used. When COND_AFTER is set, the next IN operation on this resource will block until the timer has reached the value set with SETD. Note that any value between the set time and the set time - $2^{bpw-1}$ is accepted for the after condition.

**Clock source**

Only the modes INUSE_ON and INUSE_OFF can be used: the resource must be switched on before it is used, and switched off when the program is finished with it.

The instruction has two operands:

- $op_1 \ r$  Operand register, one of $r0...r11$
- $op_2 \ u_{16}$  A 16-bit immediate in the range 0...65535. If $u_{16} < 64$, the instruction requires no prefix

Mnemonic and operands:
SETCI \( r, u_{16} \)

Operation:
\[ control_r \leftarrow u_{16} \]

Encoding:
\[
\begin{array}{c}
\text{ru6} \\
1 1 1 0 1 0 . . . . . . \\
\end{array}
\]

or prefixed for long immediates:
\[
\begin{array}{c}
\text{lru6} \\
1 1 1 1 0 1 0 . . . . . . \\
\end{array}
\]

Conditions that raise an exception:
- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( op1 \) is not a valid resource, or the resource is not in use, or not a resource on which SETC can be used
- **ET_ILLEGAL_RESOURCE** \( op2 \) is not a valid mode, or not a mode that can be used on \( op1 \).
SETCLK

Set clock for a resource

Sets the clock for a resource. The precise meaning of this instruction depends on the resource.

The instruction has two operands:

\[
\begin{align*}
op1 & \quad r & \text{Operand register, one of } r0...r11 \\
op2 & \quad s & \text{Operand register, one of } r0...r11 \\
\end{align*}
\]

Mnemonic and operands:

\[
\text{SETCLK} \quad r, s
\]

Operation:

\[
\text{clk}_r \leftarrow s
\]

Encoding:

\[
\begin{array}{cccccccccccc}
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & M&R
\end{array}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not a port or clock source resource, or the resource is not in use.
- **ET_ILLEGAL_RESOURCE** \( s \) is not a port or clock source resource.
- **ET_ILLEGAL_RESOURCE** \( r \) is a running clock-block.
SETCP

Set constant pool

Sets the base address of the constant pool, held in \( cp \). The value that is written into \( cp \) should be word-aligned, otherwise subsequent loads and stores relative to \( cp \) will raise an exception.

SETCP is used in conjunction with LDWCP and LDAWCP.

The instruction has one operand:

\[
\text{op1 } \quad s \quad \text{Operand register, one of } r0...r11
\]

Mnemonic and operands:

\[
\text{SETCP} \quad s
\]

Operation:

\[
cp \leftarrow s
\]

Encoding:

\[
1r \quad 0\ 0\ 1\ 1\ 0|1\ 1\ 1\ 1|1|1|\ldots\quad M
\]
**SETD**

Set event data

Sets the contents of the data/dest/divide register of a resource. The data register is read using GETD. How the data register is used depends on the resource:

**Port**
- specifies the value for the input condition (see SETC)

**Timer**
- specifies the value to wait for (see SETC)

**Channel end**
- specifies the destination channel for OUT operations. The value written should be a channel identifier, constructed as specified for GETR.

**Clock source**
- specifies the value to divide the clock input by.

The instruction has two operands:

- \( op1 \ r \) Operand register, one of \( r0...r11 \)
- \( op2 \ s \) Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\begin{verbatim}
SETD r, s
\end{verbatim}

Operation:

\begin{equation}
data_r \leftarrow s
\end{equation}

Encoding:

\begin{verbatim}
r2r 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 R
\end{verbatim}

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not a channel, timer, port or clock resource, or the resource is not in use.
- **ET_ILLEGAL_RESOURCE** \( r \) is a running clock-block.
- **ET_ILLEGAL_RESOURCE** \( r \) is a channel-end, and \( s \) is not a channel-end or a configuration resource.
SETDP

Sets the base address of the global data area, held in \( dp \). The value that is written into \( dp \) should be word-aligned, otherwise subsequent loads and stores relative to \( dp \) will raise an exception.

SETDP is used in conjunction with LDWDP, STWDP, and LDAWDP.

The instruction has one operand:

\[
op1 \quad s \quad \text{Operand register, one of r0...r11}
\]

Mnemonic and operands:

\[
\begin{align*}
\text{SETDP} & \quad s \\
\end{align*}
\]

Operation:

\[
dp \leftarrow s
\]

Encoding:

\[
1r \quad 0011011111110 \cdots \quad M
\]
SETEV

Set environment vector

Sets the environment vector related to a resource. When a resource issues an event to a thread, any address stored in the environment vector will overwrite $ed$. If uninitialised, $ed$ will be set to the resource identifier. SETEV can be used to pass an address specific to a resource to the event handler. SETEV can be used to share a single handler between multiple resources. Note that SETEV is intended to pass address information, as such it does not necessarily hold $bpw$ bits.

SETEV is used in conjunction with SETV, and any of the WAITEU instructions.

The instruction has one operand:

\[ op1 \ r \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[
\text{SETEV} \quad r
\]

Operation:

\[ ev_r \leftarrow r11 \]

Encoding:

\[
1r \quad 0 0 1 1 | 1 | 1 | 1 | 1 | 1 | 1 | \ldots R
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** $r$ is not a port, timer or channel resource, or the resource is not in use.
**SETKEP**

Sets the kernel entry point. The kernel entry point should be aligned on a 128-byte boundary.

The instruction has no operands.

Mnemonic and operands:

```plaintext
SETKEP
```

Operation:

```
kep ← r11
```

Encoding:

```
0 0 0 0 0 | 1 1 1 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
```
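
The 128-byte alignment requirement can be checked before the address is placed in r11. The following C fragment is a minimal sketch; `kep_is_aligned` is a hypothetical helper name and the address is assumed to fit in 32 bits.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: true if addr lies on a 128-byte boundary,
 * i.e. its seven least significant bits are all zero. */
bool kep_is_aligned(uint32_t addr)
{
    return (addr & 127u) == 0u;
}
```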
**SETN**

Sets the logical network over which a channel should communicate.

The instruction has two operands:

\[
\begin{align*}
op_1 & \quad r & \text{Operand register, one of } r0...r11 \\
op_2 & \quad s & \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{SETN} \quad r, s
\]

Operation:

\[
\text{net}_r \leftarrow s
\]

Encoding:

\[
\begin{array}{c}
\text{Ir2r} \\
0 \ 0 \ 1 \ 1 \ 0 \ 1 \ 1 \ 1 \ 1 \ 0 \ 1 \ 1 \ 0
\end{array}
\]

M&R

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( r \) is not a channel end or not in use.
SETPS

Set processor state

Sets a processor internal register. Only used when configuring the core.

The instruction has two operands:

\[ \begin{align*}
  op1 & \quad r \quad \text{Operand register, one of } r0...r11 \\
  op2 & \quad s \quad \text{Operand register, one of } r0...r11
\end{align*} \]

Mnemonic and operands:

\[ \text{SETPS} \quad r, s \]

Operation:

\[ ps[r] \leftarrow s \]

Encoding:

\[ \begin{array}{cccccccccc}
  0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & M&R \\
\end{array} \]

Conditions that raise an exception:

- **ET_ILLEGAL_PS** \( s \) is not referring to a legal processor state register
- **ET_ILLEGAL_PS** \( s \) is referring to a read-only processor state register
- **ET_ILLEGAL_PS** \( s \) is referring to RAMBASE and \( r \) is set to the ROM address
SETPSC

Set the port shift count

Sets the port shift count for input and output operations.

OUTPW and INPW can be used instead of a combination of SETPSC and OUT/IN.

The instruction has two operands:

\[ \begin{align*}
  op1 & \ r & \text{Operand register, one of } r0\ldots r11 \\
  op2 & \ s & \text{Operand register, one of } r0\ldots r11 
\end{align*} \]

Mnemonic and operands:

\[
\text{SETPSC} \quad r, s
\]

Operation:

\[ \text{shiftcount}_r \leftarrow s \]

Encoding:

\[
\begin{array}{cccccccccc}
  r2r & 1 & 1 & 0 & 0 & 0 & \cdots & \cdots & \cdots & 0 & \cdots & \cdots & R
\end{array}
\]

Conditions that raise an exception:

- **ET\_RESOURCE\_DEP**: Resource illegally shared between threads
- **ET\_ILLEGAL\_RESOURCE**: \( r \) is not pointing to a port resource, or the resource is not in use.
- **ET\_ILLEGAL\_RESOURCE**: \( s \) is not a valid shift count for the transfer width of the port, or the port is not in BUFFERED mode.
SETPT

Set the port time

Specifies the time when the next port input or output will be performed. The time is specified in terms of the number of edges of the clock associated with this port. The port timer stores a 16-bit value hence the largest delay is 65535 edges of the port-clock.

The instruction has two operands:

\[
\begin{align*}
  op_1 & \quad r & \text{Operand register, one of } r_0 \ldots r_{11} \\
  op_2 & \quad s & \text{Operand register, one of } r_0 \ldots r_{11}
\end{align*}
\]

Mnemonic and operands:

\[
\text{SETPT} \quad r, s
\]

Operation:

\[
\text{porttimer}_r \leftarrow s
\]

Encoding:

\[
\text{r2r} \quad 0 \ 0 \ 0 \ 1 \ 1 \ | \ldots \ldots | 1 \ | \ldots \ldots \quad \text{R}
\]

Conditions that raise an exception:

- \text{ET\_RESOURCE\_DEP} Resource illegally shared between threads
- \text{ET\_ILLEGAL\_RESOURCE} \( r \) is not pointing to a port resource, or the resource is not in use.
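
Because the port timer holds 16 bits, a target time is naturally computed modulo $2^{16}$. The sketch below assumes that the port counter wraps from 65535 back to 0, which the text above does not state explicitly. `port_time_after` is a hypothetical helper, and obtaining the current port time is outside the sketch.

```c
#include <stdint.h>

/* Hypothetical helper: the value to pass to SETPT for a transfer
 * 'delay' port-clock edges after the port time 'now'.
 * Assumption: the 16-bit port counter wraps modulo 65536. */
uint16_t port_time_after(uint16_t now, uint16_t delay)
{
    /* Add in 32 bits, then truncate to the 16-bit timer width. */
    return (uint16_t)((uint32_t)now + delay);
}
```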
**SETRDY**

Set ready input for a port

Sets ready input pin to be used by a port for strobing or handshaking.

If $r$ is a clock block, then $s$ should be the 1-bit port to be used as ready input. $r$ should be associated with a dataport using SETCLK.

Otherwise, if $r$ is a port, then this port should be in mode READY_OUT, and $s$ is the data port from which the ready out will be generated.

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad r & \text{Operand register, one of } r0...r11 \\
\text{op2} & \quad s & \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{SETRDY} \quad r, s
\]

Operation:

\[
d_{\text{ready}} r \leftarrow s
\]

Encoding:

\[
\begin{array}{l}
\begin{array}{cccccc}
1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 0
\end{array}
\end{array}
\]

M&R

Conditions that raise an exception:

- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** $r$ is not pointing to a port or clock resource, or the resource is not in use.
- **ET_ILLEGAL_RESOURCE** $s$ is not pointing to a port resource, or the port is not a 1-bit port.
SETSP

Set the stack pointer

Sets the end address of the stack, held in $sp$. The value that is written into $sp$ should be word-aligned, otherwise subsequent loads and stores relative to $sp$ will raise an exception.

SETSP is used in conjunction with ENTSP, RETSP, LDWSP and STWSP.

The instruction has one operand:

$$ op1 \quad s \quad \text{Operand register, one of } r0...r11 $$

Mnemonic and operands:

$$ \text{SETSP} \quad s $$

Operation:

$$ sp \leftarrow s $$

Encoding:

$$ 1r \quad 0\ 0\ 1\ 0\ 1\ 1\ 1\ 1\ 1\ 1\ 1\ 1\ 1 \quad M $$
SETSR

Set bits in the thread’s Status Register. The mask supplied specifies which bits should be set. Note that setting the EEBLE bit may cause an event to be issued, causing subsequent instructions to not be executed (since events do not save the program counter). Setting IEBLE may cause an interrupt to be issued. The bits are defined as follows:

- 0 EEBLE When 1 events are enabled for the thread.
- 1 IEBLE When 1 interrupts are enabled for the thread.
- 2 INENB 1 when in an event enabling sequence.
- 3 ININT 1 when in an interrupt handler.
- 4 INK 1 when in kernel mode.
- 6 WAITING When 1 the thread is paused waiting for events.
- 7 FAST When 1 the thread will continually issue.

SETSR can only be used to set the EEBLE, IEBLE and INENB bits.

CLRSR is used to clear bits in the status register.

The instruction has one operand:

\( op \ u_{16} \)

A 16-bit immediate in the range 0...65535. If \( u_{16} < 64 \), the instruction requires no prefix.

Mnemonic and operands:

\[ \text{SETSR} \quad u_{16} \]

Operation:

\[ \text{sr} \leftarrow \text{sr} \lor \text{bit} u_{16} \]

Encoding:

\[ u_{6} \quad 0 \ 1 \ 1 \ 1 \ 1 \ 0 \ 1 \ 1 \ 0 \ 1 \ 1 \ 0 \] \( \text{R} \)

or prefixed for long immediates:

\[ \text{lu}_{6} \quad 0 \ 1 \ 1 \ 1 \ 0 \ 1 \ 1 \ 0 \ 1 \ 1 \ 0 \] \( \text{M&R} \)
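
The bit positions listed for the status register and the rule that only EEBLE, IEBLE and INENB can be set may be sketched in C as follows. The `SR_*` names and `setsr_model` are hypothetical. The sketch also assumes that mask bits outside the settable set are simply ignored, which the text does not specify.

```c
#include <stdint.h>

/* Status register bit masks, taken from the bit positions listed
 * for SETSR. The names are illustrative only. */
enum {
    SR_EEBLE   = 1 << 0,  /* events enabled */
    SR_IEBLE   = 1 << 1,  /* interrupts enabled */
    SR_INENB   = 1 << 2,  /* in an event enabling sequence */
    SR_ININT   = 1 << 3,  /* in an interrupt handler */
    SR_INK     = 1 << 4,  /* in kernel mode */
    SR_WAITING = 1 << 6,  /* paused waiting for events */
    SR_FAST    = 1 << 7   /* thread continually issues */
};

/* Bits that SETSR is allowed to set. */
#define SR_SETTABLE ((uint32_t)(SR_EEBLE | SR_IEBLE | SR_INENB))

/* Model of sr <- sr OR u16, restricted to the settable bits.
 * Assumption: other mask bits have no effect. */
uint32_t setsr_model(uint32_t sr, uint32_t u16)
{
    return sr | (u16 & SR_SETTABLE);
}
```

The model does not capture the side effects noted in the description, such as an event or interrupt being taken as soon as EEBLE or IEBLE is set.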
SETTW

Set transfer width for a port

Sets the number of bits that are transferred on an IN or OUT operation on a buffered port. The buffering will shift the data.

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad r & \text{Operand register, one of } r0...r11 \\
\text{op2} & \quad s & \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{SETTW} \quad r, s
\]

Operation:

\[
\text{transferwidth}_r \leftarrow s
\]

Encoding:

\[
\begin{array}{cccccccccccc}
1 & 1 & 1 & 1 & \cdots & \cdots & 1 & \cdots & \cdots \\
0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0
\end{array}
\]

Conditions that raise an exception:

- **ET_ILLEGAL_RESOURCE** \( r \) is not pointing to a port resource, or the port is not in use.
- **ET_RESOURCE_DEP** Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE** \( s \) is not a legal width for the port, or the port is not in BUFFERS mode.
**SETV**

Sets the vector related to a resource. When a resource issues an event to a thread, this vector is used to determine which instruction to issue. The vector is typically set up once when all event handlers are installed. Note that if an illegal vector is supplied, this will not raise an exception until an actual event is handled.

SETV is used in conjunction with **SETEV**, and any of the **WAITEU** instructions.

The instruction has one operand:

\[
\text{op1} \quad r \quad \text{Operand register, one of } r0...r11
\]

Mnemonic and operands:

\[
\text{SETV} \quad r
\]

Operation:

\[
v_r \leftarrow r11
\]

Encoding:

\[
1r \quad 0 1 0 0 1 1 1 1 1 1 1 1 1 1 \ldots \quad \text{R}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( r \) is not pointing to a port, timer or channel resource, or the resource is not in use.
SEXT

Sign extends an n-bit field stored in a register. The first operand is both a source and destination operand. The second operand contains the bit position. All bits at a position higher or equal are set to the value of the bit one position lower. In effect, the lower n bits are interpreted as a signed integer, and produced in the destination register.

The instruction has two operands:

\[ op1 \ d \quad \text{Operand register, one of } r0... r11 \]
\[ op2 \ s \quad \text{Operand register, one of } r0... r11 \]

Mnemonic and operands:

\[
\text{SEXT} \quad d, s
\]

Operation:

\[
d \leftarrow \begin{cases} 
    s \leq 0 \vee s \geq \text{bpw}, & d \\
    s > 0 \land s < \text{bpw}, & d[s - 1] : \ldots : d[s - 1] : d[s - 1 \ldots 0]
\end{cases}
\]

Encoding:

\[
2r \quad 0 \ 0 \ 1 \ 1 \ | \ \ldots \ldots \ | \ \ldots \ldots \ \ldots \ldots \ 0 \ \ldots \ldots \ M+R
\]
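
The two cases of the SEXT operation can be expressed as a short C model. This is a sketch assuming a 32-bit word (bpw = 32); `sext_model` is a hypothetical name, and registers are treated as unsigned 32-bit values so that the bit pattern is explicit.

```c
#include <stdint.h>

/* Model of SEXT on a 32-bit word (assumption: bpw = 32).
 * d is the register value, s the bit position. */
uint32_t sext_model(uint32_t d, uint32_t s)
{
    if (s == 0 || s >= 32) {
        return d;                        /* out of range: d is unchanged */
    }
    uint32_t low  = d & ((1u << s) - 1u); /* d[s-1 .. 0] */
    uint32_t sign = 1u << (s - 1);        /* position of bit s-1 */
    /* XOR then subtract replicates bit s-1 into bits s .. 31 */
    return (low ^ sign) - sign;
}
```

For example, under this model an 8-bit field `0xFF` becomes `0xFFFFFFFF`, while `0x7F` is left as `0x7F` because bit 7 is clear.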
SEXTI

Sign extend an n-bit field immediate

Sign extends an n-bit field stored in a register. The first operand is both a source and destination operand. The second operand contains the bit position. All bits at a position higher or equal are set to the value of the bit one position lower. In effect, the lower n bits are interpreted as a signed integer, and produced in the destination register.

The instruction has two operands:

\[ \text{op1} \quad d \quad \text{Operand register, one of } r0...r11 \]
\[ \text{op2} \quad \text{bitp} \quad \text{A bit position; one of } bpw, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32 \]

Mnemonic and operands:

\[ \text{SEXTI} \quad d, \text{bitp} \]

Operation:

\[ d \leftarrow \begin{cases} 
\text{bitp} \leq 0 \lor \text{bitp} \geq \text{bpw}, & d \\
\text{bitp} > 0 \land \text{bitp} < \text{bpw}, & d[\text{bitp} - 1] : \ldots : d[\text{bitp} - 1] : d[\text{bitp} - 1 \ldots 0] 
\end{cases} \]

Encoding:

\[ \text{rus} \quad 0 \quad 0 \quad 1 \quad 1 \quad 0 \quad \ldots \quad \ldots \quad 1 \quad \ldots \quad \quad \text{M+R} \]
**SHL**

Shift left

Shifts a word left by \( y \) bits, filling the least significant \( y \) bits with zeros. Shift left multiplies signed and unsigned integers by \( 2^y \).

The instruction has three operands:

\[
op_1 \quad d \quad \text{Operand register, one of } r0...r11 \\
op_2 \quad x \quad \text{Operand register, one of } r0...r11 \\
op_3 \quad y \quad \text{Operand register, one of } r0...r11
\]

Mnemonic and operands:

\[
\text{SHL} \quad d, x, y
\]

Operation:

\[
d \leftarrow \begin{cases} 
y < bpw, & x[bpw - y - 1...0] : 0 : ... : 0 \\
y \geq bpw, & 0 \end{cases}
\]

Encoding:

\[
3r \quad 0 \ 0 \ 1 \ 0 \ 0 \ | \ \ldots \ldots \ldots \ldots \quad \text{M+R}
\]
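
A minimal software model of the left shift is given below for illustration. It assumes bpw = 32; the names `shl`, `BPW` and `WORD_MASK` are hypothetical. The point it shows is that a shift distance of bpw or more is well defined and produces 0.

```python
BPW = 32                      # bits per word (bpw); assumed to be 32
WORD_MASK = (1 << BPW) - 1


def shl(x: int, y: int) -> int:
    """Model of SHL: shift x left by y bits, zero filling from the right."""
    x &= WORD_MASK
    y &= WORD_MASK            # y is read from a register as an unsigned word
    if y >= BPW:
        return 0              # every bit of x has been shifted out
    return (x << y) & WORD_MASK   # bits shifted past the word are discarded
```

For example, `shl(1, 31)` gives `0x80000000` and `shl(1, 32)` gives 0. The same model applies to the immediate form, with the shift distance taken from the bit position operand.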
**SHLI**

Shift left immediate

Shifts a word left by \( bitp \) bits, filling the least significant \( bitp \) bits with zeros. Shift left multiplies signed and unsigned integers by \( 2^{bitp} \).

The instruction has three operands:

- \( op1 \) \( d \)  Operand register, one of \( r0...r11 \)
- \( op2 \) \( x \)  Operand register, one of \( r0...r11 \)
- \( op3 \) \( bitp \)  A bit position; one of \( bpw, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32 \)

Mnemonic and operands:

\[
\text{SHLI} \quad d, x, bitp
\]

Operation:

\[
d \leftarrow \begin{cases} 
bitp < bpw, & x[bpw - bitp - 1...0]:0 : \ldots : 0 \\
bitp \geq bpw, & 0 
\end{cases}
\]

Encoding:

\[
2\text{rus} \quad \begin{array}{cccccccccccccccc}
1 & 0 & 1 & 0 & 0 & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\end{array} \quad \text{M+R}
\]
**SHR**

**Shift right**

Shifts a word right by \( y \) positions, filling the most significant \( y \) bits with zeros. This implements an unsigned divide by \( 2^y \).

For signed shifts, use **ASHR**.

The instruction has three operands:

\[
\begin{align*}
op1 & \quad d & \text{Operand register, one of } r0...r11 \\
op2 & \quad x & \text{Operand register, one of } r0...r11 \\
op3 & \quad y & \text{Operand register, one of } r0...r11 \\
\end{align*}
\]

Mnemonic and operands:

\[\text{SHR} \quad d, x, y\]

Operation:

\[
d \leftarrow \begin{cases} 
  y < bpw, & 0:...:0:x[bpw-1...y] \\
  y \geq bpw, & 0 
\end{cases}
\]

Encoding:

\[
3r \quad 00101\cdot\cdot\cdot\cdot\cdot\cdot\cdot\cdot\cdot\cdot\cdot\cdot\quad M+R
\]
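
The logical right shift can be modelled in the same way. The sketch below assumes bpw = 32, and the names `shr`, `BPW` and `WORD_MASK` are hypothetical. It illustrates that the word is always treated as unsigned, so a word with its top bit set does not stay "negative" after the shift.

```python
BPW = 32                      # bits per word (bpw); assumed to be 32
WORD_MASK = (1 << BPW) - 1


def shr(x: int, y: int) -> int:
    """Model of SHR: shift x right by y bits, zero filling from the left."""
    x &= WORD_MASK
    y &= WORD_MASK            # y is read from a register as an unsigned word
    if y >= BPW:
        return 0
    return x >> y             # unsigned divide by 2**y, rounding down
```

For example, `shr(100, 2)` gives 25, and `shr(0x80000000, 31)` gives 1 rather than an all-ones word.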
SHRI

Shift right immediate

Shifts a word right by \(bitp\) positions, filling the most significant \(bitp\) bits with zeros. This implements an unsigned divide by \(2^{bitp}\).

For signed shifts, use ASHR.

The instruction has three operands:

\[
\begin{align*}
\text{op1} & \quad \text{d} & \text{Operand register, one of } r0, \ldots, r11 \\
\text{op2} & \quad \text{x} & \text{Operand register, one of } r0, \ldots, r11 \\
\text{op3} & \quad \text{bitp} & \text{A bit position; one of } bpw, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32
\end{align*}
\]

Mnemonic and operands:

\[
\text{SHRI} \quad d, x, \text{bitp}
\]

Operation:

\[
d \leftarrow \begin{cases} 
\text{bitp} < bpw, & 0 : \ldots : 0 : x[bpw - 1 \ldots \text{bitp}] \\
\text{bitp} \geq bpw, & 0
\end{cases}
\]

Encoding:

\[
2\text{rus} \quad [1 \text{ 0 1 0 1} \cdots \cdots \cdots \cdots \cdots ] \quad \text{M+R}
\]
SSYNC

Slave synchronise

Synchronises this thread with all threads associated with a synchroniser. SSYNC is used together with MSYNC to implement a barrier, or together with MJOIN in order to terminate a group of processes. SSYNC uses the synchroniser that was used to create this process in order to establish which other processes to synchronise with.

SSYNC clears the EEBLE bit, disabling any events from being issued; this commits the thread to synchronising. If the ININT bit is set, then SSYNC will not block; SSYNC should not be used inside an interrupt handler.

The instruction has no operands.

Mnemonic and operands:

\[
\text{SSYNC}
\]

Operation:

\[
sr[\text{eeble}] \leftarrow 0 \\
\text{if } (\text{slaves}_{\text{syn}(\text{tid})} \setminus \text{spaus} = \{\text{tid}\}) \land \text{msyn}_{\text{syn}(\text{tid})} \text{ then} \\
\quad \text{if } \text{mjoin}_{\text{syn}(\text{tid})} \text{ then} \\
\quad \quad \text{forall thread} \in \text{slaves}_{\text{syn}(\text{tid})}: \text{inuse}_{\text{thread}} \leftarrow 0 \\
\quad \quad \text{mjoin}_{\text{syn}(\text{tid})} \leftarrow 0 \\
\quad \text{else} \\
\quad \quad \text{spaus} \leftarrow \text{spaus} \setminus \text{slaves}_{\text{syn}(\text{tid})} \\
\quad \quad \text{mpaus} \leftarrow \text{mpaus} \setminus \{\text{mstr}_{\text{syn}(\text{tid})}\} \\
\quad \quad \text{msyn}_{\text{syn}(\text{tid})} \leftarrow 0 \\
\text{else} \\
\quad \text{spaus} \leftarrow \text{spaus} \cup \{\text{tid}\}
\]

Encoding:

\[
0 0 0 0 0 | 1 1 1 1 | 0 | 1 1 0 \quad R
\]
ST8

8-bit store

Stores eight bits of a register into memory. The least significant 8 bits of the register are stored into the address computed using a base address \((b)\) and index \((i)\).

The instruction has three operands:

\[
\begin{align*}
op1 & \quad s & \text{Operand register, one of } r0 \ldots r11 \\
op2 & \quad b & \text{Operand register, one of } r0 \ldots r11 \\
op3 & \quad i & \text{Operand register, one of } r0 \ldots r11 
\end{align*}
\]

Mnemonic and operands:

\[
\text{ST8} \quad s, b, i
\]

Operation:

\[
\text{mem}[ea - \text{bytenum}][\text{bitnum} + 7 \ldots \text{bitnum}] \leftarrow s[7 \ldots 0] \\
\text{where } ea \leftarrow b + i \\
\text{bytenum} \leftarrow ea \mod Bpw \\
\text{bitnum} \leftarrow 8 \times \text{bytenum}
\]

Encoding:

\[
\begin{array}{cccccccccccc}
1 & 1 & 1 & 1 & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & 1 \\
I3r & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 0 \\
M&R
\end{array}
\]

Conditions that raise an exception:

- **ET_LOAD_STORE**: The indexed address does not point to a valid memory location.
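
The address arithmetic of the byte store can be sketched as follows. This is an illustrative model, not the hardware implementation: it assumes Bpw = 4 bytes per word and represents memory as a hypothetical dictionary from word-aligned byte addresses to 32-bit words. The check for a valid memory location is omitted.

```python
BYTES_PER_WORD = 4            # Bpw; assumed to be 4
WORD_MASK = 0xFFFFFFFF


def st8(mem: dict, s: int, b: int, i: int) -> None:
    """Model of ST8: store the low 8 bits of s at byte address b + i."""
    ea = b + i
    bytenum = ea % BYTES_PER_WORD         # byte lane within the word
    bitnum = 8 * bytenum                  # bit offset of that lane
    word_addr = ea - bytenum              # address of the containing word
    old = mem.get(word_addr, 0)
    cleared = old & ~(0xFF << bitnum) & WORD_MASK   # clear the target byte
    mem[word_addr] = cleared | ((s & 0xFF) << bitnum)
```

Starting from `mem = {0x100: 0x11223344}`, the call `st8(mem, 0xAB, 0x100, 1)` leaves `0x1122AB44` in the word at `0x100`; only the selected byte lane changes.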
ST16  

16-bit store

Stores 16 bits of a register into memory. The least significant 16 bits of the register are stored at the address computed from a base address ($b$) and an index ($i$). The base address should be word-aligned; the index is multiplied by 2.

The instruction has three operands:

- $op_1$  $s$  Operand register, one of $r0...r11$
- $op_2$  $b$  Operand register, one of $r0...r11$
- $op_3$  $i$  Operand register, one of $r0...r11$

Mnemonic and operands:

```
ST16  s, b, i
```

Operation:

```
mem[ea - bytenum][bitnum + 15...bitnum] ← s[15...0]
where ea ← b + i × 2
bytenum ← ea mod Bpw
bitnum ← 16 × (bytenum ÷ 2)
```

Encoding:

```
1 | l3r | 1 0 0 0 0 | 1 1 1 1 1 1 0 0 | M&R
```

Conditions that raise an exception:

- $ET\_LOAD\_STORE$  $b$ is not 16-bit aligned (unaligned store), or does not point to a valid memory location.
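
The 16-bit store can be modelled along the same lines. The sketch below assumes Bpw = 4, represents memory as a hypothetical dictionary from word-aligned byte addresses to 32-bit words, and signals the alignment exception with a Python `ValueError`. The check for a valid memory location is omitted.

```python
BYTES_PER_WORD = 4            # Bpw; assumed to be 4
WORD_MASK = 0xFFFFFFFF


def st16(mem: dict, s: int, b: int, i: int) -> None:
    """Model of ST16: store the low 16 bits of s at byte address b + 2*i."""
    ea = b + i * 2
    if ea % 2 != 0:
        # i * 2 is always even, so this fires exactly when b is odd
        raise ValueError("ET_LOAD_STORE: address is not 16-bit aligned")
    bytenum = ea % BYTES_PER_WORD         # 0 or 2
    bitnum = 16 * (bytenum // 2)          # 0 or 16
    word_addr = ea - bytenum
    old = mem.get(word_addr, 0)
    cleared = old & ~(0xFFFF << bitnum) & WORD_MASK  # clear the target half
    mem[word_addr] = cleared | ((s & 0xFFFF) << bitnum)
```

Starting from `mem = {0x100: 0x11223344}`, the call `st16(mem, 0xABCDEF, 0x100, 1)` leaves `0xCDEF3344` in the word at `0x100`.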
STD

Store double word

Stores two words in memory, at a location specified by a base address and an index. The index is multiplied by the size of a double word; the base address must be double-word aligned.

The immediate version, STDI, implements a store into a structured data type; the version with registers only, STD, implements a store into an array.

The instruction has four operands:

\[ \begin{align*}
    op1 & \quad d & \text{Operand register, one of } r0...r11 \\
    op4 & \quad e & \text{Operand register, one of } r0...r11 \\
    op2 & \quad b & \text{Operand register, one of } r0...r11 \\
    op3 & \quad i & \text{Operand register, one of } r0...r11 \\
\end{align*} \]

Mnemonic and operands:

\[
\text{STD} \quad d, e, b, i
\]

Operation:

\[
\begin{align*}
    \text{mem}[b + i \times Bpw \times 2] & \quad \leftarrow \quad d \\
    \text{mem}[b + i \times Bpw \times 2 + Bpw] & \quad \leftarrow \quad e
\end{align*}
\]

Encoding:

\[
\begin{align*}
    \text{l4r} & \quad 1 \ 1 \ 1 \ 1 \ \ldots \ 0 \ 0 \ 0 \ 1 \ | \ 1 \ 1 \ 1 \ 1 \ | \ 0 \ \ldots \quad \text{M\&R} \\
\end{align*}
\]

Conditions that raise an exception:

- **ET_LOAD_STORE**: \( b \) is not double word aligned, or the indexed address does not point to a valid memory location.
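
As an illustration of the indexing, the sketch below models the double-word store. It assumes Bpw = 4, represents memory as a hypothetical dictionary keyed by byte address, and signals the alignment exception with a Python `ValueError`; the check for a valid memory location is omitted.

```python
BYTES_PER_WORD = 4            # Bpw; assumed to be 4
WORD_MASK = 0xFFFFFFFF


def std(mem: dict, d: int, e: int, b: int, i: int) -> None:
    """Model of STD: store d and e as element i of an array of double words."""
    if b % (2 * BYTES_PER_WORD) != 0:
        raise ValueError("ET_LOAD_STORE: base is not double-word aligned")
    addr = b + i * BYTES_PER_WORD * 2     # index is scaled by 8 bytes
    mem[addr] = d & WORD_MASK             # first word
    mem[addr + BYTES_PER_WORD] = e & WORD_MASK   # second word
```

With a base of `0x1000` and an index of 3, `d` is stored at `0x1018` and `e` at `0x101C`.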
STDI

Store double word immediate

Stores two words in memory, at a location specified by a base address and an index. The index is multiplied by the size of a double word; the base address must be double-word aligned.

The immediate version, STDI, implements a store into a structured data type; the version with registers only, STD, implements a store into an array.

The instruction has four operands:

- \( op1 \) \( d \) Operand register, one of \( r0 \ldots r11 \)
- \( op4 \) \( e \) Operand register, one of \( r0 \ldots r11 \)
- \( op2 \) \( b \) Operand register, one of \( r0 \ldots r11 \)
- \( op3 \) \( i \) An integer in the range \( 0 \ldots 11 \)

Mnemonic and operands:

- \( \text{STDI} \) \( d, e, b, i \)

Operation:

- \( \text{mem}[b + i \times Bpw \times 2] \leftarrow d \)
- \( \text{mem}[b + i \times Bpw \times 2 + Bpw] \leftarrow e \)

Encoding:

- \( \text{l3rus} \)

Conditions that raise an exception:

- **ET_LOAD_STORE**: \( b \) is not double word aligned, or the indexed address does not point to a valid memory location.
STDSP

Store double word on stack

Stores two words on the stack, using a constant offset from the stack pointer. The offset is specified in double words.

The instruction has three operands:

- **op1** \(d\)  Operand register, one of \(r0...r11\)
- **op2** \(e\)  Operand register, one of \(r0...r11\)
- **op3** \(u\) An integer in the range 0...11

Mnemonic and operands:

\[
\text{STDSP} \quad d, e, u
\]

Operation:

\[
\begin{align*}
\text{mem}[sp + u \times Bpw \times 2] & \leftarrow d \\
\text{mem}[sp + u \times Bpw \times 2 + Bpw] & \leftarrow e
\end{align*}
\]

Encoding:

| 1 1 1 1 | · · · · · · · · · · |
| 1 1 1 0 | 1 1 1 1 | 0 1 1 0 |

Conditions that raise an exception:

**ET_LOAD_STORE** \(sp\) is not double-word aligned, or the indexed address does not point to a valid memory location.
STET

Store ET on the stack

Stores the value of ET on the stack at offset 4.

The value can be restored using LDET. Together with STSPC, STSSR, and STSED, all or part of the state copied during an interrupt can be placed on the stack.

The instruction has no operands.

Mnemonic and operands:

```
STET
```

Operation:

```
mem[sp + 4 × Bpw] ← et
```

Encoding:

```
0 0 0 0 1 | 1 1 1 1 | 1 | 0 | 1
```

Conditions that raise an exception:

```
ET_LOAD_STORE  The indexed address does not point to a valid memory location.
```
STSED

Store SED on the stack

Stores the value of SED on the stack at offset 3.

The value can be restored using LDSED. Together with STSPC, STSSR, and STET, all or part of the state copied during an interrupt can be placed on the stack.

The instruction has no operands.

Mnemonic and operands:

```
STSED
```

Operation:

```
mem[sp + 3 × Bpw] ← sed
```

Encoding:

```
0 0 0 0 1 1 1 1 | 1 1 1 1 | 1 1 0 0 M
```

Conditions that raise an exception:

```
ET_LOAD_STORE  The indexed address does not point to a valid memory location.
```
STSPC

Store SPC on the stack

Stores the value of SPC on the stack at offset 1.

The value can be restored using LDSPC. Together with STET, STSSR, and STSED, all or part of the state copied during an interrupt can be placed on the stack.

The instruction has no operands.

Mnemonic and operands:

\[
\text{STSPC}
\]

Operation:

\[
\text{mem}[sp + 1 \times Bpw] \leftarrow \text{spc}
\]

Encoding:

\[
\begin{array}{cccccccc|c|cccccc}
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 1 & M
\end{array}
\]

Conditions that raise an exception:

ET_LOAD_STORE  The indexed address does not point to a valid memory location.
**STSSR**

Store the SSR to the stack

Stores the value of SSR on the stack at offset 2.

The value can be restored using **LDSSR**. Together with **STET**, **STSPC**, and **STSED**, all or part of the state copied during an interrupt can be placed on the stack.

The instruction has no operands.

Mnemonic and operands:

```
STSSR
```

Operation:

\[
mem[sp + 2 \times Bpw] \leftarrow ssr
\]

Encoding:

```
0 0 0 0 0 1 1 1 1 1 1 0 1 1 1 M
```

Conditions that raise an exception:

**ET_LOAD_STORE** The indexed address does not point to a valid memory location.
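
The four instructions STSPC, STSSR, STSED and STET each write to a fixed word offset from the stack pointer. The sketch below collects those offsets in one place. It is illustrative only: it assumes Bpw = 4, represents memory as a hypothetical dictionary keyed by byte address, and the names `SLOT` and `save_interrupt_state` are not part of the architecture.

```python
BYTES_PER_WORD = 4            # Bpw; assumed to be 4

# Word offset from sp used by each store instruction.
SLOT = {
    "spc": 1,                 # STSPC
    "ssr": 2,                 # STSSR
    "sed": 3,                 # STSED
    "et": 4,                  # STET
}


def save_interrupt_state(mem: dict, sp: int, regs: dict) -> None:
    """Model of issuing STSPC, STSSR, STSED and STET in sequence."""
    for name, slot in SLOT.items():
        mem[sp + slot * BYTES_PER_WORD] = regs[name]
```

With `sp = 0x2000`, SPC is stored at `0x2004`, SSR at `0x2008`, SED at `0x200C` and ET at `0x2010`. A handler that needs only part of the state can issue a subset of the four instructions.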
STW

Store word

Stores a word in memory, at a location specified by a base address and an index. The index is multiplied by the size of a word; the base address must be word aligned.

The immediate version, STWI, implements a store into a structured data type; the version with registers only, STW, implements a store into an array.

The instruction has three operands:

- \( op_1 \) \( s \)  Operand register, one of \( r0...r11 \)
- \( op_2 \) \( b \)  Operand register, one of \( r0...r11 \)
- \( op_3 \) \( i \)  Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[
\text{STW} \quad s, b, i
\]

Operation:

\[
\text{mem}[b + i \times Bpw] \leftarrow s
\]

Encoding:

\[
\begin{array}{cccccccc}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0
\end{array}
\]

Conditions that raise an exception:

- **ET_LOAD_STORE**: \( b \) is not word aligned, or the indexed address does not point to a valid memory location.
STWI

Store word immediate

Stores a word in memory, at a location specified by a base address and an index. The index is multiplied by the size of a word; the base address must be word aligned.

The immediate version, STWI, implements a store into a structured data type; the version with registers only, STW, implements a store into an array.

The instruction has three operands:

\[ op1 \quad s \quad \text{Operand register, one of } r0...r11 \]
\[ op2 \quad b \quad \text{Operand register, one of } r0...r11 \]
\[ op3 \quad i \quad \text{An integer in the range } 0...11 \]

Mnemonic and operands:

\[ \text{STWI} \quad s, b, i \]

Operation:

\[ \text{mem}[b + i \times Bpw] \leftarrow s \]

Encoding:

\[ \text{2rus} \quad 0 \quad 0 \quad 0 \quad 0 \quad 0 \quad 0 \quad \text{M} \]

Conditions that raise an exception:

- **ET_LOAD_STORE**: \( b \) is not word aligned, or the indexed address does not point to a valid memory location.
STWDP

Store word in data pool

Stores a word in the data area, using a constant offset from the data pointer. The offset is specified in words. STWDP can be used to write to global variables.

The instruction has two operands:

\[ op1 \quad S \quad \text{Any of } r0...r11, \ cp, \ dp, \ sp, \ lr \]
\[ op2 \quad u_{16} \quad \text{A 16-bit immediate in the range } 0...65535. \text{ If } u_{16} < 64, \text{ the instruction requires no prefix} \]

Mnemonic and operands:

\[ \text{STWDP} \quad S, u_{16} \]

Operation:

\[ \text{mem}[dp + u_{16} \times Bpw] \leftarrow S \]

Encoding:

\[ \text{ru6} \quad 0 \quad 1 \quad 0 \quad 1 \quad 0 | \quad 0 | \quad \text{. . . . . . . . . . .} \quad M \]

or prefixed for long immediates:

\[ \text{lru6} \quad 1 \quad 1 \quad 1 \quad 1 \quad 0 | \quad 0 | \quad \text{. . . . . . . . . . .} \quad \text{M&R} \]

Conditions that raise an exception:

- **ET_LOAD_STORE**: \( dp \) is not word aligned, or the indexed address does not point to a valid memory location.
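
The choice between the short and the prefixed encoding, and the address that is written, can be sketched as follows. The model assumes Bpw = 4 and represents memory as a hypothetical dictionary keyed by byte address; the function names are illustrative, and the check for a valid memory location is omitted.

```python
BYTES_PER_WORD = 4            # Bpw; assumed to be 4
WORD_MASK = 0xFFFFFFFF


def stwdp_needs_prefix(u16: int) -> bool:
    """Return True if the offset requires the prefixed (long) encoding."""
    if not 0 <= u16 <= 65535:
        raise ValueError("offset must be a 16-bit immediate")
    return u16 >= 64          # offsets 0...63 fit the short form


def stwdp(mem: dict, s: int, dp: int, u16: int) -> None:
    """Model of STWDP: store s at word offset u16 from the data pointer."""
    if dp % BYTES_PER_WORD != 0:
        raise ValueError("ET_LOAD_STORE: dp is not word aligned")
    mem[dp + u16 * BYTES_PER_WORD] = s & WORD_MASK
```

An offset of 63 uses the short form and an offset of 64 needs the prefix. With `dp = 0x4000` and an offset of 5, the word is written to `0x4014`.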
STWSP

Store word on stack

Stores a word on the stack, using a constant offset from the stack pointer. The offset is specified in words. STWSP is used to write to stack variables.

The instruction has two operands:

- \( op_1 \) \( S \) Any of \( r0 \ldots r11, cp, dp, sp, lr \)
- \( op_2 \) \( u_{16} \) A 16-bit immediate in the range 0...65535. If \( u_{16} < 64 \), the instruction requires no prefix

Mnemonic and operands:

\[
\text{STWSP} \quad S, u_{16}
\]

Operation:

\[
\text{mem}[sp + u_{16} \times Bpw] \leftarrow S
\]

Encoding:

\[
\begin{array}{c|cccccccccccccc}
\text{ru6} & 0 & 1 & 0 & 1 & 0 & 1 & 1 & . & . & . & . & . & . & . \\
\text{M} & & & & & & & & & & & & & & \\
\end{array}
\]

or prefixed for long immediates:

\[
\begin{array}{c|cccccccccccccc}
\text{lru6} & 1 & 1 & 1 & 1 & 0 & 0 & . & . & . & . & . & . & . & . \\
\text{M&R} & & & & & & & & & & & & & & \\
\end{array}
\]

Conditions that raise an exception:

- **ET_LOAD_STORE**: \( sp \) is not word aligned, or the indexed address does not point to a valid memory location.
SUB

Integer unsigned subtraction

Computes the difference between two words. No check on overflow is performed, and the result is produced modulo $2^{bpw}$.

If a borrow is required, then the LSUB instruction should be used. LSS and LSU should be used to compare signed and unsigned integers, respectively.

The instruction has three operands:

- $op_1$  $d$  Operand register, one of $r0$...$r11$
- $op_2$  $x$  Operand register, one of $r0$...$r11$
- $op_3$  $y$  Operand register, one of $r0$...$r11$

Mnemonic and operands:

```
SUB  d, x, y
```

Operation:

$$d \leftarrow (2^{bpw} + x - y) \mod 2^{bpw}$$

Encoding:

```
3r 0 0 0 1 1 · · · · · · · · · · · · M+R
```
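
The modular result can be illustrated with a short model. It assumes bpw = 32; the helper `as_signed`, which reinterprets a word as a two's complement value, is hypothetical and is included only to show that the same bit pattern serves both signed and unsigned subtraction.

```python
BPW = 32                      # bits per word (bpw); assumed to be 32


def sub(x: int, y: int) -> int:
    """Model of SUB for word values x and y in the range 0...2**BPW - 1."""
    return (2**BPW + x - y) % 2**BPW


def as_signed(word: int) -> int:
    """Reinterpret a word as a two's complement signed integer."""
    return word - 2**BPW if word & (1 << (BPW - 1)) else word
```

For example, `sub(3, 5)` gives `0xFFFFFFFE`, which `as_signed` reads as -2.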
**SUBI**

Integer unsigned subtraction immediate

Computes the difference between two words. No check on overflow is performed, and the result is produced modulo $2^{bpw}$.

If a borrow is required, then the **LSUB** instruction should be used. **LSS** and **LSU** should be used to compare signed and unsigned integers, respectively.

The instruction has three operands:

- $op1\ d$  
  Operand register, one of $r0...r11$
- $op2\ x$  
  Operand register, one of $r0...r11$
- $op3\ u_s$  
  An integer in the range $0...11$

Mnemonic and operands:

```
SUBI  d, x, u_s
```

Operation:

$$d \leftarrow (2^{bpw} + x - u_s) \mod 2^{bpw}$$

Encoding:

```
2rus  1 0 0 1 1 | ... | ... | ... | ... | ... | ... | M+R
```
SYNCR

Synchronise a resource

Synchronise with a port to ensure all data has been output. This instruction completes once all data has been shifted out of the port, and the last port width of data has been held for one clock period.

The instruction has one operand:

\[ \text{op1} \quad r \quad \text{Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[ \text{SYNCR} \quad r \]

Operation:

\[ \text{syncr}(r) \]

Encoding:

\[ \begin{array}{l}
1r \quad 1000011111111111... \\
\end{array} \quad R \]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( r \) is not a port resource, or the resource is not in use.
TESTCT

Test for control token

Tests whether the next token on a channel \( (r) \) is a control token. If the next token is a control token, then 1 (true) will be produced in the destination register, otherwise 0 (false) will be produced.

This instruction pauses if the channel does not have a token available to be read.

In contrast to CHKCT, this test does not trap, and does not discard the control token. TESTCT can be used to implement complex protocols over channels.

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad d \quad \text{Operand register, one of } r0...r11 \\
\text{op2} & \quad r \quad \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{TESTCT} \quad d, r
\]

Operation:

\[
d \leftarrow \begin{cases} 
\text{hasctoken}(r), & 1 \\
\neg\text{hasctoken}(r), & 0 
\end{cases}
\]

Encoding:

\[
\begin{array}{cccccccccccc}
2r & 1 & 0 & 1 & 1 & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & R
\end{array}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( r \) is not pointing to a channel resource, or the resource is not in use.
**TESTLCL**  

Tests if a channel end is connected to a local channel end or to a remote channel end. It produces 1 (true) in the destination register if the channel end is local, and 0 (false) if the channel end is remote. The instruction will raise an exception if the resource supplied is not a channel end, or is an unconnected channel end.

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad d & \quad \text{Operand register, one of } r0 \ldots r11 \\
\text{op2} & \quad r & \quad \text{Operand register, one of } r0 \ldots r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{TESTLCL} \quad d, r
\]

Operation:

\[
d \leftarrow \begin{cases} 
  d_r[bpw - 1..16] = r[bpw - 1..16], & 1 \\
  d_r[bpw - 1..16] \neq r[bpw - 1..16], & 0 
\end{cases}
\]

Encoding:

\[
\begin{array}{cccccc}
1 & 1 & 1 & 1 & \cdot & \cdot & \cdot & \cdot & 0 & \cdot & \cdot & \cdot \\
0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 0
\end{array}
\]

M&R

Conditions that raise an exception:

- **ET_RESOURCE_DEP**  
  Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**  
  \( r \) is not pointing to a channel resource, or the resource is not in use.
- **ET_ILLEGAL_RESOURCE**  
  \( r \) is a channel end, and the destination has not been set.
TESTWCT

Test for position of control token

Tests whether the next word contains a control token, and produces the position (1-4) of the first control token in the word, or 0 if it contains no control tokens.

This instruction pauses if the channel has not received enough tokens to determine what value to return. So if fewer than four tokens have been received, but one of them is a control token, the instruction will not pause.

The instruction has two operands:

\[ \text{op1 } d \text{ Operand register, one of } r0...r11 \]
\[ \text{op2 } r \text{ Operand register, one of } r0...r11 \]

Mnemonic and operands:

\[ \text{TESTWCT } d, r \]

Operation:

\[ d \leftarrow \begin{cases} 
\neg \text{hasctoken}(r), & 0 \\
\text{firsttokenisctoken}, & 1 \\
\text{secondtokenisctoken}, & 2 \\
\text{thirdtokenisctoken}, & 3 \\
\text{fourthtokenisctoken}, & 4 
\end{cases} \]

Encoding:

\[ 2r \quad 11000\ldots\ldots1\ldots\ldots \quad \text{R} \]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( r \) is not pointing to a channel resource, or the resource is not in use.
TINITCP

Initialise a thread's CP

Sets the constant pool pointer for a specific thread. This operation may be used after a thread has been allocated (using GETST or GETR), but prior to the thread starting its execution.

The instruction has two operands:

\[
\begin{align*}
  op_1 & \quad s & \text{Operand register, one of } r0...r11 \\
  op_2 & \quad t & \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{TINITCP} \quad s, t
\]

Operation:

\[
\begin{align*}
  cp_t & \leftarrow s
\end{align*}
\]

Encoding:

\[
\begin{array}{c}
\text{l2r} \\
\begin{array}{cccccccc}
1 & 1 & 1 & 1 & . & . & . & . & 0 & . & . & . & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 1
\end{array}
\end{array}
\]

M&R

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( t \) is not pointing to a thread resource, or the thread is not in use, or the thread is not SSYNC.
TINITDP

Initialise a thread's DP

Sets the data pointer for a specific thread. This operation may be used after a thread has been allocated (using GETST or GETR), but prior to the thread starting its execution.

The instruction has two operands:

\[
\begin{align*}
    op1 & \quad s & \text{Operand register, one of } r0...r11 \\
    op2 & \quad t & \text{Operand register, one of } r0...r11 \\
\end{align*}
\]

Mnemonic and operands:

\[
\text{TINITDP } s, t
\]

Operation:

\[
dp_t \leftarrow s
\]

Encoding:

```
|1 1 1 1| . . . | 0 | . | . | \\
|0 0 0 0|1 1 1 1|1 0|1 0 0| M&R
```

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( t \) is not pointing to a thread resource, or the thread is not in use, or the thread is not SSYNC.
TINITLR

Initialise a thread's LR

Sets the link register for a specific thread. This operation may be used after a thread has been allocated (using GETST or GETR), but prior to the thread starting its execution.

The instruction has two operands:

\[
\begin{align*}
\text{op}_1 & \quad s \quad \text{Operand register, one of } r0...r11 \\
\text{op}_2 & \quad t \quad \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{TINITLR} \quad s, t
\]

Operation:

\[
\text{lr}_t \leftarrow s
\]

Encoding:

\[
\begin{array}{cccccccc}
1 & 1 & 1 & 1 & \cdot & \cdot & \cdot & 0 & \cdot & \cdot & \cdot \\
0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0
\end{array}
\]

M&R

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( t \) is not pointing to a thread resource, or the thread is not in use, or the thread is not SSYNC.
TINITPC

Initialise a thread's PC

Sets the program counter for a specific thread. This operation may be used after a thread has been allocated (using GETST or GETR), but prior to the thread starting its execution.

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad s \quad \text{Operand register, one of } r0...r11 \\
\text{op2} & \quad t \quad \text{Operand register, one of } r0...r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{TINITPC} \quad s, t
\]

Operation:

\[
pc_t \leftarrow s
\]

Encoding:

\[
\begin{array}{cccccccccccc}
1 & 1 & 1 & 1 & \cdots & \cdots & 1 & \cdots & 1 & \cdots & 1 & 0 & 1 & 0 & 0 \\
\hline
l2r & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0
\end{array}
\]

M&R

Conditions that raise an exception:

- **ET_RESOURCE_DEP**  
  Resource illegally shared between threads

- **ET_ILLEGAL_RESOURCE**  
  t is not pointing to a thread resource, or the thread is not in use, or the thread is not SSYNC.
TINITSP

Initialise a thread's SP

Sets the stack pointer for a specific thread. This operation may be used after a thread has been allocated (using GETST or GETR), but prior to the thread starting its execution.

The instruction has two operands:

\[ op_1 \quad s \quad \text{Operand register, one of r0...r11} \]
\[ op_2 \quad t \quad \text{Operand register, one of r0...r11} \]

Mnemonic and operands:

\[ \text{TINITSP} \quad s, t \]

Operation:

\[ sp_t \leftarrow s \]

Encoding:

\[
\begin{array}{c|c|c|c|c|c|c|c|c|c|c}
1 & 1 & 1 & 1 & 1 & \cdots & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 \\
\end{array}
\]

M&R

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( t \) is not pointing to a thread resource, or the thread is not in use, or the thread is not SSYNC.
TSETMR

Set the master’s register

Writes data to a register of the master thread. This instruction should be used with care, and only when the master thread is known not to be using that register. It is typically used to transfer results from a slave thread back to the master prior to an MJOIN.

TSETMR uses the synchroniser that was used to create this process in order to establish which thread’s register to write to.

The instruction has two operands:

\[
\begin{align*}
\text{op1} & \quad d & \text{Operand register, one of } r0... r11 \\
\text{op2} & \quad s & \text{Operand register, one of } r0... r11
\end{align*}
\]

Mnemonic and operands:

\[
\text{TSETMR} \quad d, s
\]

Operation:

\[
\text{mtid}_d \leftarrow s
\]

Encoding:

\[
\begin{array}{cccccccc}
1 & 1 & 1 & 1 & 1 & 1 & \cdot & \cdot & \cdot & 0 & \cdot & \cdot & \cdot \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0
\end{array}
\]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: Master thread is not in use.
TSETR

Set register in thread

Writes data to a register of another thread. This instruction should be used with care, and only when the other thread is known not to be using that register.

The instruction has three operands:

- \( op1 \) \( d \): Operand register, one of \( r0 \) ... \( r11 \)
- \( op2 \) \( s \): Operand register, one of \( r0 \) ... \( r11 \)
- \( op3 \) \( t \): Operand register, one of \( r0 \) ... \( r11 \)

Mnemonic and operands:

\[
\text{TSETR} \quad d, s, t
\]

Operation:

\[
d_t \leftarrow s
\]

Encoding:

\[
\begin{array}{cccccccccccccc}
1 & 1 & 1 & 1 & . & . & . & . & . & . & . & . & . & M&R \\
1 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0 &
\end{array}
\]

Conditions that raise an exception:

- \( ET\_RESOURCE\_DEP \): Resource illegally shared between threads
- \( ET\_ILLEGAL\_RESOURCE \): \( t \) is not pointing to a thread resource, or the thread is not in use.
TSTART

Start thread

Starts an unsynchronised thread. An unsynchronised thread runs independently from the starting thread.

The unsynchronised thread must have been allocated with GETR, and the program counter should have been initialised with TINITPC.

The instruction has one operand:

\[ \text{op1} \quad t \quad \text{Operand register, one of } r0 \ldots r11 \]

Mnemonic and operands:

\[ \text{TSTART} \quad t \]

Operation:

\[ \begin{align*}
\text{spaused} & \leftarrow \text{spaused} \setminus \{t\} \\
\text{waiting}_t & \leftarrow 0
\end{align*} \]

Encoding:

\[ \begin{array}{c}
1r \quad 0 \ 0 \ 0 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 0 \ \cdots \ R
\end{array} \]

Conditions that raise an exception:

- **ET_RESOURCE_DEP**: Resource illegally shared between threads
- **ET_ILLEGAL_RESOURCE**: \( t \) is not pointing to a thread, or the thread is not in use, or the thread is not SSYNC.
- **ET_ILLEGAL_PC**: Thread \( t \) does not have a legal program counter.
UNZIP

Unzips a pair of registers in bits, bit-pairs, nibbles, bytes or byte-pairs. The granularity of unzipping is determined by \(2^s\). The pair of registers is split into chunks of \(2^s\) bits. The most significant chunk and every other chunk after that are concatenated and written back to \(d\). The other chunks in between are written back to \(e\).

The instruction has three operands:

\[
\begin{align*}
op1 & \quad d & \quad \text{Operand register, one of } r0...r11 \\
op2 & \quad e & \quad \text{Operand register, one of } r0...r11 \\
op3 & \quad s & \quad \text{An integer in the range 0...11}
\end{align*}
\]

Mnemonic and operands:

\[
\text{UNZIP} \quad d, e, s
\]

Operation:

\[
\begin{align*}
w & \gets 2^s \\
z & \gets d : e \\
d & \gets z[2 \times bpw - 1..2 \times bpw - w] : \\
    & \quad z[2 \times bpw - 2w - 1..2 \times bpw - 3w] : \\
    & \quad ... \\
    & \quad z[2w - 1..w] \\
e & \gets z[2 \times bpw - w - 1..2 \times bpw - 2w] : \\
    & \quad z[2 \times bpw - 3w - 1..2 \times bpw - 4w] : \\
    & \quad ... \\
    & \quad z[w - 1..0]
\end{align*}
\]

Encoding:

\[
\begin{array}{cccccccc}
1 & 1 & 1 & 1 & 1 & \cdots & \cdots & \cdots \\
\end{array}
\]

\[
\begin{array}{cccccccc}
l2rus \quad & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & \text{M&R}
\end{array}
\]
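
The chunk selection described for UNZIP can be sketched in software as follows. This is an illustrative model, not the hardware implementation: it assumes bpw = 32 and accepts only the granularities named in the description (s = 0...4, that is chunks of 1, 2, 4, 8 or 16 bits); behaviour for other values of s is not modelled.

```python
BPW = 32                      # bits per word (bpw); assumed to be 32
WORD_MASK = (1 << BPW) - 1


def unzip(d: int, e: int, s: int) -> tuple:
    """Model of UNZIP: return the new (d, e) pair for chunk size 2**s bits."""
    if not 0 <= s <= 4:
        raise ValueError("only s = 0...4 is modelled")
    w = 1 << s                                    # chunk width in bits
    z = ((d & WORD_MASK) << BPW) | (e & WORD_MASK)  # z = d : e
    chunk_mask = (1 << w) - 1
    new_d = 0
    new_e = 0
    for k in range(2 * BPW // w):                 # k = 0 is the top chunk
        chunk = (z >> (2 * BPW - (k + 1) * w)) & chunk_mask
        if k % 2 == 0:
            new_d = (new_d << w) | chunk          # top chunk, then every other
        else:
            new_e = (new_e << w) | chunk          # the chunks in between
    return new_d, new_e
```

With s = 3 (bytes), `unzip(0x01020304, 0x05060708, 3)` returns `(0x01030507, 0x02040608)`: the first, third, fifth and seventh bytes of the pair go to `d`, and the others go to `e`.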
WAITEF

If false wait for event

Waits for an event when a condition is false. If the condition is 0 (false), then EEBLE is set and, if no event is ready, the thread is suspended until an event becomes ready. When an event is available, the thread continues at the address specified by the event. If the condition is not 0, the next instruction is executed. The current PC is not saved anywhere.

The instruction has one operand:

\[ \text{op1 } c \quad \text{Operand register, one of } r0... r11 \]

Mnemonic and operands:

\[ WAITEF \ c \]

Operation:

\[ \text{if } c = 0 \text{ then } \text{sr}_{tid}[\text{eeble}] \leftarrow 1 \]

Encoding:

1r 0 0 0 0 1 1 1 1 1 1 1 1 1 \ldots \ldots R
WAITET  

If true wait for event

Waits for an event when a condition is true. If the condition is not 0, EEBLE is set and, if no event is ready, the thread is suspended until an event becomes ready. When an event is available, the thread continues at the address specified by the event. If the condition is 0 (false), the next instruction is executed. The current PC is not saved anywhere.

The instruction has one operand:

\[ \text{op1 } c \quad \text{Operand register, one of } r0... r11 \]

Mnemonic and operands:

\[ \text{WAITET } c \]

Operation:

\[ \text{if } c \neq 0 \text{ then } \text{sr}_{tid}[\text{eeble}] \leftarrow 1 \]

Encoding:

\[ 1r \quad 000011111110 \ldots \quad R \]
WAITEU

Wait for event

Waits for an event. This instruction sets EEBLE and, if no event is ready, suspends the thread until an event becomes ready. When an event is available, the thread continues at the address specified by the event. The current PC is not saved anywhere.

The instruction has no operands.

Mnemonic and operands:

WAITEU

Operation:

\[ \text{sr}_{tid}[\text{eeble}] \leftarrow 1 \]

Encoding:

<table>
<thead>
<tr>
<th>0r</th>
<th>0 0 0 0</th>
<th>1 1 1</th>
<th>1 0</th>
<th>0 0</th>
</tr>
</thead>
<tbody>
<tr>
<td>R</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
XOR

Bitwise exclusive or

Produces the bitwise exclusive-or of two words.

The instruction has three operands:

\[
\begin{align*}
\text{op1} & \quad \text{d} & \text{Operand register, one of r0... r11} \\
\text{op2} & \quad \text{x} & \text{Operand register, one of r0... r11} \\
\text{op3} & \quad \text{y} & \text{Operand register, one of r0... r11}
\end{align*}
\]

Mnemonic and operands:

\[
\text{XOR \quad d, x, y}
\]

Operation:

\[
d \leftarrow x \oplus y
\]

Encoding:

\[
\begin{array}{cccccccc}
1 & 1 & 1 & 1 & . & . & . & . \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0 & \text{M&R}
\end{array}
\]
XOR4

Bitwise exclusive-or of four words

Produces the bitwise exclusive-or of four words.

The instruction has five operands:

- \( op_1 \) \( d \) Operand register, one of \( r0...r11 \)
- \( op_4 \) \( e \) Operand register, one of \( r0...r11 \)
- \( op_2 \) \( x \) Operand register, one of \( r0...r11 \)
- \( op_3 \) \( y \) Operand register, one of \( r0...r11 \)
- \( op_5 \) \( v \) Operand register, one of \( r0...r11 \)

Mnemonic and operands:

\[ \text{XOR4} \quad d, e, x, y, v \]

Operation:

\[ d \leftarrow x \oplus_{\text{bit}} y \oplus_{\text{bit}} e \oplus_{\text{bit}} v \]

Encoding:

\[
\begin{array}{cccccccccccccccc}
1 & 1 & 1 & 1 & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & M&R
\end{array}
\]

\[
\begin{array}{cccccccccccccccc}
1 & 5 & r & 0 & 0 & 0 & 0 & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & M&R
\end{array}
\]
ZEXT

Zero extend

Zero extends an n-bit field stored in a register. The first operand of this instruction is both a source and destination operand. The second operand contains the bit position. All bits at that position and above are cleared.

The instruction has two operands:

\[ \begin{align*}
\text{op1} & \quad d \quad \text{Operand register, one of } r0...r11 \\
\text{op2} & \quad s \quad \text{Operand register, one of } r0...r11
\end{align*} \]

Mnemonic and operands:

\[ \text{ZEXT} \quad d, s \]

Operation:

\[ d \leftarrow \begin{cases}
d & \text{if } s \leq 0 \lor s \geq bpw \\
0:\ldots:0:d[s-1 \ldots 0] & \text{if } s > 0 \land s < bpw
\end{cases} \]
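
As a rough illustration, the two cases of the operation can be written as a small Python function. The name `zext` and the default `bpw = 32` are assumptions made for this sketch; register values are modelled as non-negative integers.

```python
def zext(d, s, bpw=32):
    """Model of ZEXT d, s: keep the low s bits of d and clear the rest.

    Follows the operation as specified: d is returned unchanged
    when s <= 0 or s >= bpw. The default bpw = 32 is an assumption.
    """
    if s <= 0 or s >= bpw:
        return d
    return d & ((1 << s) - 1)        # 0 : ... : 0 : d[s-1 ... 0]
```

For example, `zext(0x12345678, 16)` keeps only the low half-word, while a bit position of `bpw` leaves the register untouched.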

Encoding:

\[ \begin{array}{cccccccccccc}
2r & 0 & 1 & 0 & 0 & 0 & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & M+R
\end{array} \]
ZEXTI

Zero extend immediate

Zero extends an n-bit field stored in a register. The first operand of this instruction is both a source and destination operand. The second operand is an immediate that encodes the bit position. All bits at that position and above are cleared.

The instruction has two operands:

\[ \begin{align*}
  op1 & \quad s & \text{Operand register, one of } r0...r11 \\
  op2 & \quad \text{bitp} & \text{A bit position; one of } bpw, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32
\end{align*} \]

Mnemonic and operands:

\[ \text{ZEXTI } s, \text{bitp} \]

Operation:

\[ s \leftarrow \begin{cases}
s & \text{if } \text{bitp} \leq 0 \lor \text{bitp} \geq \text{bpw} \\
0 : \ldots : 0 : s[\text{bitp} - 1 \ldots 0] & \text{if } \text{bitp} > 0 \land \text{bitp} < \text{bpw}
\end{cases} \]

Encoding:

\[
\begin{array}{l}
\text{rus} \quad 0 1 0 0 0 \cdots \cdots \cdot 1 \cdots \\
\text{M+R}
\end{array}
\]
ZIP

Zips together a pair of registers

Zips a pair of registers in bits, bit-pairs, nibbles, bytes or byte-pairs. The granularity of zipping is determined by \( 2^s \). Each of \( d \) and \( e \) is chopped into chunks of \( 2^s \) bits. They are then zipped together by starting with the most significant chunk of \( d \), then the most significant chunk of \( e \), then the next most significant chunk of \( d \), and so on until the least significant chunks of \( d \) and \( e \). This results in a bit string of \( 2 \times bpw \) bits; the most significant \( bpw \) bits are written back to \( d \), the least significant \( bpw \) bits to \( e \).

The instruction has three operands:

\[
\begin{align*}
\text{op1} & \quad d & \text{Operand register, one of} & \quad r0...r11 \\
\text{op2} & \quad e & \text{Operand register, one of} & \quad r0...r11 \\
\text{op3} & \quad s & \text{An integer in the range} & \quad 0...11
\end{align*}
\]

Mnemonic and operands:

\[
\text{ZIP} \quad d, e, s
\]

Operation:

\[
\begin{align*}
w & \leftarrow 2^s \\
z & \leftarrow d[bpw-1..bpw-w] : e[bpw-1..bpw-w] : \\
  & \quad d[bpw-w-1..bpw-2 \times w] : e[bpw-w-1..bpw-2 \times w] : \ldots : \\
  & \quad d[w-1..0] : e[w-1..0] \\
d & \leftarrow z[2 \times bpw-1..bpw] \\
e & \leftarrow z[bpw-1..0]
\end{align*}
\]
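
A minimal Python sketch of the interleaving described above follows. The function name `zip_regs` is hypothetical, a 32-bit word is assumed, and `s` is limited to chunk widths that fit in one word.

```python
def zip_regs(d, e, s, bpw=32):
    """Model of ZIP d, e, s: returns the new (d, e) pair.

    Assumptions (not from the manual): bpw defaults to 32 and 2**s <= bpw.
    """
    w = 1 << s                       # chunk width in bits
    if w > bpw:
        raise ValueError("chunk width must not exceed the word width")
    mask = (1 << w) - 1
    z = 0
    for i in range(bpw // w - 1, -1, -1):    # most significant chunk first
        z = (z << w) | ((d >> (i * w)) & mask)   # chunk i of d
        z = (z << w) | ((e >> (i * w)) & mask)   # then chunk i of e
    return z >> bpw, z & ((1 << bpw) - 1)        # upper word to d, lower word to e
```

With `s = 3` the bytes of the two registers are interleaved, so `zip_regs(0x11335577, 0x22446688, 3)` yields the words `0x11223344` and `0x55667788`.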

Encoding:

\[
\begin{array}{c|cccccccccccc}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{array}
\]

M&R

\[
\begin{array}{c|cccccccccc}
0 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 1
\end{array}
\]

M&R
20  XS2 Instruction Format Specification

This section defines the instruction formats. For each instruction format there is a name, a short description of its purpose, a graphical representation of the encoding, and finally a list of the instructions that use this encoding.

The graphical representation shows the bits of the instruction; bits are numbered from 15 down to 0. If a bit value depends on the opcode, it is marked with a “×” symbol. If a bit value depends on an operand, it is marked with a “·”, and the particular encoding for that operand is shown underneath. Otherwise, the bit has a fixed value of 0 or 1 in order to differentiate between formats.
Three register

Instructions with three operand registers; the last two operands are always source registers, the first operand is always a destination register.

The syntax for this instruction is:

\[ \text{Mnemonic} \ op_1, op_2, op_3 \]

Instructions in this format are encoded in one word:

\[
\begin{array}{ccccccc}
\times & \times & \times & \times & \cdots & \cdots & \cdots \\
\end{array}
\]

\[ op_3[1...0] \]

\[ op_2[1...0] \]

\[ op_1[1...0] \]

\[ op_1[3...2] \times 9 + op_2[3...2] \times 3 + op_3[3..2] \]

\[ \text{Opcode} \]

This format is used by the following instructions:

- ADD
- LD8U
- LSS
- SHL
- AND
- LD16S
- LSU
- SHR
- EQ
- LDW
- OR
- SUB
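
The combined field in this format works because each operand is one of r0...r11, so the upper two bits of an operand can only take the values 0, 1 or 2. Three such values can be packed as a base-3 number in the range 0...26, which fits in five bits. The sketch below computes only that field and the low-bit pairs; the function names are hypothetical and no bit positions within the instruction word are assumed.

```python
def combined_3r(op1, op2, op3):
    """Value of op1[3..2] * 9 + op2[3..2] * 3 + op3[3..2] for operands in 0..11."""
    for op in (op1, op2, op3):
        if not 0 <= op <= 11:
            raise ValueError("operand must be in the range 0..11")
    return (op1 >> 2) * 9 + (op2 >> 2) * 3 + (op3 >> 2)


def low_bits(op):
    """The op[1..0] field that is encoded separately for each operand."""
    return op & 0b11


def split_combined_3r(value):
    """Recover (op1[3..2], op2[3..2], op3[3..2]) from a combined value in 0..26."""
    if not 0 <= value <= 26:
        raise ValueError("combined value must be in the range 0..26")
    return value // 9, (value // 3) % 3, value % 3
```

For example, operands 4, 8 and 1 have upper bits 1, 2 and 0, giving a combined value of 15.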

Three register long

Instructions with three operand registers; the last two operands are always source operands, the first operand usually refers to the destination register (with the exception of store instructions).

The syntax for this instruction is:

```
MNEMONIC  op1, op2, op3
```

Instructions in this format are encoded in two words:

```
××××1 1 1 1 1 1 1 1 1 0 ××××
```

- `op1[3...2] × 9 + op2[3...2] × 3 + op3[3..2]`

This format is used by the following instructions:

- ASHR
- LDA16F
- REMS
- STW
- CRC
- LDAWB
- REMU
- TSETR
- DIVS
- LDAWF
- LSATS
- XOR
- DIVU
- MUL
- ST8
- LDA16B
- OUTPW
- ST16
Two register with immediate

Instructions with three operands. The last operand is a small unsigned constant (0..11), the second operand is a source register, the first operand is either a destination register, or a second source register in the case of memory-store operations.

The syntax for this instruction is:

**MNEMONIC**  \( op1, op2, op3 \)

Instructions in this format are encoded in one word:

\[
\begin{array}{c}
\text{××××} \\
\text{××××} \\
\text{××××} \\
\text{××××} \\
\end{array}
\]

\( op3[1...0] \)

\( op2[1...0] \)

\( op1[1...0] \)

\( op1[3...2] \times 9 + op2[3...2] \times 3 + op3[3..2] \)

**Opcode**

This format is used by the following instructions:

ADDI   LDWI   SHRI   SUBI
EQI   SHLI   STWI
Two register with immediate long

Instructions with three operands. The last operand is a small unsigned constant (0..11), the second operand is a source register, the first operand is either a destination register, or a second source register in the case of some resource operations.

The syntax for this instruction is:

```
MNEMONIC op1, op2, op3
```

Instructions in this format are encoded in two words:

```
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
```

This format is used by the following instructions:

- ASHRI
- LDAWFI
- STDSP
- INPW
- LDDSP
- UNZIP
- LDAWBI
- OUTPWI
- ZIP
Register with 6-bit immediate

Instructions with two operands where the first operand is a register and the second operand is a 6-bit integer constant. This format is used, amongst others, for load and store operations relative to the stack pointer and data pointer.

The syntax for this instruction is:

```
MNEMONIC op1, op2
```

Instructions in this format are encoded in one word:

```
××××[5...0]××××[3...0]××××
```

This format is used by the following instructions:

- BRBF  LDAWDP  LDWDP  STWSP
- BRBT  LDAWSP  LDWSP
- BRFF  LDC     SETCI
- BRFT  LDWCP   STWDP
Register with 16-bit immediate

Instructions with two operands where the first operand is a register and the second operand is a 16-bit integer constant. This instruction is a prefixed version of . This format is used, amongst others, for load and store operations relative to the stack pointer and data pointer.

The syntax for this instruction is:

MNEMONIC  \textit{op1, op2}

Instructions in this format are encoded in two words:

\begin{center}
\begin{tabular}{l}
op2[5...0] \\
op1[3...0] \\
Opcode \\
Opcode
\end{tabular}
\end{center}

This format is used by the following instructions:

- \textit{BRBF}  \textit{LDAWDP}  \textit{LDWDP}  \textit{STWSP}
- \textit{BRBT}  \textit{LDAWSP}  \textit{LDWSP}
- \textit{BRFF}  \textit{LDC}  \textit{SETCI}
- \textit{BRFT}  \textit{LDWCP}  \textit{STWDP}
6-bit immediate

Instructions with a single operand encoding a 6-bit integer.

The syntax for this instruction is:

**Mnemonic**  \( op \)

Instructions in this format are encoded in one word:

\[
\begin{array}{cccccccc}
\times \times \times \times \times \times \times \times \times \\
| \times \times \times \times \times \times \times \\
\end{array}
\]

\[ op1[5...0] \]

\[ Opcode \]

\[ Opcode \]

\[ Opcode \]

This format is used by the following instructions:

- BLAT
- DUALENTSP
- GETSR
- RETSP
- BRBU
- ENTSP
- KCALLI
- SETSR
- BRFU
- EXTDP
- KENTSP
- CLRSR
- EXTSP
- LDAWCP
16-bit immediate

Instructions with a single operand encoding a 16-bit integer. This instruction is a prefixed version of .

The syntax for this instruction is:

Mnemonic \( op1 \)

Instructions in this format are encoded in two words:

\[
\begin{array}{c}
00\ldots00 \\
\hline
\text{op1[15..6]} \\
\hline
\text{op1[5..0]} \\
\hline
\text{Opcode} \\
\hline
\text{Opcode} \\
\hline
\text{Opcode}
\end{array}
\]
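
The layout above names two pieces of the operand, `op1[15..6]` and `op1[5..0]`. As an illustrative sketch only (the function names are hypothetical, and nothing is assumed about which bits of which word carry each piece), the split and its inverse look like this:

```python
def split_imm16(value):
    """Split a 16-bit immediate into (op1[15..6], op1[5..0])."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    return value >> 6, value & 0x3F


def join_imm16(upper10, lower6):
    """Rebuild the 16-bit immediate from its 10-bit and 6-bit pieces."""
    if not (0 <= upper10 <= 0x3FF and 0 <= lower6 <= 0x3F):
        raise ValueError("pieces out of range")
    return (upper10 << 6) | lower6
```

An immediate that fits in six bits has an upper piece of zero, which is the case the short, unprefixed format can represent on its own.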

This format is used by the following instructions:

- BLAT
- DUALENTSP
- GETSR
- LDAWCP
- BRBU
- ENTSP
- KCALLI
- RETSP
- BRFU
- EXTDP
- KENTSP
- SETSR
- CLRSR
- EXTSP
- KRESTSP
10-bit immediate

Instructions with a single operand encoding a 10-bit integer.

The syntax for this instruction is:

```
MNEMONIC  op1
```

Instructions in this format are encoded in one word:

```
×××××××××××××××××

   op1[9...0]

    Opcode

    Opcode
```

This format is used by the following instructions:

- BLACP
- BLRF
- LDAPF
- BLRB
- LDAPB
- LDWCPL
20-bit immediate

Instructions with a single operand encoding a 20-bit integer. This instruction is a prefixed version of .

The syntax for this instruction is:

Mnemonic \( op1 \)

Instructions in this format are encoded in two words:

\[
\begin{array}{c}
\text{Opcode} \\
\text{Opcode}
\end{array}
\]

\[
\begin{array}{c}
\text{op1}[19...10] \\
\text{op1}[9...0]
\end{array}
\]

This format is used by the following instructions:

- BLACP
- BLRF
- LDAPF
- BLRB
- LDAPB
- LDWCPL
Two register

Instructions with two operand registers; the last operand is always a source register, the first operand may be a destination register.

The syntax for this instruction is:

\[
\text{MNEMONIC } op_1, op_2
\]

Instructions in this format are encoded in one word:

\[
\begin{array}{c}
\text{Opcode} \\
(\text{op}_1[3...2] \times 3 + \text{op}_2[3...2] + 27) [5] \\
(\text{op}_2[3...2] \times 3 + \text{op}_1[3...2]) \mod 5 + 27
\end{array}
\]

This format is used by the following instructions:

\[
\begin{array}{cccccccc}
\text{ANDNOT} & \text{EET} & \text{INSHR} & \text{PEEK} \\
\text{BITREV} & \text{ENDIN} & \text{INT} & \text{SEXT} \\
\text{BYTEREV} & \text{GETST} & \text{MKMSK} & \text{TESTCT} \\
\text{CHKCT} & \text{GETTS} & \text{NEG} & \text{TESTWCT} \\
\text{CLZ} & \text{IN} & \text{NOT} & \text{ZEXT} \\
\text{EEF} & \text{INCT} & \text{OUTCT}
\end{array}
\]
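
The two expressions in the encoding above share a weighted sum of the operands' upper bits, a value in the range 0...8 since each upper-bit pair is 0, 1 or 2. The sketch below takes that sum as a single number `v` and deliberately leaves open which operand carries the weight 3. It reads the `[5]` suffix as "bit 5 of the value", which is an interpretation of the notation rather than something stated here, and the function names are hypothetical.

```python
def two_reg_fields(v):
    """Return (bit, field) for a weighted upper-bit sum v in 0..8.

    bit   = (v + 27)[5], read here as bit 5 of v + 27 (an assumption)
    field = v mod 5 + 27, a five-bit value in 27..31
    """
    if not 0 <= v <= 8:
        raise ValueError("v must be in the range 0..8")
    return ((v + 27) >> 5) & 1, v % 5 + 27


def two_reg_sum(bit, field):
    """Inverse of two_reg_fields: recover v from the two encoded pieces."""
    return field - 27 + 5 * bit
```

Under this reading the five-bit field is always at least 27, so it cannot coincide with the values 0...26 produced by the three-register combined field, and the extra bit distinguishes the two sums that share each field value.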
Two register reversed \( r2r \)

Instructions with two operand registers used for resources; the first operand is always a source register containing the resource to operate on, the last operand may be a destination register.

The syntax for this instruction is:

\[
\text{Mnemonic } op_1, op_2
\]

Instructions in this format are encoded in one word:

\[
\begin{array}{cccccccc}
\times \times \times \times & \ldots & \ldots & \ldots & \times \times \times \times \\
\end{array}
\]

- \( op_1[1...0] \)
- \( op_2[1...0] \)
- Opcode
- \((op_2[3...2] \times 3 + op_1[3...2] + 27)[5]\)
- \((op_1[3...2] \times 3 + op_2[3...2]) \mod 5 + 27\)

This format is used by the following instructions:

- OUT  OUTT  SETPSC
- OUTSHR  SETD  SETPT
Two register long

Instructions with two operand registers; the last operand is always a source register, the first operand may be a destination register.

The syntax for this instruction is:

Mnemonic \( op1, op2 \)

Instructions in this format are encoded in two words:

\[
\begin{array}{cccccccc}
1 & 1 & 1 & 1 & \cdots & \cdots & \times & \cdots \\
\end{array}
\]

\( op2[1...0] \)

\( op1[1...0] \)

 Opcode

\( (op1[3...2] \times 3 + op2[3...2] + 27)[5] \)

\( (op2[3...2] \times 3 + op1[3...2]) \mod 5 + 27 \)

This format is used by the following instructions:

GETD  SETC  TINITDP  TINITSP
GETN  TESTLCL  TINITLR  TSETMR
GETPS  TINITCP  TINITPC
Two register reversed long

Instructions with two operand registers; the first operand is always a source register containing a resource identifier, the last operand may be a destination register.

The syntax for this instruction is:

**MNEMONIC**  \( op_1, op_2 \)

Instructions in this format are encoded in two words:

\[
\begin{array}{c}
1 \ 1 \ 1 \ 1 \ \cdots \ \cdots \ \times \ \cdots \\
\end{array}
\]

\( op_1[1...0] \)

\( op_2[1...0] \)

\( Opcode \)

\( (op_2[3...2] \times 3 + op_1[3...2] + 27)[5] \)

\( (op_1[3...2] \times 3 + op_2[3...2]) \mod 5 + 27 \)

This format is used by the following instructions:

- SETCLK
- SETPS
- SETTW
- SETN
- SETRDY
Register with immediate

Instructions with two operands. The last operand is a small constant (0..11). The first operand is a register that may be used as source and/or destination.

The syntax for this instruction is:

\[
\text{MNEMONIC } op_1, op_2
\]

Instructions in this format are encoded in one word:

```
xxxxx...x
```

- \(op_2[1...0]\)
- \(op_1[1...0]\)
- Opcode
- \((op_1[3...2] \times 3 + op_2[3...2] + 27)[5]\)
- \((op_2[3...2] \times 3 + op_1[3...2]) \text{ mod } 5 + 27\)
- Opcode

This format is used by the following instructions:

- CHKCTI
- MKMSKI
- SEXTI
- GETR
- OUTCTI
- ZEXTI
Register

Instructions with one operand register.

The syntax for this instruction is:

**MNEMONIC**  *op1*

Instructions in this format are encoded in one word:

```
×××××1 1 1 1 1 1 × · · ·
```

*op1*[3...0]  

**Opcode**  

**Opcode**

This format is used by the following instructions:

- BAU  
- ECALLT  
- KCALL  
- SETSP  
- BLA  
- EDU  
- MJOIN  
- SETV  
- BRU  
- EEU  
- MSYNC  
- SYNCR  
- CLRPT  
- ELATE  
- SETCP  
- TSTART  
- DGETREG  
- FREER  
- SETDP  
- WAITEF  
- ECALLF  
- GETTIME  
- SETEV  
- WAITET
No operands

These instructions operate on implicit operands.

The syntax for this instruction is:

Mnemonic

Instructions in this format are encoded in one word:

```
×××××11111×××
```

Opcode

Opcode

Opcode

This format is used by the following instructions:

- CLRE
- GETID
- LDSPC
- STET
- DCALL
- GETKEP
- LDSSR
- STSED
- FREET
- GETKSP
- NOP
- STSPC
- GETED
- LDET
- SETKEP
- STSSR
- GETET
- LDSED
- SSYNC
- WAITEU
No operands long

These instructions operate on implicit operands.

The syntax for this instruction is:

MNEMONIC

Instructions in this format are encoded in two words:

```
1 1 1 1 | 1 1 1 1 1 1 × × × ×  

  Opcode

1 1 1 1 | 1 1 1 1 0 × × × ×  

  Opcode
```

This format is used by the following instructions:

DENTSP  DRESTSP  DRET  KRET
Four register long

Operations on four registers; the last two operands are source registers, the first two may be used as source and/or destination registers.

The syntax for this instruction is:

Mnemonic: \( op1, op4, op2, op3 \)

Instructions in this format are encoded in two words:

\[
\begin{align*}
\text{Opcode} & \quad \text{Opcode} \\
\text{Opcode} & \quad \text{Opcode} \\
\text{Opcode} & \quad \text{Opcode} \\
\end{align*}
\]

\( op3[1...0] \)

\( op2[1...0] \)

\( op1[1...0] \)

\( op1[3...2] \times 9 + op2[3...2] \times 3 + op3[3..2] \)

\( op4[3...0] \)

This format is used by the following instructions:

- CRC8
- LDD
- MACCU
- CRCN
- MACCS
- STD
Three register with immediate long

Operations on three registers and an immediate; the third operand is a source register, the first two may be used as source and/or destination registers.

The syntax for this instruction is:

**Mnemonic**  \( op1, op4, op2, op3 \)

Instructions in this format are encoded in two words:

\[
\begin{array}{c}
\text{op3[1...0]} \\
\text{op2[1...0]} \\
\text{op1[1...0]} \\
\text{op1[3...2] \times 9 + op2[3...2] \times 3 + op3[3..2]} \\
\end{array}
\]

\[
\begin{array}{c}
\text{op4[3...0]} \\
\text{Opcode} \\
\text{Opcode} \\
\end{array}
\]

This format is used by the following instructions:

LDDI   STDI
Four registers with immediate long

Instructions with five operands. The last operand is a small unsigned constant (0..11), the third and fourth operands are source registers, the first and second operands may be used as source and/or destination registers.

The syntax for this instruction is:

```text
MNEMONIC op1, op4, op2, op3, op5
```

Instructions in this format are encoded in two words:

```
1 1 1 1 llllll
    |  |
    |  | op3[1...0]
    |  | op2[1...0]
    |  | op1[1...0]
    |  | op1[3...2] \times 9 + op2[3...2] \times 3 + op3[3..2]
```

```
xxxxxxx|xxxxxxx
    |    | op5[1...0]
    |    | op4[1...0]
    |    | Opcode
    |    | (op4[3...2] \times 3 + op5[3...2] + 27)[5]
    |    | Opcode
    |    | Opcode
```

This format is used by the following instructions:

- CRC32_INC
- LEXTRACT
- LINSERT
Five register long

Operations on five registers; the last three operands are source registers, the first two may be used as source and/or destination registers.

The syntax for this instruction is:

Mnemonic: \( op1, op4, op2, op3, op5 \)

Instructions in this format are encoded in two words:

\[
\begin{align*}
\text{Opcode} & \equiv (op4[3...2] \times 3 + op5[3...2] + 27)[5] \\
& \equiv (op5[3...2] \times 3 + op4[3...2]) \mod 5 + 27 \\
\end{align*}
\]

This format is used by the following instructions:

LADD  LDIVU  LSUB  XOR4
Six register long

Operations on six registers; the last four operands are source registers, the first two may be used as source and/or destination registers.

The syntax for this instruction is:

MNEMONIC  op1, op4, op2, op3, op5, op6

Instructions in this format are encoded in two words:

\[
\begin{align*}
\text{Opcode} & \quad op1[3...2] \times 9 + op2[3...2] \times 3 + op3[3..2]
\end{align*}
\]

This format is used by the following instructions:

LMUL
21 XS2 Exceptions

Exceptions change the normal flow of control; they may be caused by interrupts, by errors arising during instruction execution, and by system calls. On an exception, the processor saves the pc and sr in spc and ssr, disables events and interrupts, and starts executing an exception handler. The program counter that is saved normally points to the instruction that raised the exception. Two further registers are also set: the exception-data (ed) and exception-type (et) registers reflect the cause of the exception. The exception handler can choose how to deal with the exception.

In this chapter the different types of exception are listed, together with their representation, their meaning, and the instructions that may cause them.
ET_LINK_ERROR

A hardware control token was output to a channel end. Alternatively, a channel end was used to transmit data without its destination being set first.

When ET_LINK_ERROR is raised:

- $et$ will be set to 1.
- $ed$ will be set to the resource ID of the channel end which generated the exception.

This exception may be raised by the following instructions:

```
OUT  OUTCT  OUTT
```
ET_ILLEGAL_PC

The program counter points to a position that could not be accessed, for example beyond the end of memory, or to a memory location that is not 16-bit aligned.

This exception is raised on dispatch of the instruction corresponding to the illegal program counter. The program counter that is saved in \textit{spc} is the illegal program counter; the memory address of the instruction that caused the program counter to become illegal is not known. Note that this exception could be caused by, for example, loading a resource with an illegal vector (SETV), but that this will not be known until an event happens.
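
The two example conditions given above (an address beyond the end of memory, or one that is not 16-bit aligned) can be expressed as a simple predicate. This is a hedged sketch: the function name and the `ram_base` and `ram_size` parameters are hypothetical, the values used in the usage note are made up, and a real device may apply further checks.

```python
def pc_is_legal(pc, ram_base, ram_size):
    """True if pc is 16-bit aligned and lies inside [ram_base, ram_base + ram_size).

    ram_base and ram_size are illustrative parameters, not values from the manual.
    """
    aligned = pc % 2 == 0                          # instructions start on 16-bit boundaries
    in_range = ram_base <= pc < ram_base + ram_size
    return aligned and in_range
```

With a made-up memory of 0x1000 bytes starting at 0x40000, the address 0x40002 passes the check, whereas 0x40001 (odd) and 0x41000 (one past the end) do not.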

When ET_ILLEGAL_PC is raised:

\begin{itemize}
  \item \textit{et} will be set to 2.
  \item \textit{ed} will be set to the PC which generated the exception.
\end{itemize}

This exception may be raised by the following instructions:

\begin{verbatim}
BAU  BLRF  BRFT  MSYNC
BLA  BRBF  BRFU  TSTART
BLACP BRBT  BRU
BLAT  BRBU  DRET
BLRB  BRFF  KRET
\end{verbatim}
ET_ILLEGAL_INSTRUCTION

A 16-bit/32-bit word was encountered that could not be decoded. This typically indicates that the program counter was incorrect and addresses data memory. Alternatively, a binary is executed that was not compiled for this device.

When ET_ILLEGAL_INSTRUCTION is raised:

- $et$ will be set to 3.
- $ed$ will be set to 0.

This exception may be raised by the following instructions:

DENTSP  DGETREG  DRESTSP  DRET
ET_ILLEGAL_RESOURCE

A resource operation was performed and failed because either the resource identifier supplied was not a valid resource, it was not allocated, or the operation was not legal on that resource.

When ET_ILLEGAL_RESOURCE is raised:

- $et$ will be set to 4.
- $ed$ will be set to the resource identifier passed to the instruction.

This exception may be raised by the following instructions:

- CHKCT IN PEEK TESTCT
- CLRPT INCT SETC TESTLCL
- EDU INPW SETCLK TESTWCT
- EEF INSHR SETD TINITCP
- EET INT SETEV TINITDP
- EEU MJOIN SETN TINITLR
- ENDIN MSYNC SETPSC TINITPC
- FREER OUT SETPT TINITSP
- GETD OUTCT SETRDY TSETMR
- GETN OUTPW SETTW TSETR
- GETST OUTSHR SETV TSTART
- GETTS OUTT SYNCR
ET_LOAD_STORE

A memory operation was performed that was not properly aligned. This could be a word load or word store to an address where the least significant $\log_2 Bpw$ bits were not zero, or access to a 16-bit number using LD16S or ST16 where the least significant bit of the address was one.

Many load and store operations multiply their operand by $Bpw$ in order to increase the density of the encoding; even though this part of the address is guaranteed to be aligned, it is possible for one of $sp$, $cp$, or $dp$ to be unaligned, causing any subsequent load or store which uses them to fail.
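
The alignment rules can be illustrated with a short Python sketch. It assumes $Bpw = 4$ bytes per word, so that $\log_2 Bpw = 2$ low address bits must be zero for a word access; the function names are hypothetical.

```python
BYTES_PER_WORD = 4   # Bpw, assumed to be 4 for this sketch


def word_aligned(address):
    """A word access needs the low log2(Bpw) address bits to be zero."""
    return address & (BYTES_PER_WORD - 1) == 0


def halfword_aligned(address):
    """A 16-bit access (LD16S, ST16) needs the least significant address bit to be zero."""
    return address & 1 == 0


def scaled_address(base, offset_words):
    """Address formed by an instruction that multiplies its operand by Bpw."""
    return base + offset_words * BYTES_PER_WORD
```

Because the scaled offset is always a multiple of $Bpw$, the result is aligned exactly when the base is: a base of 0x1000 with an offset of 3 words gives the aligned address 0x100C, while a base of 0x1002 gives 0x100E, which fails the word check.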

When ET_LOAD_STORE is raised:

- $et$ will be set to 5.
- $ed$ will be set to the load or store address which generated the exception.

This exception may be raised by the following instructions:

- BLACP LDD LDWCPL STET
- BLAT LDDSP LDWDP STSED
- DUALENTSP LDET LDWSP STSPC
- ENTSP LDSED RETSP STSSR
- KENTSP LDSPC ST8 STW
- KRESTSP LDSSR ST16 STWDP
- LD8U LDW STD STWSP
- LD16S LDWCP STDSP
ET_ILLEGAL_PS

Access to a non-existent processor status register was requested by either GETPS or SETPS.

When ET_ILLEGAL_PS is raised:

- \( et \) will be set to 6.
- \( ed \) will be set to the processor status register identifier.

This exception may be raised by the following instructions:

GETPS  SETPS
ET_ARITHMETIC

Signals an arithmetic error, for example a division by 0 or an overflow that was detected.

When ET_ARITHMETIC is raised:

- \( et \) will be set to 7.
- \( ed \) will be set to 0.

This exception may be raised by the following instructions:

- DIVS
- LDIVU
- REMU
- DIVU
- REMS
ET_ECALL

An ECALL instruction was executed, and the associated condition caused an exception. Indicates that the application program raised an exception, for example to signal array bound errors or a failed assertion.

When ET_ECALL is raised:

- `et` will be set to 8.
- `ed` will be set to 0.

This exception may be raised by the following instructions:

```plaintext
ECALLF  ECALLT  ELATE
```
ET_RESOURCE_DEP

Resources are owned and used by a single thread. If multiple threads attempt to access the same resource within 4 cycles of each other, a Resource Dependency exception will be raised.

When ET_RESOURCE_DEP is raised:

- $et$ will be set to 9.
- $ed$ will be set to the resource identifier supplied by the instruction.

This exception may be raised by the following instructions:

- CHKCT IN SETC TESTLCL
- CLRPT INCT SETCLK TESTWCT
- EDU INPW SETD TINITCP
- EEF INSHR SETEV TINITDP
- EET INT SETN TINITLR
- EEU MJOIN SETPSC TINITPC
- ENDIN MSYNC SETPT TINITSP
- FREER OUT SETRDY TSETMR
- GETD OUTCT SETTW TSETR
- GETN OUTPW SETV TSTART
- GETST OUTSHR SYNCR
- GETTS OUTT TESTCT
ET_KCALL

Indicates that the KCALL or KCALLI instruction was executed.

When ET_KCALL is raised:

- \( et \) will be set to 15.
- \( ed \) will be set to the kernel call operand.

This exception may be raised by the following instructions:

KCALL
ET_IOLANE

This value is ORed in with any of the previous exception types to indicate that the exception took place in the resource lane.

When ET_IOLANE is raised:

- et will be set to 16.
- N/A

This exception is not related to a specific instruction.
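
Since the ET_IOLANE value is ORed into the exception type, a handler can separate the flag from the underlying type with a mask. The sketch below uses the $et$ values listed in this chapter; the function and table names are hypothetical, and it assumes that no other bits of $et$ are in use.

```python
ET_IOLANE = 16   # flag ORed into et when the exception occurred in the resource lane

# Exception types and their et values as listed in this chapter.
ET_NAMES = {
    1: "ET_LINK_ERROR",
    2: "ET_ILLEGAL_PC",
    3: "ET_ILLEGAL_INSTRUCTION",
    4: "ET_ILLEGAL_RESOURCE",
    5: "ET_LOAD_STORE",
    6: "ET_ILLEGAL_PS",
    7: "ET_ARITHMETIC",
    8: "ET_ECALL",
    9: "ET_RESOURCE_DEP",
    15: "ET_KCALL",
}


def decode_et(et):
    """Return (exception name, raised_in_resource_lane) for an et value."""
    in_resource_lane = bool(et & ET_IOLANE)
    base = et & ~ET_IOLANE                   # strip the lane flag
    return ET_NAMES.get(base, "unknown"), in_resource_lane
```

For instance, an $et$ value of 20 (4 ORed with 16) decodes to ET_ILLEGAL_RESOURCE raised in the resource lane.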
22 XS2 Lanes

When executing in dual-issue mode, instructions are executed in *lanes*. Some instructions can only be executed in a specific lane, other instructions can execute in one of multiple lanes, and yet other instructions require multiple lanes for execution.

In this chapter the different classes of instructions are explained, together with a list of instructions for each.
MEMORY_LANE

In dual issue mode, these instructions can only be executed in the memory lane, indicated by $M$.

Instructions:

<table>
<tbody>
<tr>
<td>BAU(16)</td>
<td>BRFU(16)</td>
<td>LDSED(16)</td>
<td>SETDP(16)</td>
</tr>
<tr>
<td>BLA(16)</td>
<td>BRU(16)</td>
<td>LDSPC(16)</td>
<td>SETKEP(16)</td>
</tr>
<tr>
<td>BLACP(16)</td>
<td>DGETREG(16)</td>
<td>LDSSR(16)</td>
<td>SETSP(16)</td>
</tr>
<tr>
<td>BLAT(16)</td>
<td>DUALENTSP(16)</td>
<td>LDW(16)</td>
<td>STET(16)</td>
</tr>
<tr>
<td>BLRB(16)</td>
<td>ENTSP(16)</td>
<td>LDW(16)</td>
<td>STSED(16)</td>
</tr>
<tr>
<td>BLRF(16)</td>
<td>KCALL(16)</td>
<td>LDWCP(16)</td>
<td>STSPC(16)</td>
</tr>
<tr>
<td>BRBF(16)</td>
<td>KCALLI(16)</td>
<td>LDWCPL(16)</td>
<td>STSSR(16)</td>
</tr>
<tr>
<td>BRBT(16)</td>
<td>KENTSP(16)</td>
<td>LDWDP(16)</td>
<td>STWI(16)</td>
</tr>
<tr>
<td>BRBU(16)</td>
<td>LD8U(16)</td>
<td>LDWSP(16)</td>
<td>STWDP(16)</td>
</tr>
<tr>
<td>BRFF(16)</td>
<td>LD16S(16)</td>
<td>RETSP(16)</td>
<td>STWSP(16)</td>
</tr>
<tr>
<td>BRFT(16)</td>
<td>LDET(16)</td>
<td>SETCP(16)</td>
<td></td>
</tr>
</tbody>
</table>
RESOURCE_LANE

In dual issue mode, these instructions can only be executed in the resource lane, indicated by R.

Instructions:

- CHKCT(16)
- CHKCTI(16)
- CLRE(16)
- CLRPT(16)
- CLRSR(16)
- EDU(16)
- EEF(16)
- EET(16)
- EEU(16)
- ENDIN(16)
- FREER(16)
- CHKCT(16)
- GETR(16)
- GETST(16)
- GETTS(16)
- IN(16)
- IN(16)
- INSHR(16)
- INT(16)
- INCT(16)
- MJOIN(16)
- MSYNC(16)
- OUT(16)
- OUT(16)
- OUTCT(16)
- OUTCTI(16)
- OUTSHR(16)
- OUTT(16)
- PEEK(16)
- SETC(16)
- SETD(16)
- SETEV(16)
- SETPSC(16)
- SETPT(16)
- SETSR(16)
- SETV(16)
- SSYNC(16)
- SYNCR(16)
- TESTCT(16)
- TESTWCT(16)
- TSTART(16)
- WAITEF(16)
- WAITET(16)
- WAITEU(16)
MEMORY_OR_RESOURCE_LANE

In dual issue mode, these instructions can be executed in either lane, indicated by M+R.

Instructions:

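<table>
<tbody>
<tr>
<td>ADD(16)</td>
<td>EQI(16)</td>
<td>LDAWCP(16)</td>
<td>SEXT(16)</td>
</tr>
<tr>
<td>ADDI(16)</td>
<td>EXTDP(16)</td>
<td>LDAWDP(16)</td>
<td>SEXTI(16)</td>
</tr>
<tr>
<td>AND(16)</td>
<td>EXTSP(16)</td>
<td>LDAWSP(16)</td>
<td>SHL(16)</td>
</tr>
<tr>
<td>ANDNOT(16)</td>
<td>GETED(16)</td>
<td>LDC(16)</td>
<td>SHLI(16)</td>
</tr>
<tr>
<td>BITREV(16)</td>
<td>GETET(16)</td>
<td>LSS(16)</td>
<td>SHR(16)</td>
</tr>
<tr>
<td>BYTEREV(16)</td>
<td>GETID(16)</td>
<td>LSU(16)</td>
<td>SHRI(16)</td>
</tr>
<tr>
<td>CLZ(16)</td>
<td>GETKEP(16)</td>
<td>MKMSK(16)</td>
<td>SUB(16)</td>
</tr>
<tr>
<td>DCALL(16)</td>
<td>GETKSP(16)</td>
<td>MKMSKI(16)</td>
<td>SUBI(16)</td>
</tr>
<tr>
<td>ECALLF(16)</td>
<td>GETSR(16)</td>
<td>NEG(16)</td>
<td>ZEXT(16)</td>
</tr>
<tr>
<td>ECALLT(16)</td>
<td>GETTIME(16)</td>
<td>NOP(16)</td>
<td>ZEXTI(16)</td>
</tr>
<tr>
<td>ELATE(16)</td>
<td>LDAPB(16)</td>
<td>NOT(16)</td>
<td></td>
</tr>
<tr>
<td>EQ(16)</td>
<td>LDAPF(16)</td>
<td>OR(16)</td>
<td></td>
</tr>
</tbody>
</table>

**MEMORY_AND_RESOURCE_LANE**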

In dual issue mode, these instructions are executed in both lanes simultaneously, indicated by **M&R**.

**Instructions:**

<table>
<tbody>
<tr>
<td>ASHR(32)</td>
<td>EXTSP(32)</td>
<td>LDWCPI(32)</td>
<td>SETRDY(32)</td>
</tr>
<tr>
<td>BLACP(32)</td>
<td>GETD(32)</td>
<td>LDWCPL(32)</td>
<td>SETSR(32)</td>
</tr>
<tr>
<td>BLAT(32)</td>
<td>GETN(32)</td>
<td>LDWDP(32)</td>
<td>SETT(32)</td>
</tr>
<tr>
<td>BLRB(32)</td>
<td>GETPS(32)</td>
<td>LDWS(32)</td>
<td>ST8(32)</td>
</tr>
<tr>
<td>BLRF(32)</td>
<td>GETSR(32)</td>
<td>LEXTRACT(32)</td>
<td>ST16(32)</td>
</tr>
<tr>
<td>BRBF(32)</td>
<td>INPW(32)</td>
<td>LINSERT(32)</td>
<td>STD(32)</td>
</tr>
<tr>
<td>BRBT(32)</td>
<td>KCALLI(32)</td>
<td>LMUL(32)</td>
<td>STDSP(32)</td>
</tr>
<tr>
<td>BRBU(32)</td>
<td>KENTSP(32)</td>
<td>LSUB(32)</td>
<td>STW(32)</td>
</tr>
<tr>
<td>BRFF(32)</td>
<td>KRESTSP(32)</td>
<td>MACCS(32)</td>
<td>STWDP(32)</td>
</tr>
<tr>
<td>BRFT(32)</td>
<td>KRET(32)</td>
<td>MACCU(32)</td>
<td>STWSP(32)</td>
</tr>
<tr>
<td>BRFU(32)</td>
<td>LADD(32)</td>
<td>MUL(32)</td>
<td>TESTLCL(32)</td>
</tr>
<tr>
<td>CLR(32)</td>
<td>LDA16B(32)</td>
<td>OUTPW(32)</td>
<td>TINITCP(32)</td>
</tr>
<tr>
<td>CRC(32)</td>
<td>LDA16F(32)</td>
<td>REMS(32)</td>
<td>TINITD(32)</td>
</tr>
<tr>
<td>CRC32(32)</td>
<td>LDAPF(32)</td>
<td>REMU(32)</td>
<td>TINITL(32)</td>
</tr>
<tr>
<td>CRCN(32)</td>
<td>LDAWB(32)</td>
<td>RETSP(32)</td>
<td>TINITSCP(32)</td>
</tr>
<tr>
<td>DENTSP(32)</td>
<td>LDAMSP(32)</td>
<td>SETC(32)</td>
<td>TSETMR(32)</td>
</tr>
<tr>
<td>DENV(32)</td>
<td>LDC(32)</td>
<td>SETCI(32)</td>
<td>TSETR(32)</td>
</tr>
<tr>
<td>DRENTSP(32)</td>
<td>LDDI(32)</td>
<td>SETCLK(32)</td>
<td>XOR(32)</td>
</tr>
<tr>
<td>DRET(32)</td>
<td>LDDSP(32)</td>
<td>SETN(32)</td>
<td>ZIP(32)</td>
</tr>
<tr>
<td>DUALENTSP(32)</td>
<td>LDIU(32)</td>
<td>SETPD(32)</td>
<td></td>
</tr>
</tbody>
</table>
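
The lane classes in this section can be read as a pairing rule for dual issue mode. The sketch below is illustrative only and is not taken from the architecture definition. It assumes that two 16-bit instructions can share an issue slot whenever they can be assigned to different lanes, and that an M&R instruction always occupies the slot on its own. Other constraints on dual issue (see Section 5.2) are ignored. The table `LANE` holds a small subset of the instructions listed above, and the names `LANE`, `lanes_for` and `can_pair` are hypothetical.

```python
# Lane classes as labelled in this section.
M, R, M_OR_R, M_AND_R = "M", "R", "M+R", "M&R"

# Illustrative subset of the instruction lists above (not the full set).
LANE = {
    "LDWSP(16)": M, "STWSP(16)": M, "BRFT(16)": M,
    "IN(16)": R, "OUT(16)": R, "SETC(16)": R,
    "ADD(16)": M_OR_R, "SHL(16)": M_OR_R, "NOP(16)": M_OR_R,
    "LMUL(32)": M_AND_R, "MACCS(32)": M_AND_R,
}


def lanes_for(mnemonic):
    """Return the set of lanes an instruction may use, or None if it needs both."""
    lane_class = LANE[mnemonic]
    if lane_class == M:
        return {"M"}
    if lane_class == R:
        return {"R"}
    if lane_class == M_OR_R:
        return {"M", "R"}
    return None  # M&R: occupies both lanes at once


def can_pair(a, b):
    """Assumed rule: a and b pair if they can be placed in different lanes."""
    lanes_a, lanes_b = lanes_for(a), lanes_for(b)
    if lanes_a is None or lanes_b is None:
        return False
    return any(x != y for x in lanes_a for y in lanes_b)


if __name__ == "__main__":
    for pair in [("LDWSP(16)", "IN(16)"), ("IN(16)", "OUT(16)"),
                 ("ADD(16)", "SHL(16)"), ("LMUL(32)", "ADD(16)")]:
        print(pair, can_pair(*pair))
```

Under these assumptions a memory-lane load pairs with a resource-lane input, two resource-lane instructions do not pair, and two M+R instructions pair because one can take each lane.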