## Embedded Systems Group (ES)

This is an instruction set simulator for a SCAD processor that has been designed in the Embedded Systems Group of the University of Kaiserslautern (see more details about the simulated SCAD machine below).

    // ***************************************************************************
    // The following SCAD program computes the factorial of a number specified in
    // line 13 (it is currently $6). To compute the factorial of n, the program
    // requires 13n+12 slow steps, 5n+4 fast steps, 15n+12 firings, and 5n+1 data
    // transports. To explain the program, let's consider two variables m and i
    // where m is the output of a multiplier pu0 and i is the output of an adder
    // pu1.
    // ***************************************************************************
    // initialize m:=1 -------------------------------------------------
     0: $1       -> pu0@in0     // left operand for multiplication is 1
     1: $1       -> pu0@in1     // right operand for multiplication is 1
     2: (mulN,1) -> pu0@opc     // compute m:=1*1
    // initialize i:=1 -------------------------------------------------
     3: $0       -> pu1@in0     // left operand for incrementation is 0
     4: $1       -> pu1@in1     // right operand for incrementation is 1
     5: (addN,3) -> pu1@opc     // compute i:=0+1
    // loop starts here ------------------------------------------------
    // multiply operation m:=m*i ---------------------------------------
     6: pu0@out  -> pu0@in0     // left operand for multiplication is m
     7: pu1@out  -> pu0@in1     // right operand for multiplication is i
     8: (mulN,1) -> pu0@opc     // compute m:=m*i
    // increment operation i:=i+1 --------------------------------------
     9: pu1@out  -> pu1@in0     // left operand for incrementation is i
    10: $1       -> pu1@in1     // right operand for incrementation is 1
    11: (addN,3) -> pu1@opc     // compute i:=i+1
    // check loop bound -------------------------------------------------
    12: pu1@out  -> pu2@in0     // left operand for loop condition is i
    13: $6       -> pu2@in1     // right operand for loop condition is $6
    14: (lesN,1) -> pu2@opc     // compute b:=i<6
    // branch of loop ---------------------------------------------------
    15: $6       -> cu@in1      // pc if branch taken is 6
    16: pu2@out  -> cu@in0      // pc condition is b:=pu2@out
    // loop ends here ---------------------------------------------------
    17: st       -> lsu@opc     // after termination, store the result
    18: $0       -> lsu@in0     // store address is memory address 0
    19: pu0@out  -> lsu@in1     // data value comes from m=pu0@out
    // ***************************************************************************
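
To make the data flow of the program above easier to follow, here is a rough Python mirror of it. It is only a sketch and not part of the simulator: it assumes that output buffers behave as FIFO queues and that moves are served in program order. The function name `factorial_dataflow` and its `bound` parameter (the value moved in line 13) are made up for this illustration.

```python
from collections import deque


def factorial_dataflow(bound: int) -> int:
    """Mirror the SCAD factorial program with FIFO queues as output buffers.

    Assumption: buffers are FIFO and moves are served in program order.
    """
    pu0_out = deque()  # holds values of m
    pu1_out = deque()  # holds values of i

    pu0_out.append(1 * 1)        # lines 0-2: (mulN,1) yields one copy of m
    pu1_out.extend([0 + 1] * 3)  # lines 3-5: (addN,3) yields three copies of i

    while True:
        m = pu0_out.popleft()            # line 6
        i = pu1_out.popleft()            # line 7 (first copy of i)
        pu0_out.append(m * i)            # line 8
        i = pu1_out.popleft()            # line 9 (second copy of i)
        pu1_out.extend([i + 1] * 3)      # lines 10-11
        b = pu1_out.popleft() < bound    # lines 12-14 (third copy of the old i)
        if not b:                        # lines 15-16: leave the loop if b is false
            break

    return pu0_out.popleft()             # lines 17-19: value stored to memory


print(factorial_dataflow(6))
```

Under these assumptions, the third copy of each i is the one compared against the bound, which is why `(addN,3)` requests three copies while `(mulN,1)` requests only one.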

In general, SCAD (synchronous control/asynchronous data flow) processors consist of some number of processing units whose input and output ports are buffered. Each processing unit has a unique address adrU, and each one of its buffers extends this address to a unique buffer address like adrU@in0,adrU@out, or adrU@opc. Whenever a processing unit finds operands in its input buffers, it fires which means it consumes the input values and produces a specified number of copies of output values in its output buffer.
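
The firing rule can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the simulator's code: the class name `BufferedUnit`, the `out_capacity` parameter, and the use of FIFO queues for the buffers are all hypothetical.

```python
from collections import deque
import operator


class BufferedUnit:
    """Hypothetical model of one processing unit with buffered ports."""

    def __init__(self, ops, out_capacity=8):
        self.in0 = deque()   # left operands
        self.in1 = deque()   # right operands
        self.opc = deque()   # pairs (opcode name, number of copies)
        self.out = deque()   # result values
        self.ops = ops       # opcode name -> binary function
        self.out_capacity = out_capacity  # assumed buffer size

    def can_fire(self):
        # Operands and an opcode must be present ...
        if not (self.in0 and self.in1 and self.opc):
            return False
        # ... and the output buffer must have room for all requested copies.
        _, nc = self.opc[0]
        return len(self.out) + nc <= self.out_capacity

    def fire(self):
        if not self.can_fire():
            return False
        name, nc = self.opc.popleft()
        result = self.ops[name](self.in0.popleft(), self.in1.popleft())
        self.out.extend([result] * nc)  # produce nc copies of the result
        return True


pu = BufferedUnit({"addN": operator.add})
pu.in0.append(0)
pu.in1.append(1)
pu.opc.append(("addN", 3))
pu.fire()
print(list(pu.out))
```

The usage lines correspond to instructions 3 to 5 of the factorial program: the unit consumes 0 and 1 and places three copies of their sum in its output buffer.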

A program for a SCAD machine consists only of move instructions which move either a direct value or the result value of an output buffer to an input buffer. SCAD machines can have application-specific processing units, but the one considered here has only universal processing units with the following addresses:

• Address 0 is used for the control unit cu.
• Address 1 is used for the load/store unit lsu.
• Address 2 is used for the reordering unit rob.
• Addresses 3..N+2 are used for the N universal processing units pu0,...,pu{N-1}.
Note that these addresses are used in the printout of the simulator. The behavior of these functional units is described in more detail below.

## Control Unit

The control unit cu fetches move instructions from the program memory. To this end, it maintains a local program counter pc that is used to address the program memory. Whenever it fires, the cu reads an instruction from ProgMem[pc] and issues it to the other processing units, which then add addresses or values to their input or output buffers.

The control unit is also responsible for branch and jump instructions, i.e., conditional and unconditional branches. Unconditional branches are simply encoded as move instructions to the special address pc, i.e., \$l->pc sets the pc in the control unit to l. Conditional branch instructions are handled by three lanes in the input buffer of the control unit:

• cu@in0 is the branch condition lane.
• cu@in1 is the 'then' branch target lane.
• cu@in2 is the 'else' branch target lane.
As long as a pc is available locally in the cu, the corresponding move instruction is fetched and issued, and the local pc is incremented or set to l in case of an unconditional branch \$l->pc. Conditional branch instructions should be implemented by the program as follows:
• First move the branch target address to cu@in1.
• Next, issue move instructions to compute the branch condition.
• Finally, issue a move instruction from the output buffer where the final value of the branch condition has been produced to cu@in0. When the cu issues this instruction, it destroys its local pc and automatically moves pc+1 to cu@in2.
• The cu can fire if it has a local pc or if the three input lanes contain defined fields. In the latter case, a new local pc is defined; otherwise, instructions are fetched using the existing local pc.
The cu stops issuing further instructions as soon as the pc is outside the allowed addresses of the program memory. Appropriate move instructions concerning the cu are therefore the following:
• \$l -> pc sets the pc to l (unconditional branch).
• \$pcThen -> cu@in1 moves a branch target pcThen to cu@in1.
• adrU@out -> cu@in0 moves the result of the branch condition evaluation found in adrU@out to cu@in0, and writes pc+1 to cu@in2.
Note that there are no moves from the cu. Instead, the cu issues move instructions whenever it fires on the move instruction network.
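
The conditional-branch protocol can be sketched in Python as follows. This is a simplified model under assumptions, not the simulator's implementation: the class `ControlUnit`, its method names, and the encoding of a destroyed pc as `None` are hypothetical, and a true condition is assumed to select the 'then' target.

```python
from collections import deque


class ControlUnit:
    """Hypothetical model of the cu's branch handling."""

    def __init__(self, prog_len):
        self.pc = 0          # local pc; None while a branch is pending
        self.in0 = deque()   # branch condition lane
        self.in1 = deque()   # 'then' branch target lane
        self.in2 = deque()   # 'else' branch target lane
        self.prog_len = prog_len

    def issue_condition_move(self):
        """Model issuing 'adrU@out -> cu@in0': pc+1 goes to in2, pc is destroyed."""
        self.in2.append(self.pc + 1)
        self.pc = None

    def resolve_branch(self):
        """Define a new local pc once all three lanes hold a value."""
        if self.pc is None and self.in0 and self.in1 and self.in2:
            cond = self.in0.popleft()
            then_pc = self.in1.popleft()
            else_pc = self.in2.popleft()
            self.pc = then_pc if cond else else_pc
            return True
        return False

    def halted(self):
        # The cu stops once the pc leaves the program memory.
        return self.pc is not None and not (0 <= self.pc < self.prog_len)


cu = ControlUnit(prog_len=20)
cu.in1.append(6)           # instruction 15 of the factorial program
cu.pc = 16
cu.issue_condition_move()  # instruction 16
cu.in0.append(1)           # condition value arrives from pu2@out
cu.resolve_branch()
print(cu.pc)
```

In the usage lines, a true condition sends the cu back to instruction 6; with a false condition it would continue at instruction 17.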

## Load/Store Unit

The lsu is the load/store unit of the SCAD machine with the following lanes:

• lsu@in0 holds the memory address of the access.
• lsu@in1 holds the values to be stored (and a dummy value for loads).
• lsu@opc holds a pair (ls,nc) where ls is the opcode and nc is the number of copies to be written to lsu@out in case of a load instruction. The opcode ls is true for load and false for store instructions.
• lsu@out is the output buffer that stores the values produced by load operations.
Appropriate move instructions concerning the lsu are therefore the following:
• \$adr -> lsu@in0 moves a memory address adr to lsu@in0.
• \$val -> lsu@in1 moves a data value val to lsu@in1.
• st -> lsu@opc moves the opcode for storing to lsu@opc.
• (ld,nc) -> lsu@opc moves the opcode for loading and generating nc copies of the loaded value to lsu@out.
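
A minimal Python sketch of this behavior is given below. It assumes that lsu@in0 carries the memory address (as in instruction 18 of the factorial program), that the unit fires only when all three input lanes hold a value, and that a store is encoded as the pair `(False, 0)`. The class name `LoadStoreUnit` is hypothetical.

```python
from collections import deque


class LoadStoreUnit:
    """Hypothetical model of the lsu with a small data memory."""

    def __init__(self, mem_size):
        self.mem = [0] * mem_size
        self.in0 = deque()  # addresses (assumed role of lsu@in0)
        self.in1 = deque()  # values to store, or dummy values for loads
        self.opc = deque()  # pairs (is_load, number of copies)
        self.out = deque()  # loaded values

    def fire(self):
        if not (self.in0 and self.in1 and self.opc):
            return False
        is_load, nc = self.opc.popleft()
        adr = self.in0.popleft()
        val = self.in1.popleft()  # consumed even for loads (dummy value)
        if is_load:
            self.out.extend([self.mem[adr]] * nc)
        else:
            self.mem[adr] = val
        return True


lsu = LoadStoreUnit(mem_size=4)
lsu.opc.append((False, 0))  # st -> lsu@opc
lsu.in0.append(0)           # $0 -> lsu@in0
lsu.in1.append(720)         # pu0@out -> lsu@in1
lsu.fire()
print(lsu.mem[0])
```

The usage lines replay instructions 17 to 19 of the factorial program, with 720 standing in for the value that arrives from pu0@out.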

## Reorder Unit

If result values are not found in the desired order in the output buffers, one can make use of the reorder unit rob. The reorder unit rob has one input lane rob@in0 and one output lane rob@out. It fires whenever there is a value in the input lane, and simply copies that value to the output lane.
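
One way to read this is that the rob lets a program park a value that sits in front of the one it needs next. The following Python sketch illustrates that reading only; it assumes FIFO buffers, and the helper names `rob_fire` and `take_second_first` are made up for this example.

```python
from collections import deque


def rob_fire(rob_in0, rob_out):
    """Copy one value from the rob's input lane to its output lane, if present."""
    if not rob_in0:
        return False
    rob_out.append(rob_in0.popleft())
    return True


def take_second_first(pu_out):
    """Consume the first two values of pu_out in swapped order via the rob."""
    rob_in0, rob_out = deque(), deque()
    rob_in0.append(pu_out.popleft())  # puI@out -> rob@in0 (park the head value)
    rob_fire(rob_in0, rob_out)
    first = pu_out.popleft()          # the former second value is now at the head
    second = rob_out.popleft()        # rob@out -> ... (fetch the parked value)
    return first, second


print(take_second_first(deque(["a", "b"])))
```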

## Processing Units

The considered SCAD machine has universal processing units pu0,...,pu{N-1} for some number N that is determined by the given SCAD program. Each processing unit puI has one input and one output buffer with the following lanes:

• puI@in0 holds the left operand of a binary operation.
• puI@in1 holds the right operand of a binary operation.
• puI@opc holds pairs (opc,nc) where opc is the opcode of a binary operation and nc is the number of copies to be written to puI@out.
• puI@out holds the result values.
All processing units pu0,...,pu{N-1} are capable of executing the following operations:
• addN : unsigned addition
• subN : unsigned subtraction
• mulN : unsigned multiplication
• divN : unsigned division
• modN : unsigned modulo
• lesN : unsigned less comparison
• leqN : unsigned less or equal comparison
• eqqN : unsigned equality comparison
• neqN : unsigned inequality comparison
• addZ : signed addition
• subZ : signed subtraction
• mulZ : signed multiplication
• divZ : signed division
• modZ : signed modulo
• lesZ : signed less comparison
• leqZ : signed less or equal comparison
• eqqZ : signed equality comparison
• neqZ : signed inequality comparison
• andB : bitwise conjunction
• orB : bitwise disjunction
• eqqB : bitwise equivalence
• neqB : bitwise inequality (exclusive or)
Each puI registers the move instructions issued by the control unit whenever its address is either the source or the target address of a move instruction. Moreover, result values are produced whenever input operands and opcodes are available and there is enough space in the output buffer. Finally, available result values are sent to input buffers whenever the target addresses are also available. Appropriate move instructions concerning processing units are therefore the following:
• \$val -> puI@in0 or \$val -> puI@in1 move a direct operand val to an input buffer.
• adrU@out -> puI@in0 or adrU@out -> puI@in1 move a value from buffer adrU@out to an input buffer.
• (opc,nc) -> puI@opc moves an opcode opc and the number of copies nc to the opcode buffer.
• puI@out -> adrU@inI moves a result value from the output buffer to some other input buffer.
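
The difference between the N (unsigned) and Z (signed) variants of an operation can be illustrated with a small opcode table in Python. The text does not specify a word width, overflow behavior, or the encoding of comparison results, so the 16-bit width, the wraparound arithmetic, and the 1/0 truth values below are assumptions made purely for this sketch. The names `WIDTH`, `to_signed`, and `OPS` are hypothetical.

```python
WIDTH = 16                 # assumed word width, not given in the text
MASK = (1 << WIDTH) - 1


def to_signed(x):
    """Interpret the low WIDTH bits of x as a two's-complement number."""
    x &= MASK
    return x - (1 << WIDTH) if x & (1 << (WIDTH - 1)) else x


# A subset of the operations, keyed by opcode name.
OPS = {
    "addN": lambda a, b: (a + b) & MASK,
    "subN": lambda a, b: (a - b) & MASK,
    "mulN": lambda a, b: (a * b) & MASK,
    "lesN": lambda a, b: int((a & MASK) < (b & MASK)),
    "lesZ": lambda a, b: int(to_signed(a) < to_signed(b)),
    "andB": lambda a, b: a & b & MASK,
    "orB": lambda a, b: (a | b) & MASK,
}

# The same bit pattern compares differently as unsigned and as signed.
print(OPS["lesN"](0xFFFF, 1), OPS["lesZ"](0xFFFF, 1))
```

With these assumptions, 0xFFFF is the largest unsigned value but represents -1 when read as signed, so only the signed comparison reports it as less than 1.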