Parallel Processor Architectures

For the implementation of an embedded system, one has to select a suitable hardware platform so that the synthesis of the given model to that platform can be performed. For this purpose, many platforms are available today, including at least the following ones:

  • conventional microprocessors with or without pipelining,
  • dynamically scheduled microprocessors (with out-of-order execution),
  • statically scheduled microprocessors (like DSPs and VLIW processors),
  • graphics processors (GPUs), which have become generally programmable,
  • application-specific processors (e.g. adapting the instruction sets),
  • and freely configurable processing units like FPGAs.

Since digital hardware circuits are also based on the synchronous model of computation, it is straightforward to synthesize digital hardware circuits from the synchronous guarded actions used by our Averest design framework.
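To illustrate why this mapping is direct, the following is a minimal sketch of synchronous guarded actions in Python. It is not Averest's actual internal format: the triple layout (guard, target, expression) and the names `step`, `run`, and `COUNTER_ACTIONS` are assumptions made for this example. It models only delayed assignments, so each state variable behaves like a register whose next value is selected by the guards, and it assumes that the guards writing to one variable are mutually exclusive.

```python
# Hypothetical mini-model of synchronous guarded actions (not Averest's format).
# Each action is a triple (guard, target, expression). Guards and expressions
# read the values of the current step; the write becomes visible in the next step.

def step(state, inputs, actions):
    """Perform one synchronous reaction and return the next state."""
    env = {**state, **inputs}      # values readable in the current step
    nxt = dict(state)              # variables without an enabled action keep their value
    for guard, target, expr in actions:
        if guard(env):             # assumption: guards per target are mutually exclusive
            nxt[target] = expr(env)
    return nxt

def run(state, input_seq, actions):
    """Apply one reaction per input and record the counter value after each."""
    trace = []
    for inputs in input_seq:
        state = step(state, inputs, actions)
        trace.append(state["count"])
    return trace

# A 2-bit counter with reset and enable, written as two guarded actions.
COUNTER_ACTIONS = [
    (lambda e: e["reset"], "count", lambda e: 0),
    (lambda e: not e["reset"] and e["enable"], "count",
     lambda e: (e["count"] + 1) % 4),
]

if __name__ == "__main__":
    en = {"reset": False, "enable": True}
    rst = {"reset": True, "enable": True}
    hold = {"reset": False, "enable": False}
    print(run({"count": 0}, [en, en, en, en, en, rst, hold], COUNTER_ACTIONS))
```

In a circuit, the `for` loop over the actions would correspond to a multiplexer in front of the `count` register, with all guards evaluated in parallel in the same clock cycle.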

Our Averest system also offers translations from its internal representation by guarded actions to single-threaded software. To this end, one may generate an equation system (similar to HW synthesis, but without bit-blasting) that is then evaluated in an infinite loop in the software thread. Alternatively, to improve the reaction time, it is often advantageous to generate a so-called extended finite state machine (EFSM) that explicitly enumerates the control flow of the system but retains the data flow in symbolic form. This way, only the code enabled in the current control-flow state is evaluated at runtime instead of the entire code of the program. While the size of the generated code may grow exponentially in the worst case (which rarely happens in practice), the runtime is typically improved considerably.
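The difference between the two software targets can be sketched with a small hand-written example. The code below is not generated by Averest; the two-state accumulator, the input names `start`, `stop`, and `x`, and all function names are hypothetical. The first variant evaluates every equation in each reaction, while the second dispatches on an explicit control state and runs only the code enabled in that state. Both loops are bounded by the input list here, whereas generated code would loop forever.

```python
# Hypothetical example program: idle until `start`, then accumulate `x`
# in every reaction until `stop` is seen.

def inp(start=False, stop=False, x=0):
    """Build one input valuation."""
    return {"start": start, "stop": stop, "x": x}

def step_equations(s, i):
    """Equation-system style: all equations are evaluated in every reaction."""
    go_busy = s["idle"] and i["start"]
    go_idle = s["busy"] and i["stop"]
    return {
        "idle": (s["idle"] and not i["start"]) or go_idle,
        "busy": go_busy or (s["busy"] and not i["stop"]),
        "acc": 0 if go_busy else (s["acc"] + i["x"] if s["busy"] else s["acc"]),
    }

def run_equations(inputs):
    s = {"idle": True, "busy": False, "acc": 0}   # one-hot control encoding
    trace = []
    for i in inputs:                              # stands in for the infinite loop
        s = step_equations(s, i)
        trace.append(("idle" if s["idle"] else "busy", s["acc"]))
    return trace

# EFSM style: one function per control state; data flow stays symbolic in `acc`.
def efsm_idle(acc, i):
    return ("busy", 0) if i["start"] else ("idle", acc)

def efsm_busy(acc, i):
    acc = acc + i["x"]
    return ("idle" if i["stop"] else "busy"), acc

EFSM = {"idle": efsm_idle, "busy": efsm_busy}

def run_efsm(inputs):
    state, acc = "idle", 0
    trace = []
    for i in inputs:                              # only the current state's code runs
        state, acc = EFSM[state](acc, i)
        trace.append((state, acc))
    return trace

if __name__ == "__main__":
    seq = [inp(start=True, x=5), inp(x=2), inp(stop=True, x=3), inp(x=9), inp(start=True, x=1)]
    print(run_equations(seq))
    print(run_efsm(seq))
```

Both variants compute the same trace for this input sequence. With n independent control bits, the EFSM may need up to 2^n state functions, which is the exponential growth mentioned above.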

In our previous work, we already considered code generation for GPUs and VLIW processors. We are currently implementing a family of processors for teaching and research. Using a single instruction set called Abacus that is similar to the MIPS architecture, we implement

  • Abacus-Spec: the instruction set behavior of
  • Abacus-P: a pipelined implementation of Abacus
  • Abacus-V: a pipelined implementation of Abacus including the vector instructions
  • Abacus-OOO: a superscalar implementation of Abacus with dynamic scheduling
In addition, we implement caches and cache coherence protocols so that one can also generate multicore processor models of these architectures. The final goal is a HW/SW codesign approach in which an application-specific version of Abacus is generated by adapting the number of specialized functional units to the needs of the modeled embedded system.
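As a rough illustration of what adapting the number of functional units could involve, the sketch below searches for the smallest number of units of one type that meets a cycle budget. It is not the method used for Abacus: the greedy list scheduler, the single-cycle latency of all operations, and the names `schedule_length` and `min_units` are assumptions made for this example.

```python
def schedule_length(ops, deps, units):
    """Cycles needed by a greedy list schedule.

    ops:   operation name -> unit type (dict order is used as priority)
    deps:  operation name -> list of predecessor names
    units: unit type -> number of available units
    Assumption: every operation takes exactly one cycle.
    """
    done = set()
    remaining = list(ops)
    cycles = 0
    while remaining:
        free = dict(units)
        issued = []
        for op in remaining:
            ready = all(p in done for p in deps.get(op, []))
            if ready and free.get(ops[op], 0) > 0:
                free[ops[op]] -= 1
                issued.append(op)
        if not issued:
            raise ValueError("no operation can be issued: missing unit or cyclic dependency")
        done.update(issued)        # results become visible in the next cycle
        remaining = [op for op in remaining if op not in done]
        cycles += 1
    return cycles

def min_units(ops, deps, base_units, unit_type, budget, max_units):
    """Smallest count of `unit_type` (up to max_units) that meets the cycle budget."""
    for n in range(1, max_units + 1):
        if schedule_length(ops, deps, {**base_units, unit_type: n}) <= budget:
            return n
    return None                    # budget not reachable within max_units

# Hypothetical kernel: (m1 + m2) + (m3 + m4), where m1..m4 are multiplications.
OPS = {"m1": "mul", "m2": "mul", "m3": "mul", "m4": "mul",
       "a1": "alu", "a2": "alu", "a3": "alu"}
DEPS = {"a1": ["m1", "m2"], "a2": ["m3", "m4"], "a3": ["a1", "a2"]}

if __name__ == "__main__":
    print(schedule_length(OPS, DEPS, {"mul": 1, "alu": 1}))
    print(min_units(OPS, DEPS, {"alu": 1}, "mul", budget=4, max_units=4))
```

For this kernel, a second multiplier shortens the schedule, while further multipliers bring no gain because the single ALU becomes the bottleneck. A realistic flow would derive the operation graph from the modeled system and use actual latencies instead of the uniform single-cycle assumption.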