Reduced Complexity Many-Core

Start date: 01.07.2014
Funded by: Universität Augsburg
Local project leader: Prof. Dr. Theo Ungerer
Local scientists: Martin Frieb
Jörg Mische
Publications: Publication list


One way to reduce the overall power consumption of a processor is to avoid components that consume a lot of power. Speculative components such as branch predictors and cache memories are examples of such expensive modules. Speculation also harms the predictability of the timing behaviour, since it increases the pessimism of worst-case estimates. Low-power design and real-time capability can therefore be combined easily.

Forgoing speculation severely reduces single-thread performance. This loss can be compensated by increasing the number of processor cores and concurrent threads. Hence we are developing a Reduced Complexity Many-Core (RC/MC) with small cores and a simple interconnection network that achieves high throughput through massive parallelism.

Many-Core Timing Analysis

The scalability of shared-memory multicore processors is limited. Especially in static timing analysis, interference between cores leads to pessimistic worst-case assumptions and overestimation. For example, worst-case memory access latencies increase with the number of cores. To overcome this, RC/MC prohibits interfering memory accesses and isolates the cores: there is only core-local memory and no global shared memory. The only way to communicate with other cores is fine-grained message passing (FGMP) over a predictable network-on-chip. Composed Parallel Operations (CPOs) are used to analyse the timing.
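The growth of worst-case memory latency with the core count can be illustrated with a small sketch. It assumes a shared memory behind a round-robin arbiter in which every access occupies the memory for a fixed number of cycles; this arbitration scheme and the names `worst_case_shared_access` and `worst_case_local_access` are illustrative assumptions, not part of the RC/MC design.

```python
def worst_case_shared_access(cores: int, service_cycles: int) -> int:
    """Worst-case latency of one access to a shared memory.

    Assumption (hypothetical model): round-robin arbitration, and the
    request arrives just after the core's own slot has passed, so it
    waits for all other cores before being served itself.
    """
    if cores < 1 or service_cycles < 1:
        raise ValueError("cores and service_cycles must be positive")
    waiting = (cores - 1) * service_cycles  # every other core goes first
    return waiting + service_cycles         # then the own access is served


def worst_case_local_access(service_cycles: int) -> int:
    """Worst-case latency of a core-local access: no other core interferes."""
    if service_cycles < 1:
        raise ValueError("service_cycles must be positive")
    return service_cycles


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        shared = worst_case_shared_access(n, service_cycles=10)
        local = worst_case_local_access(service_cycles=10)
        print(f"{n:3d} cores: shared bound {shared} cycles, local bound {local} cycles")
```

Under this model the shared-memory bound grows linearly with the number of cores, while the core-local bound stays constant, which is the effect that isolating the cores is meant to exploit.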

PaterNoster Network-on-Chip

State-of-the-art Networks-on-Chip (NoCs) provide high throughput and low latency by sending data packets through a mesh topology, using virtual channels and wormhole flow control. The downside of this technology is high area and energy consumption due to many buffers, large crossbars and complex arbitration logic within the routers.
The PaterNoster approach simplifies the hardware by sending only small messages that can be transported in one clock cycle. In addition to the area reduction, the simple routing algorithm (XY routing in a 2D torus) increases predictability, enabling a tighter timing analysis.
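Why XY routing in a torus is easy to bound can be seen from a minimal sketch that counts hops: a message first travels along the X ring to the destination column, then along the Y ring to the destination row, so the path depends only on source and destination. The function names and the `unidirectional` flag are assumptions made for illustration; the sketch does not model the actual PaterNoster router, and it counts hops only, ignoring contention and injection delays.

```python
def ring_hops(src: int, dst: int, size: int, unidirectional: bool = False) -> int:
    """Hops from src to dst on one ring with `size` nodes."""
    forward = (dst - src) % size  # distance in the increasing direction, with wrap-around
    if unidirectional:
        return forward
    return min(forward, size - forward)  # take the shorter direction


def xy_hops(src, dst, width, height, unidirectional=False):
    """Hop count of XY routing in a width x height 2D torus.

    src and dst are (x, y) tuples. The X dimension is traversed first,
    then the Y dimension; both counts are independent of each other.
    """
    (sx, sy), (dx, dy) = src, dst
    return (ring_hops(sx, dx, width, unidirectional)
            + ring_hops(sy, dy, height, unidirectional))


def worst_case_hops(width, height, unidirectional=False):
    """Upper bound on the hop count between any two nodes."""
    if unidirectional:
        return (width - 1) + (height - 1)
    return width // 2 + height // 2


if __name__ == "__main__":
    print(xy_hops((0, 0), (3, 3), 4, 4))
    print(xy_hops((0, 0), (3, 3), 4, 4, unidirectional=True))
    print(worst_case_hops(4, 4))
```

Because the route is fixed by the coordinates alone, the hop-count bound is a closed-form expression of the torus dimensions, which is the kind of property a static timing analysis can build on.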