Augsburg Multicore Task Force

Multicore Research at University of Augsburg

State-of-the-art in Multi-core Processors

Multi-core processors scale performance by placing multiple cores on a single chip, effectively integrating a complete multiprocessor on one die. Since the total performance of a multi-core improves without increasing the clock frequency, multi-cores offer a better performance-per-watt ratio than a single-core solution of similar performance. Moore's Law (which will keep increasing transistor density in the coming years) will make it possible to double the number of cores roughly every 18 months. Multi-core techniques are now common both in high-performance processors for embedded and server workloads (e.g. IBM Cell, Sun Niagara) and in mainstream general-purpose processors (e.g. IBM POWER4 and POWER5, Intel Core 2, AMD Opteron). Hence, with 4 to 9 cores already on a chip today, we can soon expect as many as 16 to 32 cores on a chip, and as many as 256 cores on a chip are anticipated ten years from now.

The paradigm shift towards multi-core processors has a profound impact on all aspects of future computing systems. Special-purpose computing nodes will be added to accelerate particular application types (media processing, cryptographic algorithms, digital signal processing, etc.), leading to heterogeneous multi-cores. Currently, cache-coherent and bus-connected multi-cores prevail. Scaling up to dozens or hundreds of cores will have a significant impact on the memory hierarchy of the system: bus-based interconnects are no longer suitable and will be replaced by network-on-chip (NoC) technology.

It is clear that this paradigm shift is so profound that it affects almost all aspects of system design (from the components of a single core up to the complete system), and that considerable research and tool development will be needed before multi-core processors can be brought to the masses. Of particular interest will be software-engineering efforts that target application parallelization for multi-cores. A migration path from single-core applications to multi-core applications will be required, as well as new applications and new algorithms specifically targeting multi-cores.

What we offer

  • knowledge dissemination on future developments
  • parallel programming and multi-core education in BA/MA study programs
  • application studies for multi-core parallelization
  • virtualization techniques to hide multi-core technology from applications
  • modelling of multi-core systems by theoretical computer science techniques
  • multi-core architecture and system architecture design
  • multi-core deployment in hard-real-time environments (aerospace, automotive, construction machinery, ...)

Research at Chair of Systems and Networking

Prof. Dr. Theo Ungerer

Multicore research at the Chair of Systems and Networking concerns multi-core processor architecture, system software, and system architecture investigations.

CAR-SoC Project (funded by DFG)

CAR-SoC (Connective Autonomic Real-time SoC) defines a new SoC approach that emphasizes connectivity, autonomic computing principles, and real-time requirements on a chip. These requirements are fulfilled by a simultaneous multithreaded processor core called CarCore. The processor core is based on Infineon's TriCore architecture, extended by hardware-integrated hard real-time scheduling. The system software CAROS guarantees isolation of hard real-time threads and is directed towards the AUTOSAR/OSEK OS of automotive systems. Helper threads running at low priority in their own thread slots, concurrently with the application, implement autonomic managers that monitor relevant on-chip characteristics such as processor workload and memory usage. In combination with a middleware developed at the University of Karlsruhe, the autonomic managers decide whether self-optimization, self-configuration, self-protection, or self-healing techniques must be triggered. With funding from the EC Network of Excellence HiPEAC, the CarCore processor was enhanced with reconfigurable hardware (in cooperation with the University of Delft).
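The helper-thread idea can be illustrated with a small sketch (all names and thresholds here are hypothetical, and CAR-SoC realizes this in hardware thread slots, not in Python): a low-priority monitor samples an on-chip metric over a sliding window and invokes an autonomic-manager action when sustained overload is detected.

```python
import threading
import time
from collections import deque

class AutonomicManager(threading.Thread):
    """Illustrative helper thread: monitors a workload metric and triggers
    a self-optimization callback on sustained overload.  Hypothetical
    sketch of the concept, not the CAR-SoC implementation."""

    def __init__(self, read_load, on_overload, threshold=0.9, period=0.01):
        super().__init__(daemon=True)      # runs concurrently with the application
        self.read_load = read_load         # sensor: current core load in [0, 1]
        self.on_overload = on_overload     # effector: self-optimization action
        self.threshold = threshold
        self.period = period
        self.history = deque(maxlen=10)    # sliding window of recent samples
        self._stop = threading.Event()

    def sample(self):
        """Take one measurement; trigger the manager if the window average
        exceeds the threshold."""
        self.history.append(self.read_load())
        avg = sum(self.history) / len(self.history)
        if avg > self.threshold:
            self.on_overload(avg)

    def run(self):
        while not self._stop.is_set():
            self.sample()
            time.sleep(self.period)

    def stop(self):
        self._stop.set()
```

In CAR-SoC the corresponding managers observe hardware counters and hand their decisions to the Karlsruhe middleware; the sketch only shows the monitor-decide-act loop.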

For more information visit our project page.

Project partner: Prof. Dr. Uwe Brinkschulte, University of Karlsruhe

MERASA Project (funded by EC FP-7, start Nov. 1, 2007)

The MERASA project will develop multi-core processor designs (from 2 to 16 cores) for hard real-time embedded systems, hand in hand with timing analysis techniques and tools that guarantee the timing analysability and predictability of every single feature provided by the processor. Design-space exploration will be performed in conjunction with the timing analysis tools. The project addresses both static WCET analysis tools (the OTAWA toolset of the University of Toulouse) and hybrid measurement-based tools (RapiTime of Rapita Systems), as well as their interoperability. It will also develop system-level software with predictable timing behaviour. We investigate hardware-based real-time scheduling solutions that enable the same multi-core processor to handle hard, soft, and non-real-time tasks on different cores. The developed hardware/software techniques will be evaluated in application studies from the aerospace, automotive, and construction-machinery areas, performed by industrial partners.

Project partners: Barcelona Supercomputing Center, University of Toulouse, Rapita Systems, UK, Honeywell spol. s.r.o., Czech Rep.

Cooperation partners: Airbus, European Space Agency, NXP, Infineon, Bauer Maschinen

For more information please visit our project page and the MERASA website.

Self-organization in Multicore Systems

Multicore systems promise a new dimension of processing power and the possibility to integrate previously separated functionality on one die. Intel recently announced its Teraflops Research Chip, a multicore architecture with 80 cores. Integrating the cores on one die reduces the communication latency between the functional units, so that calculations can be efficiently distributed across the cores.

New challenges arise for the system software, which must schedule tasks so as to optimize the utilization of the cores. Furthermore, a new paradigm, the network-on-chip (NoC), has been introduced for communication between the cores, since the ubiquitous bus architecture will no longer be suitable for multicore systems with a growing number of cores.

The research of the Organic Computing group focuses on these two topics: the scheduling of tasks on multicore systems and routing algorithms for NoCs. First investigations using self-organization principles have shown excellent results. For detailed information contact Dr. Wolfgang Trumler or Prof. Dr. Theo Ungerer.
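As a baseline for what such routing algorithms compute, here is a sketch of classic dimension-ordered (XY) routing on a 2D mesh — the standard textbook scheme that self-organizing approaches aim to improve upon, not the group's own algorithm:

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) routing on a 2D mesh NoC: route the packet
    fully along the X dimension first, then along Y.  Deadlock-free on a
    mesh because it never takes a Y-to-X turn."""
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                 # X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # then Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

# e.g. xy_route((0, 0), (2, 1)) -> [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Adaptive, self-organizing routers relax this fixed dimension order to react to congestion, which is exactly where the scheduling/routing trade-offs studied by the group arise.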

Project partner: Prof. Dr. Nader Bagherzadeh, University of California, Irvine

Research at Chair for Multimedia Computing

Prof. Dr. Rainer Lienhart

Parallel Algorithms for Fast Machine Learning

Machine learning applications are the killer applications of tomorrow. Given the huge amount of data we collect nowadays, we need machine intelligence to mine and organize the data automatically on our behalf. However, large data sets require long training times at best, if they are computable in a reasonable amount of time at all. Especially the computational complexity of the latest, most promising machine learning approaches is more than challenging. Most implementations of these algorithms are designed with a serial model of execution in mind. At the same time, shared-memory multi-cores are becoming more and more commonplace. The computational power of these machines could be used to solve machine learning problems much faster and in parallel, if we only knew how to properly exploit them. The goal of our research is to reduce training times by developing design patterns and strategies for parallelizing machine learning algorithms on multiprocessor computers such that they become scalable in the number of CPUs beyond just four.
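One common parallelization pattern — shown here only as an illustrative sketch, not as the chair's method — is data parallelism: shard the training data, compute partial statistics per shard concurrently, then reduce them. The example runs one k-means iteration this way (Python threads are used for brevity; real CPU-bound speedups would require processes or a numeric library that releases the GIL):

```python
from concurrent.futures import ThreadPoolExecutor

def assign_chunk(points, centroids):
    """Map step: assign each point in a data shard to its nearest centroid,
    returning per-centroid partial sums and counts."""
    k = len(centroids)
    sums = [[0.0] * len(centroids[0]) for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        sums[j] = [s + a for s, a in zip(sums[j], p)]
    return sums, counts

def kmeans_step(points, centroids, workers=4):
    """One data-parallel k-means iteration: shard the data, process shards
    concurrently, then reduce the partial results into new centroids."""
    shards = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: assign_chunk(s, centroids), shards))
    k, d = len(centroids), len(centroids[0])
    new = []
    for j in range(k):
        total = sum(counts[j] for _, counts in partials)
        if total == 0:
            new.append(centroids[j])   # keep an empty cluster's centroid
        else:
            new.append([sum(s[j][i] for s, _ in partials) / total
                        for i in range(d)])
    return new
```

The map step is embarrassingly parallel and the reduce step is cheap, which is why patterns of this shape scale well in the number of CPUs.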

For more information on parallel algorithms for fast machine learning, please contact Rainer Lienhart or see our research overview.

Research at Programming Distributed Systems

Prof. Dr. Bernhard Bauer

Model-Driven Software Development for Multicore Architectures

Multicore processor technologies necessitate a new way of developing software. Current software development techniques need to be overhauled, since present applications hardly take advantage of multicore capabilities at all. Promising techniques, such as parallelizing compilers, parallel programming languages, concurrency patterns, and parallel algorithms, already exist. However, these techniques are not sufficient for a complete software development process. What is needed is automation of the complete software lifecycle, from requirements engineering through design models to implementation and maintenance.

Our goal is to automate the software development process through a model-driven approach that supports the developer in this error-prone task: starting from high-level requirements, the models are refined and automatically transformed until code for multicore technologies is obtained. In particular, this includes:

  • methodologies for designing multicore (parallel) applications
  • model-checking
  • model analysis for automatic parallelization
  • model transformations and code generation
  • model-based testing
  • best practices for automatic parallelization

Research at Chair of Communication Technology

Prof. Dr. Rudi Knorr

Embedded Systems for Communication Applications

The multicore research of the Chair of Communication Technology and the Fraunhofer Society focuses on real-time embedded systems for communication applications in the Automotive and Enterprise business fields.

To design and realise real-time capability in embedded systems with multi-core processors, we built a hypervisor that performs resource management. The hypervisor is an intermediate layer between the system platform and the operating systems. Data acquired at runtime, combined with a prediction of future runtime behaviour, allow the operating system to carry out online self-optimisation, e.g. an optimised distribution of tasks to the respective processor elements depending on the input data. For this purpose, extended scheduling mechanisms are necessary.

The hypervisor, as well as the system platform built on top of it, uses the self-descriptions of the SW and HW components to assure real-time capability. A self-description contains the component's interfaces and the resources it requires. To preserve real-time capability, these specifications must be monitorable and enforceable by the system. Their initial values should be determined in the tool-based SW engineering process; the long-term goal is that the whole process can be specified, simulated, and optimised within the engineering process.
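As a minimal illustration of how a hypervisor could check such self-descriptions at admission time (the actual checks and self-description format are not specified here; this sketch uses the classic Liu and Layland utilization bound for rate-monotonic scheduling):

```python
def admissible(tasks):
    """Hypothetical admission test a hypervisor could run against task
    self-descriptions, each given as (wcet, period): accept the set for
    rate-monotonic scheduling if its total utilization stays within the
    Liu & Layland bound n * (2**(1/n) - 1)."""
    n = len(tasks)
    if n == 0:
        return True                      # an empty task set is trivially schedulable
    u = sum(wcet / period for wcet, period in tasks)
    return u <= n * (2 ** (1.0 / n) - 1)

# e.g. three tasks with utilization 0.65 pass the n=3 bound of about 0.78:
# admissible([(1, 4), (1, 5), (2, 10)]) -> True
```

A real hypervisor would also monitor the declared WCETs at runtime and reject or throttle components that exceed their self-description, as described above.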

Further topics are the abstraction of the physical processor structure from the operating system, the virtualization of different processors, and Network-on-Chip architectures.

Main focuses

  • Real-time Hypervisor for embedded systems
  • Self-organisation mechanisms for online optimisation between embedded HW and operating system (self-optimisation, self-description)
  • Interaction mechanisms between HW and operating system
  • Monitoring of processes with regard to the runtime behaviour
  • Prediction of the runtime behaviour within single multicore segments
  • Extended mechanisms for scheduling at operating system level
  • Methods and tools for the SW engineering process

For further information contact Prof. Dr. Rudi Knorr or Dr. Markus Zeller.

Research at Theoretical Computer Science

Prof. Dr. Walter Vogler

Formal Multicore Modelling with Petri Nets and Process Algebras

Petri nets and process algebras are well-investigated formal modelling techniques for concurrent activities. This project investigates their suitability for modelling hardware and software activities between the cores of multicore systems. Both techniques can model specifications as well as implementations of multicore systems. Emphasis lies on:

  • semantics supporting the modular construction of systems
  • partial order semantics for efficient verification
  • efficiency of asynchronous systems
  • parallelisation of applications for multicores via decomposition of Petri nets
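A toy example of the first modelling style: a minimal place/transition net in which two cores contend for a shared bus, showing how a Petri net captures mutual exclusion between concurrent activities (the net and its naming are purely illustrative, not a model from the project):

```python
class PetriNet:
    """Minimal place/transition net: the marking maps places to token
    counts; each transition consumes one token from each input place and
    produces one token in each output place."""

    def __init__(self, marking, transitions):
        self.marking = dict(marking)
        self.transitions = transitions   # name -> (input places, output places)

    def enabled(self, name):
        ins, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= 1 for p in ins)

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} not enabled")
        ins, outs = self.transitions[name]
        for p in ins:
            self.marking[p] -= 1
        for p in outs:
            self.marking[p] = self.marking.get(p, 0) + 1

# Two cores sharing one bus: the single "bus" token enforces mutual exclusion.
net = PetriNet(
    marking={"idle1": 1, "idle2": 1, "bus": 1},
    transitions={
        "acquire1": (["idle1", "bus"], ["busy1"]),
        "release1": (["busy1"], ["idle1", "bus"]),
        "acquire2": (["idle2", "bus"], ["busy2"]),
        "release2": (["busy2"], ["idle2", "bus"]),
    },
)
net.fire("acquire1")   # core 1 takes the bus; "acquire2" is now disabled
```

Verification questions such as "can both cores ever hold the bus at once?" become reachability questions on markings of such nets, which is where partial order semantics pay off.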

Further Readings

  • The HiPEAC Roadmap [K. De Bosschere, W. Luk, X. Martorell, N. Navarro, M. O'Boyle, D. Pnevmatikatos, A. Ramirez, P. Sainrat, A. Seznec, P. Stenstrom, O. Temam: High-Performance Embedded Architecture and Compilation Roadmap. Transactions on High-Performance Embedded Architectures and Compilers, 1(1):5-29, 2007, see URL: http://www.hipeac.net/roadmap].
  • Intel Platform 2015: Intel® Processor and Platform Evolution for the Next Decade [Shekhar Borkar, Pradeep Dubey, Kevin Kahn, David Kuck, Hans Mulder, Steve Pawlowski, Justin Rattner, Technology@Intel Magazine, see URL: http://www.intel.com/technology/magazine/computing/platform-2015-0305.htm]