J. Silc, B. Robic, T. Ungerer: Asynchrony in Parallel Computing: From Dataflow to Multithreading. To be published: Journal of Parallel and Distributed Computing Practices, January/February 1998. (in postscript, in pdf).
This research project was conducted together with Martin Beck and Eberhard
Zehendner of the University of Jena, Germany. The aim was to review hybrid
dataflow techniques due to their instruction issuing techniques and to estimate
whether hybrid dataflow could be a useful enhancement to other implementation
techniques.
The first part of the project was directed towards architectural techniques.
Software simulation of several dataflow architectures were conducted to
compare fine-grain dataflow to several hybrid dataflow techniques: multithreaded
dataflow with direct token recycling as used in Monsoon, multithreaded dataflow
with consecutive execution of the instructions within a thread as used in
the Epsilon processors and in EM-4, dataflow with complex machine operations
as proposed for the SIGMA-1, and large-grain dataflow presuming a RISC processor
respectively a superscalar processor in the execution stage. The results
are presented in:
M. Beck, T. Ungerer, E. Zehendner: Classification and Performance Evaluation of Hybrid Dataflow Techniques With Respect to Matrix Multiplication. Workshop Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware, GI-Fachgruppen 3.1.2 (PARS) und ITG, Dresden, April 6-8., 1993, PARS-Mitteilungen, Vol. 12, 118 - 126. (in postscript, in pdf)
The project proceeded towards software partitioning methods for dataflow graphs. A partitioning algorithm for dependence graphs that uses a heuristic to determine a cost-efficient solution based on an architecture-dependent cost function was developed. Performance tuning of non-blocking threads is based on graph partitioning algorithms that create serial code blocks from dependence graphs. Previously existing algorithms were directed toward deadlock-avoidance and maximization of run-length. The latter criterion often generates a high synchronization overhead. This research established a partitioning algorithm for dependence graphs that uses a heuristic to determine a cost-efficient solution based on an architecture-dependent cost function. Empirical results are based on benchmark programs that were compiled with MIT's Id compiler, extended by our architecture-dependent partitioning algorithm. The results demonstrate a reduction in software overhead with our architecture-dependent partitioning algorithm, compared with previously existing partitioning methods. The execution of the sample programs on an emulator for the Monsoon dataflow architecture showed a reduced number of processor cycles.
The project ended with the doctoral dissertation of Martin Beck. Results are presented in:
M. Beck, T. Ungerer, E. Zehendner: Architectural Partitioning of Dependence
Graphs. To be published: 6th EUROMICRO Workshop on Parallel and Distributed
Processing, Madrid, January 21-23, 1998.
Paper published in workshop proceedings, 7 pages (in
postscript, in pdf).
Slightly longer version (in postscript,
in pdf).
M. Beck, T. Ungerer, E. Zehendner: Architectural Partitioning of Dependence Graphs. Berichte zur Rechnerarchitektur 3, 23, Friedrich-Schiller-Universität Jena, 1997.
M. Beck:Architekturabhängige Partitionierung von Datenflußgraphen. Dissertation. Friedrich-Schiller-Universität Jena, 1997.
The block-multithreaded Rhamma processor
A multithreaded superscalar processor