Self-healing distributed systems

Benjamin Satzger
Dissertation, University of Augsburg
Examiner: Prof. Dr. Theo Ungerer
Co-examiner: Prof. Dr. Bernhard Bauer

Abstract

The growing complexity of distributed systems demands new ways of control. This work addresses self-healing in distributed environments. The term "self-healing" denotes a fairly new area of research and is used quite broadly, but it can be understood as dynamic fault tolerance. This work proposes generic concepts and algorithms for building self-healing systems.

Detecting node failures in distributed environments is a non-trivial problem, and failure detectors are an important component of many fault-tolerant distributed systems. This work proposes a new failure detection algorithm that offers high flexibility and good performance. Furthermore, an approach is presented to reduce the message overhead of failure detectors.

New grouping algorithms are introduced to enable scalable self-monitoring. They allow monitoring relations to be established autonomously in complex, large-scale distributed systems.

A failure recovery engine based on automated planning is proposed, which manages a distributed system according to user-defined objectives. It generates and executes plans to recover a system autonomously from unwanted states.

Finally, ideas for a generic self-healing architecture for highly complex distributed systems are presented. The design is based on psychological and sociological concepts. 

URN: urn:nbn:de:bvb:384-opus-13394
URL: http://opus.bibliothek.uni-augsburg.de/volltexte/2009/1339/

Downloads:

  • PDF  -  (satzger_diss.pdf, 3907 KB)