Fault-Tolerance Mechanisms in the SB-PRAM Multiprocessor

Braun, Michael; Grävinghoff, Andreas; Keller, Jörg

The SB-PRAM is an experimental multiprocessor architecture with a shared address space and synchronously running threads, i.e., it gives the illusion of working on a PRAM. A 4-processor prototype has been completed, while a 64-processor prototype is under construction. We investigate the detection and handling of single-bit errors occurring during the transmission of packets in the interconnection network. We analyze the impact of an error on the different parts of a packet and derive several strategies to recover from such an error. The strategies range from single-bit error-correcting codes to checkpointing the application and rolling back in case of an error. We find that the changes necessary in hardware and system software are small. In particular, none of the ASICs designed for the SB-PRAM have to be changed. The runtime overhead due to the fault-tolerance mechanisms is negligible. Finally, we sketch how these strategies can be extended to cover component failures.
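
As a rough illustration of what single-bit correction of a transmitted word involves, the sketch below uses a Hamming(7,4) code. This is an assumption made purely for illustration: the abstract does not say which code the authors use or how SB-PRAM packets are laid out, and the function names `hamming74_encode` and `hamming74_decode` are hypothetical.

```python
# Illustrative sketch only: Hamming(7,4) is an assumed stand-in for a
# single-bit correcting code, not the scheme used in the SB-PRAM.

def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word):
    """Correct at most one flipped bit.

    Returns (data_bits, position), where position is the 1-based index
    of the corrected bit, or 0 if no error was detected.
    """
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        w[syndrome - 1] ^= 1  # flip the faulty bit back
    return [w[2], w[4], w[5], w[6]], syndrome


# Usage: flip one bit "in transit" and recover the original data.
codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1  # simulated single-bit transmission error
data, position = hamming74_decode(codeword)
```

A code of this kind corrects one flipped bit per codeword but cannot reliably handle two, which is one reason a heavier fallback such as checkpointing and rollback may still be needed.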

Cite as:

Braun, Michael / Grävinghoff, Andreas / Keller, Jörg: Fault-Tolerance Mechanisms in the SB-PRAM Multiprocessor. Hagen 1997. FernUniversität in Hagen.

Rights

Use and reproduction:
All rights reserved
