Scheduling configuration memory error checks to improve the reliability of FPGA-based systems

Nguyen Tran Huu Nguyen, Ediz Cetin, Oliver Diessel

Research output: Contribution to journal › Article › peer-review

Abstract

Field-programmable gate arrays (FPGAs) are susceptible to radiation-induced single event upsets. These are commonly dealt with using triple modular redundancy (TMR) and module-based configuration memory error recovery (MER). By triplicating components and voting on their outputs, TMR helps localise configuration memory errors, and by reconfiguring faulty components, MER swiftly corrects them. However, the order in which TMR voters are checked inevitably affects the overall system reliability. In this study, the authors outline an approach for computing the reliability of TMR-MER systems that consist of finitely many components. They demonstrate that checking the more vulnerable components more frequently yields higher system reliability than checking all components in round-robin order. They propose a genetic algorithm for finding a voter checking schedule that maximises the reliability of TMR-MER systems. Results indicate that the mean time to failure (MTTF) of these systems can be increased by up to 400% when variable-rate voter checking (VRVC) is used instead of round-robin checking. They show that VRVC achieves a 15-23% increase in MTTF with a 10× reduction in checking frequency, which is applied to reduce system power. They also find that VRVC detects errors 44% faster on average than round-robin checking.
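The intuition behind variable-rate voter checking can be illustrated with a toy calculation. The sketch below is not the authors' reliability model or their genetic algorithm; it only compares two hand-written cyclic schedules under simplifying assumptions that are not taken from the article: each voter check occupies one time slot, an error arises at the start of a uniformly random slot, and the hypothetical `error_share` values give the fraction of errors that fall in each component. The function name `mean_detection_latency` and the example numbers are likewise illustrative.

```python
from typing import Sequence


def mean_detection_latency(schedule: Sequence[int],
                           error_share: Sequence[float]) -> float:
    """Average number of slots between an error arising and its detection.

    schedule    -- cyclic list of component indices, one voter check per slot
    error_share -- assumed fraction of all errors that occur in each component
    """
    slots = len(schedule)
    total = 0.0
    for component, share in enumerate(error_share):
        if component not in schedule:
            raise ValueError(f"component {component} is never checked")
        latency_sum = 0
        # Average over every slot in which the error could arise.
        for start in range(slots):
            delay = 1  # a check in the same slot detects the error after 1 slot
            while schedule[(start + delay - 1) % slots] != component:
                delay += 1
            latency_sum += delay
        total += share * latency_sum / slots
    return total


# Hypothetical system: component 0 is far more vulnerable than the others.
error_share = [0.8, 0.1, 0.1]
round_robin = [0, 1, 2]        # every component checked equally often
variable_rate = [0, 1, 0, 2]   # component 0 checked twice per cycle

rr_latency = mean_detection_latency(round_robin, error_share)
vr_latency = mean_detection_latency(variable_rate, error_share)
print(f"round robin:   {rr_latency:.2f} slots")
print(f"variable rate: {vr_latency:.2f} slots")
```

Under these assumptions, round robin averages 2.0 slots to detect an error, while the variable-rate schedule averages 1.7 slots, because the component that accounts for most errors is revisited sooner. The figures depend entirely on the assumed error shares and say nothing about the MTTF results reported in the article.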
Original language: English
Pages (from-to): 154–165
Number of pages: 12
Journal: IET Computers and Digital Techniques
Volume: 13
Issue number: 3
Early online date: 21 Nov 2018
Publication status: Published - May 2019
