Fine-grained module-based error recovery in FPGA-based TMR systems

Zhuoran Zhao, Nguyen T.H. Nguyen, Dimitris Agiakatsikas, Ganghee Lee, Ediz Cetin, Oliver Diessel

Research output: Contribution to journal (Article)

8 Citations (Scopus)

Abstract

Space processing applications deployed on SRAM-based Field Programmable Gate Arrays (FPGAs) are vulnerable to radiation-induced Single Event Upsets (SEUs). Compared with the well-known SEU mitigation solution, Triple Modular Redundancy (TMR) with configuration memory scrubbing, TMR with module-based error recovery (MER) is notably more energy efficient and responsive in repairing soft errors in the system. Unfortunately, TMR-MER systems also need to resort to scrubbing when errors occur between sub-components, such as in interconnection nets, which MER does not recover. This article addresses this problem by proposing a fine-grained module-based error recovery technique that can localize and correct the errors that classic MER cannot, without additional system hardware. We evaluate our proposal via fault-injection campaigns on three types of circuits implemented in Xilinx 7-Series devices. Compared with scrubbing, we observed reductions of between 48.5% and 89.4% in the mean time to repair configuration memory errors, while the energy used in recovering from configuration memory errors was estimated to be between 77.4% and 96.1% lower. These improvements give systems employing TMR with fine-grained reconfiguration higher reliability than equivalent systems that rely on scrubbing for configuration error recovery.
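The abstract does not describe how errors are localized, so the following is only a minimal Python sketch under stated assumptions, not the authors' technique. It assumes a software model in which each of the three TMR replicas exposes its value at a few checkpoints (hypothetical voter taps at sub-component boundaries). It shows how a 2-of-3 majority voter both masks an error and identifies which replica disagrees, and how checking the vote at every checkpoint narrows a fault to a (stage, replica) region, which is the kind of information a module-based recovery scheme needs in order to choose what to reconfigure. The names `majority`, `vote_and_flag`, and `localize` are illustrative and do not come from the article.

```python
from typing import List, Optional, Sequence, Tuple


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority, the function a TMR voter computes."""
    return (a & b) | (a & c) | (b & c)


def vote_and_flag(values: Sequence[int]) -> Tuple[int, List[int]]:
    """Vote over three replica outputs; also list replicas that disagree."""
    a, b, c = values
    voted = majority(a, b, c)
    mismatched = [i for i, v in enumerate(values) if v != voted]
    return voted, mismatched


def localize(
    stage_outputs: Sequence[Sequence[int]],
) -> Optional[Tuple[int, List[int]]]:
    """Return (stage, disagreeing replicas) at the earliest mismatch, or None.

    stage_outputs[s][r] is the value replica r produces at checkpoint s.
    Assumption (hypothetical model): one voter tap per sub-component
    boundary, so the first checkpoint that disagrees bounds the faulty
    region to that stage and the nets feeding it in the flagged replica.
    """
    for stage, values in enumerate(stage_outputs):
        _, mismatched = vote_and_flag(values)
        if mismatched:
            return stage, mismatched
    return None


# Replica 1 is corrupted from stage 2 onward, e.g. by an upset in that
# stage's logic or in the nets that feed it.
trace = [
    [0x3A, 0x3A, 0x3A],
    [0x51, 0x51, 0x51],
    [0x7F, 0x7D, 0x7F],
    [0x10, 0x12, 0x10],
]
print(localize(trace))  # (2, [1])
```

A module-level check would only report that replica 1 is wrong somewhere; the per-checkpoint vote additionally points at stage 2, so under this model only that region would need to be rewritten rather than the whole replica or the whole device.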

Original language: English
Article number: 4
Pages (from-to): 1-23
Number of pages: 23
Journal: ACM Transactions on Reconfigurable Technology and Systems
Volume: 11
Issue number: 1
Publication status: Published, Mar 2018

Keywords

  • SRAM FPGA
  • configuration memory errors
  • dynamic reconfiguration
  • mean time to recover
  • partial reconfiguration
  • radiation-induced errors
  • recovery energy
  • reliability
