TY - JOUR
T1 - Reconfiguration Control Networks for FPGA-based TMR systems with modular error recovery
AU - Nguyen, Nguyen T. H.
AU - Agiakatsikas, Dimitris
AU - Zhao, Zhuoran
AU - Wu, Tong
AU - Cetin, Ediz
AU - Diessel, Oliver
AU - Gong, Lingkan
PY - 2018
Y1 - 2018
N2 - Field-Programmable Gate Arrays (FPGAs) provide ideal platforms for meeting the computational requirements of future space-based processing systems. However, FPGAs are susceptible to radiation-induced Single Event Upsets (SEUs). Techniques for dynamically reconfiguring corrupted modules of Triple Modular Redundant (TMR) components are well known. However, most of these techniques utilize resources that are themselves susceptible to SEUs to transfer reconfiguration requests from the TMR voters to a central reconfiguration controller. This paper evaluates the impact of these Reconfiguration Control Networks (RCNs) on the system's reliability and performance. We provide an overview of RCNs reported in the literature and compare them in terms of dependability, scalability and performance. Most importantly, we compare the performance of soft networks with that of a hard network that utilizes the Internal Configuration Access Port (ICAP) available in advanced Xilinx devices to periodically read the TMR voter states. We have implemented our designs on a Xilinx Artix-7 FPGA to assess the resulting resource utilization and performance as well as to evaluate their soft error vulnerability using analytical and fault injection techniques. Results show that, of the RCN topologies studied, the ICAP-based approach is the most reliable despite having the highest network latency. We also conclude that a module-based recovery approach is less reliable than scrubbing unless the RCN is implemented with redundancy and repaired when it suffers from configuration memory errors.
AB - Field-Programmable Gate Arrays (FPGAs) provide ideal platforms for meeting the computational requirements of future space-based processing systems. However, FPGAs are susceptible to radiation-induced Single Event Upsets (SEUs). Techniques for dynamically reconfiguring corrupted modules of Triple Modular Redundant (TMR) components are well known. However, most of these techniques utilize resources that are themselves susceptible to SEUs to transfer reconfiguration requests from the TMR voters to a central reconfiguration controller. This paper evaluates the impact of these Reconfiguration Control Networks (RCNs) on the system's reliability and performance. We provide an overview of RCNs reported in the literature and compare them in terms of dependability, scalability and performance. Most importantly, we compare the performance of soft networks with that of a hard network that utilizes the Internal Configuration Access Port (ICAP) available in advanced Xilinx devices to periodically read the TMR voter states. We have implemented our designs on a Xilinx Artix-7 FPGA to assess the resulting resource utilization and performance as well as to evaluate their soft error vulnerability using analytical and fault injection techniques. Results show that, of the RCN topologies studied, the ICAP-based approach is the most reliable despite having the highest network latency. We also conclude that a module-based recovery approach is less reliable than scrubbing unless the RCN is implemented with redundancy and repaired when it suffers from configuration memory errors.
KW - Reconfiguration Control Networks
KW - Dynamic Partial Reconfiguration
KW - SRAM-based FPGA
KW - Fault injection
KW - Radiation effects
KW - Scrubbing
KW - Single Event Upsets
KW - Triple Modular Redundancy
KW - Reliability
UR - http://www.scopus.com/inward/record.url?scp=85046168421&partnerID=8YFLogxK
UR - http://purl.org/au-research/grants/arc/LP140100328
UR - http://purl.org/au-research/grants/arc/DP150103866
U2 - 10.1016/j.micpro.2018.04.006
DO - 10.1016/j.micpro.2018.04.006
M3 - Article
SN - 0141-9331
VL - 60
SP - 86
EP - 95
JO - Microprocessors and Microsystems
JF - Microprocessors and Microsystems
ER -