# Reliability Measurement of FPGA Implementations on Software-Defined Radio Platforms

### Presented by Dr. Armando Astarloa

APERT 2013

12 June 2013



## Index



1. Introduction and Motivation
   - Reliability in Electronic Devices (hardware)
   - FPGAs nowadays
   - COTS FPGAs on critical applications
   - FT and EM techniques in FPGAs
2. Experimental Acquisition of $\lambda$ (MTBF)
   - SEU Test Set-up
   - Analysis Results for Cipher Blocks
   - Analysis Results for on-chip bus Architectures
3. Hardware for Reliable SDR
4. Conclusions

## Introduction and Motivation

### Reliability in Electronic Devices (hardware)

Failure rate ($\lambda$): The expected number of failures of a device per unit of time.

Failure In Time (FIT) rate: Number of failures that can be expected in $10^9$ device-hours of operation.





Figure: The Bathtub curve [1].




### Reliability in Electronic Devices (hardware): Parameters

Reliability ($R(t)$): The probability that an electronic system performs its required functions under stated conditions for a specified period of time. $R(t_1)$ is the probability that the system will not fail before time $t_1$. For a constant failure rate $\lambda$:

$$R(t) = e^{-\lambda t} \tag{1}$$

Mean Time Between Failures (MTBF): Average time a system runs between failures.

$$MTBF = \int_0^\infty e^{-\lambda t} dt = \frac{1}{\lambda} \tag{2}$$
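As a quick numerical illustration of equations (1) and (2), the sketch below evaluates $R(t)$ and the MTBF for a constant failure rate. The function names and the example value of $\lambda$ are assumptions chosen for illustration; they are not figures from the talk.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate (equation 1)."""
    return math.exp(-failure_rate_per_hour * hours)


def mtbf_hours(failure_rate_per_hour: float) -> float:
    """MTBF = 1 / lambda (equation 2)."""
    return 1.0 / failure_rate_per_hour


if __name__ == "__main__":
    lam = 1e-6  # hypothetical failure rate, in failures per hour
    mtbf = mtbf_hours(lam)
    print(f"MTBF      = {mtbf:.0f} h")
    print(f"R(1 year) = {reliability(lam, 8760):.4f}")
    print(f"R(MTBF)   = {reliability(lam, mtbf):.4f}")
```

Under this model $R(MTBF) = e^{-1} \approx 0.37$, so only about 37% of units are expected to run failure-free up to the MTBF.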

### Reliability in Electronic Devices (hardware): Temporal errors in electronic devices

### Single Event Effects (SEEs):

- A heavy ion strikes the silicon
- It loses its energy by producing free electron-hole pairs
- A dense ionized track is generated in the local region
- A bit-flip can be induced (SEU, SET, SEFI, MBU, or SEL, the last being permanent)



Figure: An ion passing through a transistor



### Reliability in Electronic Devices (hardware): Fault tolerance techniques in electronic devices

### Redundancy:

- **Physical redundancy:** Plain design, Triple Modular Redundancy (TMR) with simple voter, Partitioned Triple Modular Redundancy with multiple simple voters, XTMR.
- **Temporal redundancy:** The same processing is repeated after a delay $\Delta t$ and the results are compared.

Figure: Triple Modular Redundancy with simple voter.
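The TMR scheme with a simple voter can be sketched in a few lines. This is a behavioural model only, not HDL: `example_module` is a hypothetical 8-bit function, and `fault_masks` is an assumed way of injecting bit-flips into the replica outputs.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def tmr(module, x: int, fault_masks=(0, 0, 0)) -> int:
    """Run three replicas of `module`, XOR a fault mask into each output, then vote."""
    outputs = [module(x) ^ mask for mask in fault_masks]
    return majority_vote(*outputs)


def example_module(x: int) -> int:
    """Hypothetical 8-bit combinational function used only for this example."""
    return (x * 3 + 1) & 0xFF


if __name__ == "__main__":
    golden = example_module(10)
    # A single upset in one replica is masked by the voter.
    print(tmr(example_module, 10, fault_masks=(0, 0x10, 0)) == golden)
    # The same bit upset in two replicas defeats a simple voter.
    print(tmr(example_module, 10, fault_masks=(0x10, 0x10, 0)) == golden)
```

The second case shows why a single voter does not help against faults that hit two replicas at once, which is the situation the multi-bit upset tests later in the talk explore.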



### Reliability in Electronic Devices (hardware): Standard for Functional Safety and Certifications

- Electronic products certification for critical applications
- Analysis and development methods
- Safety integrity levels: SIL1-SIL4 (IEC 61508)
- Two modes of operation: High demand rate and Low demand rate





### Reliability in Electronic Devices (hardware): Safety integrity levels

Table: IEC 61508 safety integrity levels for high demand rate systems (dangerous failures per hour)

| Safety integrity level | Required $\lambda$ (dangerous failures/h) |
|------------------------|-------------------------------------------|
| SIL4                   | between $10^{-8}$ and $10^{-9}$           |
| SIL3                   | between $10^{-7}$ and $10^{-8}$           |
| SIL2                   | between $10^{-6}$ and $10^{-7}$           |
| SIL1                   | between $10^{-5}$ and $10^{-6}$           |
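The table can be read as a lookup from a dangerous failure rate to a SIL band. The sketch below encodes the four bands; the function name, the handling of the band edges (lower bound inclusive, upper bound exclusive) and the choice to return `None` outside the table are assumptions made for this example.

```python
# (SIL, lower bound, upper bound) in dangerous failures per hour, high demand mode.
SIL_BANDS_HIGH_DEMAND = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_for_failure_rate(dangerous_failures_per_hour: float):
    """Return the SIL band containing the given rate, or None if it is outside the table."""
    for level, low, high in SIL_BANDS_HIGH_DEMAND:
        if low <= dangerous_failures_per_hour < high:
            return level
    return None


if __name__ == "__main__":
    for rate in (5e-9, 3e-7, 1.489e-6, 1e-4):
        print(rate, sil_for_failure_rate(rate))
```

A raw upset rate is not the same thing as a dangerous failure rate, so a lookup like this only becomes meaningful after the analysis has established which upsets actually lead to dangerous failures.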





### FPGAs nowadays: Electronic design is undergoing a 'revolution'

### Current status:

- New silicon technologies allow the integration of whole digital systems in a single 'chip'
- The design of Application Specific Integrated Circuits is more and more expensive, complex and risky





### FPGAs nowadays: The answer is reconfigurable logic (FPGAs)

### What is an FPGA?

- It is similar to an 'unformatted chip'
- It can be 'loaded' with a custom circuit designed using specific EDA tools
- Modern FPGAs offer similar features to ASICs





### FPGAs nowadays: FPGAs trade-off

#### Success keys:

- Affordable engineering
- Low risk: Reconfigurable
- Short time-to-market
- Flexible electronic boards (they can be reused for different products)

#### Challenges:

- Susceptible to SEE (SEUs)
- Programmability (Safety standards)
- Example:
  - Virtex-5: nominal 131 FIT/Mb for configuration cells [2]
  - XC5VLX50T: 11.37 Mb of configuration cells, giving about 1489 FIT nominal, or an MTBF of about 77 years
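The arithmetic behind this example can be reproduced directly from the FIT definition ($10^9$ device-hours). The helper names and the 8760-hour year are assumptions of this sketch; the 131 FIT/Mb and 11.37 Mb inputs are the figures quoted above.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year


def device_fit(fit_per_mb: float, config_mb: float) -> float:
    """Nominal device FIT: per-megabit FIT times megabits of configuration memory."""
    return fit_per_mb * config_mb


def mtbf_years(fit: float) -> float:
    """MTBF in years for a rate given in FIT (failures per 1e9 device-hours)."""
    return 1e9 / fit / HOURS_PER_YEAR


if __name__ == "__main__":
    fit = device_fit(131, 11.37)
    print(f"Nominal rate: {fit:.0f} FIT")
    print(f"MTBF: {mtbf_years(fit):.1f} years")
```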





### COTS FPGAs on critical applications: NASA SpaceCube (source [3])



Figure: SpaceCube 1.0 CPU.



Figure: STS-125 launch.



Figure: California wildfire scene



### FT and EM techniques in FPGAs: Temporal errors in FPGAs

Figure: FPGA architecture.

Figure: SEU effect in FPGAs.



### FT and EM techniques in FPGAs

Figure: Triple Modular Redundancy with simple voter in an FPGA. [4]

### FT and EM techniques in FPGAs: XTMR architecture

XTMR triplicates:

- All inputs, including clocks, and throughput (combinational) logic.
- Feedback logic, inserting majority voters on feedback paths.
- All outputs, using minority voters to detect and disable incorrect output paths.



Figure: XTMR architecture.
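The two voter roles listed above can be illustrated with a behavioural model of a triplicated counter. This is a simplified sketch under stated assumptions, not the netlist produced by any Xilinx tool: the 4-bit counter, the word-level comparison in `output_enables` and all function names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def clock_step(regs, width: int = 4):
    """Advance a triplicated counter by one clock cycle.

    Each domain has its own voter on the feedback path, so every domain
    computes its next state from the voted value, not from its own register.
    """
    mask = (1 << width) - 1
    next_regs = []
    for _domain in range(3):
        voted = majority(*regs)  # this domain's feedback voter
        next_regs.append((voted + 1) & mask)
    return next_regs


def output_enables(regs):
    """Minority voters: a domain drives its output only if it agrees with the majority."""
    voted = majority(*regs)
    return [reg == voted for reg in regs]


if __name__ == "__main__":
    regs = [5, 5, 5]
    regs[1] ^= 0b0100            # inject an upset into domain 1
    print(output_enables(regs))  # domain 1 is in the minority and is disabled
    regs = clock_step(regs)      # voted feedback resynchronizes all domains
    print(regs)
```

The point of voting on the feedback path is visible in the last step: the upset register does not keep its wrong state, because its next value is derived from the voted state.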



### FT and EM techniques in FPGAs: XTMR design flow

Figure: XTMR design flow.

### FT and EM techniques in FPGAs: Scrubbing



Figure: Scrubbing concept.
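One common form of scrubbing reads the configuration memory back, compares it with a known-good copy and rewrites any frame that differs. The sketch below models that idea on plain Python lists; the frame representation and the function name are assumptions, and a real scrubber would access the device through its configuration interface rather than a list.

```python
def scrub(config_frames, golden_frames):
    """Compare each frame with the golden copy and rewrite mismatches in place.

    Returns the indices of the frames that had to be repaired.
    """
    repaired = []
    for index, golden in enumerate(golden_frames):
        if config_frames[index] != golden:
            config_frames[index] = golden
            repaired.append(index)
    return repaired


if __name__ == "__main__":
    golden = [0xA5A5, 0x0F0F, 0xFFFF, 0x1234]  # hypothetical frame contents
    live = list(golden)
    live[2] ^= 0x0008                          # simulated SEU in frame 2
    print(scrub(live, golden))
    print(live == golden)
```

Running such a pass periodically bounds the time an upset can remain in the configuration memory, which is what allows it to be combined with TMR.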



### FT and EM techniques in FPGAs: Internal scrubbing combined with Error Correcting Code (ECC)



Figure: SEU controller macro in Virtex-5 [5].
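The principle of ECC-based correction is that a syndrome computed over the stored bits points at the position of a single flipped bit. The toy Hamming(7,4) code below illustrates this; it is an assumption chosen for brevity, and the code actually used per configuration frame in the device is wider (see [5]).

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-based positions holding data bits
PARITY_POSITIONS = (1, 2, 4)    # 1-based positions holding parity bits


def syndrome(code):
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    result = 0
    for position, bit in enumerate(code, start=1):
        if bit:
            result ^= position
    return result


def encode(data4):
    """Encode four data bits into a 7-bit Hamming codeword (list of 0/1)."""
    code = [0] * 7
    for position, bit in zip(DATA_POSITIONS, data4):
        code[position - 1] = bit
    partial = syndrome(code)
    for parity in PARITY_POSITIONS:
        code[parity - 1] = 1 if partial & parity else 0
    return code


def correct(code):
    """Return (corrected codeword, syndrome); a non-zero syndrome is the flipped position."""
    fixed = list(code)
    error_position = syndrome(fixed)
    if error_position:
        fixed[error_position - 1] ^= 1
    return fixed, error_position


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    upset = list(word)
    upset[4] ^= 1                # flip the bit at position 5
    fixed, position = correct(upset)
    print(position, fixed == word)
```

This toy code corrects single-bit errors only; a double error would be miscorrected, which is why practical schemes add an extra parity bit to detect that case.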



### FT and EM techniques in FPGAs: Block RAM scrubbing (internal RAM memory)



Figure: BlockRAM scrubber macro block diagram [6].

### FT and EM techniques in FPGAs: Partial Reconfiguration for Fault Tolerance

- Partial reconfiguration is supported by Xilinx
- Technological challenges overcome (Virtex-4):
  - Internal reconfiguration ports (ICAP)
  - No frame restriction (16-CLB range)
  - No SRL16 and LUT RAMs limitations
- RPD Design flow enhanced





## Experimental Acquisition of $\lambda$ (MTBF)

### SEU Test Set-up





### Analysis Results for Cipher Blocks: Test Flow



### Analysis Results for Cipher Blocks: Analysis Results





### Analysis Results for on-chip bus Architectures: Redundant Architectures Compared





Figure: (c) Coarse-grained TMR; (d) Medium-grained TMR.



### Analysis Results for on-chip bus Architectures: Analysis Results

Table: Test Results for External and Internal voting

| Error type | Test runs | Coarse-grained errors | Coarse-grained percent | Medium-grained errors | Medium-grained percent |
|------------|-----------|-----------------------|------------------------|-----------------------|------------------------|
| SEU        | 17500     | 0                     | 0                      | 36                    | 0.21                   |
| 2bit-MBU   | 12500     | 58                    | 0.46                   | 51                    | 0.41                   |
| 2bit-MBU*  | 12500     | 36                    | 0.29                   | 67                    | 0.54                   |
| 3bit-MBU   | 12500     | 185                   | 1.48                   | 67                    | 0.54                   |
| 4bit-MBU   | 12500     | 337                   | 2.70                   | 103                   | 0.82                   |
| 5bit-MBU   | 12500     | 558                   | 4.46                   | 140                   | 1.12                   |
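The percentage columns follow from the error counts and the number of test runs. The sketch below recomputes them from the table values; the dictionary layout and function name are assumptions of this example.

```python
# error type: (test runs, coarse-grained errors, medium-grained errors)
RESULTS = {
    "SEU": (17500, 0, 36),
    "2bit-MBU": (12500, 58, 51),
    "2bit-MBU*": (12500, 36, 67),
    "3bit-MBU": (12500, 185, 67),
    "4bit-MBU": (12500, 337, 103),
    "5bit-MBU": (12500, 558, 140),
}


def error_percent(errors: int, test_runs: int) -> float:
    """Share of test runs that ended in an error, in percent, rounded to two decimals."""
    return round(100.0 * errors / test_runs, 2)


if __name__ == "__main__":
    for error_type, (runs, coarse, medium) in RESULTS.items():
        print(f"{error_type:10s} coarse {error_percent(coarse, runs):5.2f} %"
              f"  medium {error_percent(medium, runs):5.2f} %")
```

Read this way, the table shows the coarse-grained error share growing much faster with the number of upset bits than the medium-grained one, while only the coarse-grained design shows zero errors under single SEUs.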



Table: Categories of test results

| Cat. | Description                                             | System behavior |
|------|---------------------------------------------------------|-----------------|
| A    | All TMR layer counters are zero                         | Correct         |
| B    | One TMR layer counter differs from zero                 | Correct         |
| C    | Two or three of the TMR layer counters differ from zero | Correct         |
| D    | Two or three of the TMR layer counters differ from zero | Erroneous       |
| E    | Two or three of the TMR layer counters are zero         | Erroneous       |
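One way to read the table is as a decision rule over the three TMR layer counters and the observed system behavior. The sketch below is an interpretation of the table rather than the authors' test software: the function name, the boolean `system_correct` flag and the exact partition of the erroneous cases between D and E are assumptions.

```python
def categorize(counters, system_correct: bool) -> str:
    """Map three TMR layer error counters and the observed behavior to a category A-E."""
    if len(counters) != 3:
        raise ValueError("expected one counter per TMR layer (three in total)")
    nonzero = sum(1 for count in counters if count != 0)
    if system_correct:
        if nonzero == 0:
            return "A"   # no voter saw a discrepancy
        if nonzero == 1:
            return "B"   # a single layer was affected and masked
        return "C"       # several layers affected, output still correct
    # Erroneous behavior: D if two or three counters are non-zero,
    # E if two or three counters are zero.
    return "D" if nonzero >= 2 else "E"


if __name__ == "__main__":
    print(categorize([0, 0, 0], True))
    print(categorize([0, 4, 0], True))
    print(categorize([2, 1, 0], False))
    print(categorize([0, 0, 0], False))
```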

## Hardware for Reliable SDR

### Partial Reconfiguration and TMR on FPGA-based Rugged Systems

**Safe***VPX* is a 3U Eurocard-size board that can run in stand-alone mode or be plugged into a 3U VPX (VITA 46, air-cooled) backplane. In the latter case, **Safe***VPX* supports three fat pipes, each composed of four bidirectional Multi-Gigabit Transceiver (MGT) links. Each fat pipe can carry PCIe x4, Aurora or another high-speed communication protocol.





#### Key features

- Virtex-5 FPGA (5VFX100T-FF1136), industrial range
- Hard PowerPC processor embedded in the FPGA
- CRC-protected DDR2 memory (5 Gb)
- 2 isolated CAN communication channels
- 1 isolated serial communications link
- 3 isolated sigma-delta ADCs
- 1 SFP connector (Gigabit Ethernet or Fibre Optics)
- Ready for stand-alone or VPX racked operation
- 3U Eurocard form factor (3U VPX VITA 46)
- FMC connector (only available in stand-alone operation mode)
- Multiple options for full and partial bitstream storage
- Extended temperature range

#### FPGA ready for Xilinx X-TMR implementation:

- Triplicated clock inputs
- Triplicated CAN communication lines
- Triplicated serial communication lines
- Triplicated General Purpose I/Os



#### Combination with SDR boards







#### RuggedSDR Boards (VPX or stand-alone)

Latest-generation Software-Defined Radio boards for air-cooled or conduction-cooled systems

#### Air-cooled or conduction-cooled

#### SDRlab Chassis

3U VPX laboratory equipment with power supply and easily removable cover

#### Light VPX Chassis

3U conduction-cooled chassis and clamshells





## Conclusions

- Reliability in electronic systems is a challenge at today's levels of integration and complexity
- Modern FPGAs and SoPCs are the key circuits in state-of-the-art boards
- Traditional error mitigation and fault tolerance techniques are applicable to FPGAs
- Experimental analysis techniques, combined with the use of standards defined for rugged systems, can ensure high levels of reliability on SDR platforms





Research Projects:

- This work has been partially supported by the Ministerio de Ciencia e Innovación of Spain within the project TEC2011-28250-C02-01.
- This work has been carried out within the Research and Education Unit UFI11/16 of the UPV/EHU and supported by the Department of Education, Universities and Research of the Basque Government within the fund for research groups of the Basque university system IT394-10.



### Thank you for your attention!





System-on-Chip engineering

www.soc-e.com





[1] C. Constantinescu, "Trends and challenges in VLSI circuit reliability," *IEEE Micro*, vol. 23, no. 4, pp. 14–19, 2003.

[2] Xilinx Corp., "Device reliability report, fourth quarter 2010," Xilinx Documentation, http://www.xilinx.com, Feb. 2011.

[3] D. Petrick, T. Flatley, A. Geist, D. Espinosa, G. Crum, M. Lin, J. Hosler, M. Buenfil, and K. Blank, "SpaceCube: Current Missions and Ongoing Platform Advancements," in *Military and Aerospace Programmable Logic Devices (MAPLD)*, 2009.

[4] A. Morillo, A. Astarloa, A. Zuloaga, U. Bidarte, and J. Lázaro, "Técnicas de diseño seguro con FPGAs," in *XVII Seminario Anual de Automática, Electrónica Industrial e Instrumentación (SAAEI10)*, ISBN 978-84-95809-75-9, 2010.

[5] K. Chapman and L. Jones, "SEU strategies for Virtex-5 devices," XAPP864, Xilinx Inc., 2009.

[6] G. Miller, C. Carmichael, and G. Swift, "Single-Event Upset Mitigation for Xilinx FPGA Block Memories," Xilinx Application Notes, http://www.xilinx.com/support/documentation/application\_notes/xapp962.pdf, Mar. 2008.

[7] U. Kretzschmar, A. Astarloa, and J. Lázaro, "SEU resilience of DES, AES and Twofish in SRAM-based FPGA," in *Reconfigurable Computing: Architectures, Tools and Applications.* Springer, 2013, pp. 37–46.

[8] Tetraedre Sarl, Auvernier, "TCDG - DES cryptographic module," http://www.tetraedre.com/advanced/index.php, Sep. 2010.

[9] H. Satyanarayana, "AES128 - Project: aes\_crypto\_core," OPENCORES http://opencores.org, Dec. 2004.

[10] Opencores.org, "Twofish - Project: Twofish Core," OPENCORES http://opencores.org, Feb. 2002.

[11] U. Kretzschmar, A. Astarloa, J. Lázaro, M. Garay, and J. Del Ser, "Robustness of different TMR granularities in shared Wishbone architectures on SRAM FPGA," in *2012 International Conference on Reconfigurable Computing and FPGAs (ReConFig)*, 2012, pp. 1–6.

