SEUs Tolerance in FPGAs Based Digital LLRF System for XFEL
Mariusz Grecki

Abstract—Rapidly developing semiconductor technology makes it possible to implement sophisticated digital control on programmable device platforms (FPGAs, CPUs). However, the increasing size and performance of these circuits come with a drawback: greater sensitivity to failures, in particular to soft errors caused by ionizing radiation. The sensitivity to single event upsets (SEUs) is related to the critical charge, which depends strongly on the transistor dimensions and the supply voltage. The sensitivity to ionizing radiation increases faster than the circuit complexity, which grows according to Moore’s law. Life-critical systems and systems operating in radioactive environments therefore have to deal with soft errors. One countermeasure is to use special design techniques that introduce redundancy into the algorithms and/or the circuit design, so that errors can be detected and corrected. The goal is to find a compromise between cost, performance and reliability. The development of such algorithms and systems must be supported by a test stand on which their resistance to radiation can be evaluated.

I. XFEL

The European X-Ray Free-Electron Laser (XFEL) will be driven by a 1.6 km long linac consisting of 116 superconducting accelerator modules supplied by 29 RF stations [1]. The accelerator will provide an electron beam with an energy of 20 GeV and an average beam power of 600 kW. The energy spread must be better than 1 MeV, which corresponds to a regulation precision of 0.005%. The RF field regulation will be provided by a sophisticated digital LLRF system. Due to dark currents (from possible field emission in the accelerator cavities) there will be background radiation (gammas and neutrons) in the XFEL linac tunnel. A dark current of several microamperes is expected to reach an energy of about 100 MeV before being dumped in the beam line. Since the LLRF electronics will be installed in the same tunnel as the beam pipe (Fig. 1), the generated radiation is expected to affect the electronic devices.
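The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this paper (the count of 8 cavities per module is taken from the LLRF section); the variable names are illustrative.

```python
# Figures quoted in the text.
beam_energy_ev = 20e9       # 20 GeV
energy_spread_ev = 1e6      # 1 MeV
modules = 116
rf_stations = 29
cavities_per_module = 8     # stated in the LLRF section

relative_precision = energy_spread_ev / beam_energy_ev
modules_per_station = modules // rf_stations
cavities_per_station = modules_per_station * cavities_per_module

print(f"relative precision:   {relative_precision:.3%}")   # 0.005%
print(f"modules per station:  {modules_per_station}")      # 4
print(f"cavities per station: {cavities_per_station}")     # 32
```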

Fig. 1. XFEL tunnel cross-section. The place for the electronic racks is marked in yellow in the right-hand picture.

M.Grecki is with Deutsches Elektronen-Synchrotron DESY, Hamburg, Germany (telephone: +49408998, e-mail: mariusz.grecki@desy.de).

Ionizing radiation has a negative influence on electronic devices. It reduces the lifetime of the circuit and can also cause temporary malfunctions during circuit operation [3], [4], [5].

The permanent effects are caused by excessive doses of radiation. Gamma photons cause charge accumulation in MOS gates, which shifts the threshold voltage of the MOS transistor. The temporary effects (Single Event Effects, SEE) are caused by charged particles energetic enough to ionize the semiconductor by generating excess electron-hole pairs [3]. A typical example is the nuclear reaction between a neutron and a nucleus of boron \( ^{10}\text{B} \) (about 20% of the boron commonly used as a silicon dopant), which creates an \( \alpha \) particle energetic enough to ionize the semiconductor along its path (the typical range of \( \alpha \) particles in silicon is several microns). The ionized regions of the semiconductor can conduct electric current until the excess carriers recombine, so random parts of the electronic circuit can be exposed to current pulses that disturb normal operation. Most SEEs are temporary, but in some cases an SEE can cause a Single Event Latchup (SEL), which is permanent but reversible by a power cycle, or even a non-reversible Single Event Burnout.

Single Event Transients (SETs), current pulses caused by temporary ionization of the semiconductor, can propagate through the circuit logic and generate faulty signals. These disturbances are only temporary, and after a short period (usually below a nanosecond) the SET vanishes. The situation is different when the SET is latched in a data storage component (flip-flop, register or memory). In that case the SET is no longer temporary but permanently changes the information stored in the affected component (Single Event Upset, SEU) [4]; restoring the information to its previous state, however, recovers normal operation of the circuit. The probability of an SEU in an electronic circuit is not zero even in typical operating conditions (sea level, no radiation sources in the environment), but it rises rapidly in special environments (space applications, aircraft, high energy physics experiments, etc.). The probability also rises as semiconductor technology advances, since smaller feature sizes and lower supply voltages allow less energetic particles to generate an SEU.
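The effect of an upset on stored data can be illustrated with a minimal software model. The sketch below assumes a hypothetical 16-bit register and a helper named `flip_bit`; neither comes from the LLRF firmware. It only shows that the numerical error depends on which bit is hit and that rewriting the register removes the error.

```python
def flip_bit(word: int, bit: int, width: int = 16) -> int:
    """Return `word` with one bit inverted, as a single upset would leave it."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside register width")
    mask = (1 << width) - 1
    return (word ^ (1 << bit)) & mask


stored = 0x1234                   # value held in a hypothetical 16-bit register
upset_lsb = flip_bit(stored, 0)   # upset in the least significant bit
upset_msb = flip_bit(stored, 15)  # upset in the most significant bit

# The numerical error depends strongly on which bit was hit.
print(abs(upset_lsb - stored))    # 1
print(abs(upset_msb - stored))    # 32768

# The corrupted value persists until the register is rewritten.
register = upset_msb
register = stored
print(register == stored)         # True
```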

II. LLRF SYSTEM FOR XFEL

A single RF station of the XFEL will consist of 4 accelerating cryomodules (each composed of 8 superconducting cavities) and a klystron supplying the RF power, driven by the LLRF system (Fig. 2). The XFEL LLRF system is a closed-loop digital control system [2].

The state of the RF electromagnetic field filling the accelerating cavities is measured by sensors providing the electric field, forward power and reflected power signals (Vacc, Ainc, Aref). All of them are RF signals, and their digital processing requires downconversion to intermediate frequency signals that preserve the information about the amplitude and phase of the RF field. The samples of the accelerating voltages (Vacc) from all cavities driven by the same RF power station (klystron) are used to calculate the vector sum. The current value of the vector sum is compared to the setpoint, and the error signal feeds a regulator. The regulator outputs (I and Q) drive the vector modulator through DACs, providing the required input signal to the klystron that drives the cavities. The control system has to keep the amplitude and phase of the RF field in the cavities stable in spite of beam influence, noise and drifts in the system, and other disturbances.
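The vector-sum feedback described above can be sketched in a few lines. This is a minimal illustration, not the XFEL controller: the paper does not specify the regulator type, so a purely proportional regulator is assumed here, and the function names, the gain and the unit-amplitude cavity samples are hypothetical.

```python
import cmath


def vector_sum(samples):
    """Add the complex field samples (I + jQ) of all cavities of one station."""
    return sum(samples, 0j)


def regulator_step(samples, setpoint, gain):
    """Return the (I, Q) correction of a purely proportional regulator."""
    error = setpoint - vector_sum(samples)
    drive = gain * error
    return drive.real, drive.imag


# 32 cavities (4 cryomodules x 8 cavities), unit amplitude, zero phase.
cavities = [cmath.rect(1.0, 0.0) for _ in range(32)]
setpoint = 32 + 0j

# Field on setpoint: no correction is requested.
print(regulator_step(cavities, setpoint, gain=2.0))   # (0.0, 0.0)

# One cavity drops to 90% amplitude: the regulator asks for more drive in I.
cavities[0] = cmath.rect(0.9, 0.0)
i_corr, q_corr = regulator_step(cavities, setpoint, gain=2.0)
print(round(i_corr, 6), round(q_corr, 6))
```

In a real system the I and Q corrections would be sent to the DACs driving the vector modulator; here they are simply printed.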

III. SEU TOLERANT CIRCUITS

One method of providing SEU tolerance is to introduce redundancy into the circuit [5], [6]. Triple Modular Redundancy (TMR) is commonly used for that purpose. However, triplicating the whole circuit is usually too costly. Therefore only the most important parts of the circuit should be protected in this way.

![Fig. 3. TMR applied to the most sensitive part of the circuit.](image)
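The principle of TMR can be shown with a small software model. The sketch below assumes a bitwise 2-out-of-3 majority voter, which is the usual textbook form; `block_b` is a hypothetical stand-in for the protected block, and the upset is injected by XOR-ing one copy's result with a mask.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three redundant words."""
    return (a & b) | (a & c) | (b & c)


def tmr(compute, x, upset_mask=0):
    """Evaluate `compute` three times; optionally corrupt one copy's result."""
    r0 = compute(x)
    r1 = compute(x) ^ upset_mask   # copy hit by a simulated SEU
    r2 = compute(x)
    voted = majority_vote(r0, r1, r2)
    error_detected = not (r0 == r1 == r2)
    return voted, error_detected


def block_b(x: int) -> int:
    """Hypothetical stand-in for the protected block B."""
    return (3 * x + 1) & 0xFFFF


print(tmr(block_b, 100))                      # (301, False)
print(tmr(block_b, 100, upset_mask=0x0040))   # (301, True)
```

The voter masks an error in any single copy, but the voter itself is not protected in this model, which is why its size matters for the achievable reliability gain.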

Applying TMR increases the reliability of the system, since errors generated in the triplicated part can be detected and corrected. Unfortunately, the cost of the circuit also increases. The achieved reliability gain $G_R$ (1) and the cost ultimately depend on the surface of the circuit to which TMR is applied (block B in Fig. 3) and on the overhead (the surface occupied by the voter and supporting circuits). Assuming that the SEU probability is proportional to the surface occupied by the circuit ($p \propto S$):

$$ G_R = \frac{p_B}{p_D} = \frac{S_B}{S_D} \tag{1} $$

where: $p_B, p_D$ - probabilities of SEU in the B block and in the voter (block D) respectively; $S_B, S_D$ - surface occupied by the B block of the circuit and the voter respectively

Since the increase in occupied surface can be estimated by (2), the factor of cost increase can be approximated by (3).

$$ \Delta S = 2S_B + S_D \tag{2} $$

$$ \delta S = \frac{S_A + S_C + 3S_B + S_D}{S_A + S_C + S_B}, \qquad 1 < \delta S < 3 \tag{3} $$

where: $\delta S$ - relative cost rise-up due to surface occupancy increase; $S_A, S_C$ - surface occupied by the A and C blocks of the circuit respectively
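The trade-off expressed by equations (1) to (3) can be evaluated numerically. The sketch below implements the three formulas directly; the block surfaces used in the example are hypothetical values in arbitrary units, chosen only to illustrate the calculation.

```python
def reliability_gain(s_b: float, s_d: float) -> float:
    """Equation (1): G_R = S_B / S_D."""
    return s_b / s_d


def surface_increase(s_b: float, s_d: float) -> float:
    """Equation (2): two extra copies of block B plus the voter."""
    return 2 * s_b + s_d


def relative_cost(s_a: float, s_b: float, s_c: float, s_d: float) -> float:
    """Equation (3): surface with TMR divided by surface without it."""
    return (s_a + s_c + 3 * s_b + s_d) / (s_a + s_c + s_b)


# Hypothetical block surfaces in arbitrary units (not measured values).
s_a, s_b, s_c, s_d = 400, 200, 400, 20

print(reliability_gain(s_b, s_d))         # 10.0
print(surface_increase(s_b, s_d))         # 420
print(relative_cost(s_a, s_b, s_c, s_d))  # 1.42
```

With these example numbers a tenfold reliability gain for block B costs 42% additional surface; a larger voter lowers the gain and raises the cost at the same time.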

The optimization aims at achieving the highest possible reliability gain $G_R$ while keeping the cost increase relatively low. This is not a trivial task and requires confirmation by experiments. For that purpose a test stand has been designed.

IV. RADIATION TOLERANCE TEST-STAND

In order to verify the developed SEU tolerant algorithms and systems, a specialized test stand has been developed. It uses the SimconDSP board (Fig. 4), the same board that is used by the existing LLRF system for FLASH.

![Fig. 4. SimconDSP board.](image)

The board is supervised by a remote computer that can download the FPGA bitstream, check the correctness of the FPGA configuration memory and check the results of algorithm execution. The board is connected to the supervising computer by 2 fast serial links (Fig. 5). One link is used to download the firmware to the FPGA (remote JTAG) and the second one is a user link. The software developed for the experiments includes a server that monitors the test stand and acts as an interface to client applications. The clients communicate with the server through a UDP based protocol. To allow easy interfacing, a UDP sockets library was developed for the Scilab mathematical package.
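The client-server exchange over UDP can be sketched as follows. The paper does not describe the protocol, and the actual clients use a Scilab library, so everything specific here is an assumption: the request string `GET SEU_COUNT`, the reply format, and the stand-in server that runs on the loopback interface in place of the real monitoring server.

```python
import socket
import threading


def serve_once(sock: socket.socket) -> None:
    """Stand-in server: answer a single datagram, then return."""
    data, addr = sock.recvfrom(1024)
    # Hypothetical message format, not the protocol used at the test stand.
    reply = b"SEU_COUNT 0" if data == b"GET SEU_COUNT" else b"ERR unknown request"
    sock.sendto(reply, addr)


def query(server_addr, request: bytes, timeout: float = 1.0) -> bytes:
    """Send one request datagram and wait for one reply datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        client.sendto(request, server_addr)
        reply, _ = client.recvfrom(1024)
        return reply


server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))    # port 0: let the OS choose a free port
worker = threading.Thread(target=serve_once, args=(server,), daemon=True)
worker.start()

reply = query(server.getsockname(), b"GET SEU_COUNT")
worker.join()
server.close()
print(reply.decode())
```

Because UDP gives no delivery guarantee, the client sets a timeout; a real client would typically retry the request when the timeout expires.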

![Fig. 5. The block diagram of the whole system with supervising computer and client applications.](image)

The test device uses the built-in memory and DACs to simulate the signals of the real system. The signals are measured by the built-in ADC and the regulation algorithm is applied. The resulting output signal is checked for any inconsistency with the expected behavior. All data stored in memories are protected by Hamming codes, and all digital signals in the supervisor part of the FPGA are processed by redundant circuits. The test stand has been installed in the FLASH tunnel (Fig. 6) in the proximity of bunch compressor BC2 (Fig. 7), where the highest radiation level has been determined.
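How a Hamming code protects stored data against a single upset can be shown with the smallest common variant. The paper does not state which code parameters are used in the test stand, so the Hamming(7,4) code below is an assumption made for illustration; the function names are likewise hypothetical.

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (bit i holds position i + 1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]      # parity over positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]      # parity over positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]      # parity over positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(code: int):
    """Return (nibble, error_position); error_position is 0 if none was found."""
    bits = [(code >> i) & 1 for i in range(7)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos      # XOR of the positions of all set bits
    if syndrome:
        bits[syndrome - 1] ^= 1  # correct the single flipped bit
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


word = hamming74_encode(0b1011)
corrupted = word ^ (1 << 4)          # simulated SEU at position 5
print(hamming74_decode(word))        # (11, 0)
print(hamming74_decode(corrupted))   # (11, 5)
```

This code corrects any single-bit error per codeword; two upsets in the same codeword would be miscorrected, which is one reason stored data are usually checked and rewritten periodically.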

![Fig. 6. The system installed at FLASH.](image)

![Fig. 7. Bunch compressor 2 where the test-stand has been installed.](image)

![Fig. 8. Residual radiation level at FLASH BC2 and the SEU rate recorded by the system.](image)

V. RESULTS

The test stand was used for experiments during the FLASH tests and user run in the period from January to May 2012. During the tests the machine was running with variable settings (beam energy, bunch number, bunch charge, repetition rate). Over the test period the measured radiation level corresponds well to the total number of SEUs detected in the system (Fig. 8). The developed TMR based algorithms proved effective and made it possible to detect and correct errors generated by SEUs.

VI. CONCLUSION

The developed test stand for radiation tolerant systems has survived a few months in the radioactive FLASH environment without damage. All the system components (FPGA, DSP, memories, ADCs and DACs, communication interfaces) are working without visible performance degradation. Not all possible tests have been performed yet. Complex tests for estimating the quality of signal processing algorithms in the presence of radiation are under development.

ACKNOWLEDGMENT

The research leading to these results has received funding from the European Commission under the EuCARD FP7 Research Infrastructures grant agreement no. 227579.

REFERENCES