TY  - CONF
AU  - Galchenkova, Marina
AU  - Tolstikova, A.
AU  - Oberthuer, D.
AU  - Sprenger, J.
AU  - Brehm, Wolfgang
AU  - White, T. A.
AU  - Barty, Anton
AU  - Chapman, H. N.
AU  - Yefanov, Oleksandr
TI  - Compression and data reduction in serial crystallography
SN  - 2053-2733
M1  - PUBDB-2025-02235
PY  - 2023
AB  - Protein crystallography is one of the most successful methods for biological structure determination. The technique requires many diffraction snapshots to obtain 3D structural information about the protein under study. Even more patterns are needed to study fast protein dynamics, which can be done using serial crystallography (SX). Fortunately, new X-ray facilities such as 4th generation synchrotrons and Free Electron Lasers (FELs), combined with newly developed X-ray detectors, have opened the way to carrying out these experiments at a rate of more than 1000 images per second. The drawback of this increase in acquisition rate is the volume of collected data: a single experiment can easily produce up to 2 PB. Therefore, new data reduction strategies have to be developed and deployed. Lossless data reduction methods do not change the data, but usually fail to achieve a high compression ratio. Lossy compression methods, on the other hand, can significantly reduce the amount of data, but they require careful evaluation of the resulting data quality. We have tested different approaches to both lossless and lossy compression of SX data, proposed some new lossy compression methods and demonstrated appropriate methods for data quality assessment. By checking the resulting statistics of the compressed data (such as CC*/Rsplit and Rfree/Rwork), we have demonstrated that the volume of the measured data can be reduced by a factor of 10 to 100 while the quality of the resulting data remains almost constant. In addition, we tested lossy compression methods on a SAD dataset (thaumatin collected at 4.57 keV, measured at SwissFEL) and demonstrated that even such very sensitive data can be successfully compressed. This allowed us to determine the limits of applicability of all the lossy compression methods considered.
Some of the proposed compression strategies, tested on SX and MX datasets, can be used for other types of experiments, even with different sources (for example, electron and neutron diffraction).
T2  - Twenty-Sixth Congress and General Assembly of the International Union of Crystallography
CY  - 22 Aug 2023 - 29 Aug 2023, Melbourne (Australia)
Y2  - 22 Aug 2023 - 29 Aug 2023
M2  - Melbourne, Australia
LB  - PUB:(DE-HGF)1
DO  - DOI:10.1107/S2053273323095244
UR  - https://bib-pubdb1.desy.de/record/632810
ER  -