Abstract PUBDB-2025-02235

Compression and data reduction in serial crystallography


2023

Twenty-Sixth Congress and General Assembly of the International Union of Crystallography, IUCr 2023, Melbourne, Australia, 22 Aug 2023 - 29 Aug 2023 [10.1107/S2053273323095244]

Abstract: Protein crystallography is one of the most successful methods for biological structure determination. The technique requires many diffraction snapshots to obtain 3D structural information about the protein under study. Even more patterns are needed to study fast protein dynamics, which can be achieved using serial crystallography (SX). Fortunately, new X-ray facilities such as fourth-generation synchrotrons and free-electron lasers (FELs), combined with newly developed X-ray detectors, have made it possible to carry out these experiments at rates of more than 1000 images per second. The drawback of this increase in acquisition rate is the volume of collected data: up to 2 PB per experiment can easily be produced. Therefore, new data reduction strategies have to be developed and deployed. Lossless data reduction methods do not alter the data but usually fail to achieve high compression ratios. Lossy compression methods, on the other hand, can significantly reduce the amount of data, but they require careful evaluation of the resulting data quality. We have tested different lossless and lossy compression approaches on SX data, proposed some new lossy compression approaches, and demonstrated appropriate methods for assessing data quality. By checking the statistics obtained from the compressed data (such as CC*/Rsplit and Rfree/Rwork), we have demonstrated that the volume of measured data can be reduced by a factor of 10 to 100 while the quality of the resulting data remains almost unchanged. In addition, we tested lossy compression methods on a single-wavelength anomalous diffraction (SAD) dataset (thaumatin collected at 4.57 keV at SwissFEL) and demonstrated that even such highly sensitive data can be successfully compressed. This allowed us to determine the limits of applicability of all the lossy compression methods considered.
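The lossless-versus-lossy trade-off described above can be illustrated with a minimal sketch. It assumes a synthetic detector frame (weak Poisson background plus a few bright Bragg-like spots), uses `zlib` as a stand-in lossless codec, and uses simple intensity thresholding as a stand-in lossy step. The frame model, the threshold value and the function names (`simulate_frame`, `compression_ratio`, `threshold_frame`) are hypothetical; they are not the compression methods evaluated in this work.

```python
import zlib

import numpy as np


def simulate_frame(shape=(512, 512), background_rate=1.0, n_spots=50, seed=0):
    """Hypothetical detector frame: Poisson background plus a few bright Bragg-like spots."""
    rng = np.random.default_rng(seed)
    frame = rng.poisson(background_rate, size=shape).astype(np.uint16)
    rows = rng.integers(0, shape[0], n_spots)
    cols = rng.integers(0, shape[1], n_spots)
    frame[rows, cols] += rng.poisson(500, n_spots).astype(np.uint16)
    return frame


def compression_ratio(frame):
    """Uncompressed size divided by zlib-compressed size (zlib itself is lossless)."""
    raw = frame.tobytes()
    return len(raw) / len(zlib.compress(raw, 9))


def threshold_frame(frame, threshold):
    """Illustrative lossy step (assumption): zero every pixel below `threshold` counts."""
    reduced = frame.copy()
    reduced[reduced < threshold] = 0
    return reduced


frame = simulate_frame()

# Lossless: decompression restores every pixel bit for bit.
restored = np.frombuffer(zlib.decompress(zlib.compress(frame.tobytes(), 9)), dtype=np.uint16)
assert np.array_equal(restored.reshape(frame.shape), frame)

# Lossy: the weak background is discarded, so the compressor sees mostly zeros.
reduced = threshold_frame(frame, threshold=10)
print(f"lossless zlib ratio:    {compression_ratio(frame):.1f}")
print(f"threshold + zlib ratio: {compression_ratio(reduced):.1f}")
```

Thresholding removes weak signal irrecoverably, which is why any lossy scheme has to be validated with downstream statistics such as CC*/Rsplit and Rfree/Rwork rather than by the compression ratio alone.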
Some of the proposed compression strategies, tested on SX and macromolecular crystallography (MX) datasets, can also be applied to other types of experiments, even those using different radiation sources (for example, electron and neutron diffraction).
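The half-dataset statistics used above to judge compressed data can be sketched as follows, assuming merged intensities from two random halves of the data are already available as aligned arrays (one entry per reflection). The formulas are the standard definitions of CC1/2, CC* (Karplus and Diederichs) and Rsplit; the function names, the synthetic intensities and the "extra noise" stand-in for lossy compression are hypothetical and only show how original and compressed data might be compared.

```python
import numpy as np


def cc_half(i_even, i_odd):
    """CC1/2: Pearson correlation between intensities merged from two half-datasets."""
    return float(np.corrcoef(i_even, i_odd)[0, 1])


def cc_star(cc12):
    """CC* = sqrt(2 * CC1/2 / (1 + CC1/2)); undefined (NaN) for negative CC1/2."""
    return float(np.sqrt(2.0 * cc12 / (1.0 + cc12)))


def r_split(i_even, i_odd):
    """Rsplit = (1/sqrt(2)) * sum|I_even - I_odd| / (0.5 * sum(I_even + I_odd))."""
    numerator = np.sum(np.abs(i_even - i_odd))
    denominator = 0.5 * np.sum(i_even + i_odd)
    return float(numerator / denominator / np.sqrt(2.0))


# Hypothetical half-dataset intensities for 5000 reflections.
rng = np.random.default_rng(1)
true_i = rng.exponential(1000.0, size=5000)
even = true_i + rng.normal(0.0, 50.0, size=true_i.size)
odd = true_i + rng.normal(0.0, 50.0, size=true_i.size)

# Stand-in (assumption) for the effect of lossy compression: a little extra noise per half.
even_c = even + rng.normal(0.0, 20.0, size=even.size)
odd_c = odd + rng.normal(0.0, 20.0, size=odd.size)

for label, (a, b) in {"original": (even, odd), "compressed": (even_c, odd_c)}.items():
    cc12 = cc_half(a, b)
    print(f"{label:>10}: CC1/2={cc12:.4f}  CC*={cc_star(cc12):.4f}  Rsplit={r_split(a, b):.4f}")
```

A compression scheme that preserves data quality should leave these figures of merit essentially unchanged. In practice they are computed per resolution shell by a merging program rather than over whole arrays as in this sketch.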

Contributing Institute(s):
  1. FS-CFEL-1 Fachgruppe BMX (FS-CFEL-1-BMX)
  2. FS-CFEL-1 (Group Leader: Henry Chapman) (CFEL-I)
  3. Scientific computing (FS-SC)
Research Program(s):
  1. 633 - Life Sciences – Building Blocks of Life: Structure and Function (POF4-633)
  2. AIM, DFG project G:(GEPRIS)390715994 - EXC 2056: CUI: Advanced Imaging of Matter (390715994)
Experiment(s):
  1. PETRA Beamline P11 (PETRA III)

Database coverage:
Medline ; Clarivate Analytics Master Journal List ; Current Contents - Physical, Chemical and Earth Sciences ; DEAL Wiley ; Ebsco Academic Search ; Essential Science Indicators ; Nationallizenz ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection


 Record created 2025-07-08, last modified 2025-07-25

