TY - JOUR
AU - Chia, Ervin S. H.
AU - Berberich, Tim B.
AU - Sobolev, Egor
AU - Koliyadu, Jayanath C. P.
AU - Adams, Patrick
AU - André, Tomas
AU - Dall'Antonia, Fabio
AU - Cardoch, Sebastian
AU - De Santis, Emiliano
AU - Formosa, Andrew
AU - Hammarstroem, Bjoern
AU - Hassett, Michael P.
AU - Kim, Seonmyeong
AU - Kloos, Marco
AU - Letrun, Romain
AU - Malka, Janusz
AU - Monrroy Vilan e Melo, Diogo Filipe
AU - Paporakis, Stefan
AU - Sato, Tokushi
AU - Schmidt, Philipp
AU - Turkot, Oleksii
AU - Vakili, Mohammad
AU - Valerio, Joana
AU - Yenupuri, Tej Varma
AU - You, Tong
AU - de Wijn, Raphaël
AU - Park, Gun-Sik
AU - Abbey, Brian
AU - Darmanin, Connie
AU - Bajt, Saša
AU - Chapman, Henry N.
AU - Bielecki, Johan
AU - Maia, Filipe R. N. C.
AU - Timneanu, Nicusor
AU - Caleman, Carl
AU - Martin, Andrew V.
AU - Kurta, Ruslan P.
AU - Sellberg, Jonas A.
AU - Loh, Ne-te Duane
TI - Coarse-Graining and Classifying Massive High-Throughput XFEL Datasets of Crystallization in Supercooled Water
JO - Crystals
VL - 15
IS - 8
SN - 2073-4352
CY - Basel
PB - MDPI
M1 - PUBDB-2025-04594
SP - 734
PY - 2025
AB - Ice crystallization in supercooled water is a complex phenomenon with far-reaching implications across scientific disciplines, including cloud formation physics and cryopreservation. Experimentally studying such complexity can be a highly data-driven and data-hungry endeavor because of the need to record rare events that cannot be triggered on demand. Here, we describe such an experiment comprising 561 million images of X-ray free-electron laser (XFEL) diffraction patterns (2.3 PB raw data) spanning the disorder-to-order transition in micrometer-sized supercooled water droplets. To effectively analyze these patterns, we propose a data reduction (i.e., coarse-graining) and dimensionality reduction (i.e., principal component analysis) strategy. We show that a simple set of criteria on this reduced dataset can efficiently classify these patterns in the absence of reference diffraction signatures, which we validated using more precise but computationally expensive unsupervised machine learning techniques. For hit-finding, our strategy attained 98
LB - PUB:(DE-HGF)16
DO - 10.3390/cryst15080734
UR - https://bib-pubdb1.desy.de/record/639637
ER -