| Coarse-Graining and Classifying Massive High-Throughput XFEL Datasets of Crystallization in Supercooled Water |
| Journal Article | PUBDB-2025-04594 |
2025
MDPI
Basel
Please use a persistent id in citations: doi:10.3390/cryst15080734 doi:10.3204/PUBDB-2025-04594
Abstract: Ice crystallization in supercooled water is a complex phenomenon with far-reaching implications across scientific disciplines, including cloud formation physics and cryopreservation. Experimentally studying such complexity can be a highly data-driven and data-hungry endeavor because of the need to record rare events that cannot be triggered on demand. Here, we describe such an experiment comprising 561 million images of X-ray free-electron laser (XFEL) diffraction patterns (2.3 PB raw data) spanning the disorder-to-order transition in micrometer-sized supercooled water droplets. To effectively analyze these patterns, we propose a data reduction (i.e., coarse-graining) and dimensionality reduction (i.e., principal component analysis) strategy. We show that a simple set of criteria on this reduced dataset can efficiently classify these patterns in the absence of reference diffraction signatures, which we validated using more precise but computationally expensive unsupervised machine learning techniques. For hit-finding, our strategy attained 98% agreement with our cross-validation. We speculate that these strategies may be generalized to other types of large high-dimensional datasets generated at high-throughput XFEL facilities.
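The two reduction steps named in the abstract, coarse-graining followed by principal component analysis, can be illustrated with a minimal sketch. This is not the authors' pipeline: the synthetic images, the block-averaging factor, the function names `coarse_grain` and `pca_scores`, and the midpoint threshold on the first principal component are all assumptions chosen only to show the general idea on toy data.

```python
import numpy as np


def coarse_grain(images, factor):
    """Block-average each image by `factor` along both axes (hypothetical helper)."""
    n, h, w = images.shape
    h2, w2 = h // factor, w // factor
    # Drop edge pixels that do not fill a whole block.
    trimmed = images[:, :h2 * factor, :w2 * factor]
    return trimmed.reshape(n, h2, factor, w2, factor).mean(axis=(2, 4))


def pca_scores(features, n_components):
    """Project mean-centered rows onto the leading principal components via SVD."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)

# Synthetic stand-in for detector frames: pure noise, with a bright patch
# added to every second frame to mimic a "hit". Not real XFEL data.
images = rng.normal(0.0, 1.0, size=(200, 64, 64))
is_hit = np.arange(200) % 2 == 0
images[is_hit, 16:24, 16:24] += 5.0

# Step 1: coarse-grain 64x64 frames down to 8x8.
reduced = coarse_grain(images, 8)

# Step 2: reduce the 64 coarse pixels to 2 principal-component scores.
scores = pca_scores(reduced.reshape(len(reduced), -1), 2)

# Step 3: a simple criterion, here a midpoint threshold on the first component.
pc1 = scores[:, 0]
labels = pc1 > 0.5 * (pc1.min() + pc1.max())

# The sign of a principal component is arbitrary, so compare both polarities.
agreement = max(np.mean(labels == is_hit), np.mean(labels != is_hit))
print(f"agreement with synthetic ground truth: {agreement:.2f}")
```

Block averaging is only one possible form of coarse-graining, and a single threshold is only one possible classification criterion; the abstract does not specify either choice at this level of detail.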