001     619753
005     20250407222511.0
024 7 _ |a 10.15480/882.9689
|2 doi
024 7 _ |a 10.3204/PUBDB-2024-07888
|2 datacite_doi
037 _ _ |a PUBDB-2024-07888
041 _ _ |a English
100 1 _ |a Kotobi, Amir
|0 P:(DE-H253)PIP1092133
|b 0
|e Corresponding author
|g male
245 _ _ |a Dynamic structure investigation and spectra prediction of biomolecules using machine learning techniques
|f 2020-02-01 - 2024-06-06
260 _ _ |c 2024
|b TUHH Universitätsbibliothek
300 _ _ |a 141
336 7 _ |a Output Types/Dissertation
|2 DataCite
336 7 _ |a DISSERTATION
|2 ORCID
336 7 _ |a PHDTHESIS
|2 BibTeX
336 7 _ |a Thesis
|0 2
|2 EndNote
336 7 _ |a Dissertation / PhD Thesis
|b phd
|m phd
|0 PUB:(DE-HGF)11
|s 1744012115_1668329
|2 PUB:(DE-HGF)
336 7 _ |a doctoralThesis
|2 DRIVER
502 _ _ |a Dissertation, Technische Universität Hamburg, 2024
|c Technische Universität Hamburg
|b Dissertation
|d 2024
520 _ _ |a The investigation of biomolecular structures and the prediction of their spectra through experimental and theoretical gas-phase studies are fundamental steps toward understanding their intrinsic properties and biological functions. However, the complexity of the potential energy surface of biomolecules, combined with limited computational resources, restricts the interpretation of experimental observations. Integrating supervised and unsupervised machine learning (ML) techniques into theoretical calculations is considered an effective way to address these challenges. Infrared (IR) and X-ray absorption spectroscopy (XAS) have proven to be powerful experimental techniques for studying the electronic and spatial structure of biomolecules such as peptides and proteins. Reproducing and validating the features observed in the resulting spectra often requires sophisticated ab initio calculations and a comprehensive understanding of the biomolecules’ configurational space. In this thesis, I introduce a novel approach to interpreting the experimental IR spectrum of a peptide, which aims to enhance the exploration of configurational space by combining replica-exchange molecular dynamics (REMD) simulations, unsupervised machine learning, and ab initio calculations. This scheme relies on a set of structural descriptors and a data-driven clustering technique that accounts for the canonical ensemble under the actual experimental conditions to obtain an accurate computed spectrum. We show that by partitioning the configurational space into subensembles of similar conformations, i.e. clusters, an accurate IR spectrum can be calculated by averaging the IR contributions of the representative conformer of each cluster, weighted by the cluster populations.
While this approach reveals important fingerprints in experimental spectroscopic data, calculating IR and particularly XAS spectra is inherently expensive and often computationally prohibitive even for medium-sized molecules. To overcome the computational obstacles associated with spectra prediction, we develop a data-driven supervised ML framework, i.e. graph neural networks (GNNs), trained on a custom-generated XAS dataset to find a mapping between structures and spectroscopic signals, thus bypassing the need for expensive ab initio quantum chemistry calculations. To ensure the interpretability of the GNN models’ predictions, we employ feature attribution to determine the respective contributions of the various atoms in a molecule to the peaks observed in the XAS spectrum. With this approach, we show that it is possible to link the peaks observed in the spectra to specific core and virtual orbitals from the quantum chemical calculations and obtain an in-depth understanding of the ML-predicted XAS spectrum. The results presented in this thesis show that integrating supervised and unsupervised ML techniques can effectively enhance the interpretation of spectroscopic data and make efficient use of expensive ab initio calculations.
Infrared and X-ray absorption spectroscopy have proven to be powerful experimental tools for elucidating the electronic and structural details of biomolecules, in particular peptides and proteins. In parallel, remarkable advances in computing capacity have accelerated the ability to combine chemistry, physics, and machine learning in a true symbiosis, enabling the precise modeling and understanding of complex biomolecular processes at the atomic level and the validation of experimentally observed spectral features.
However, the inherent complexity of peptides and proteins, coupled with the computational demands of quantum mechanical methods for large systems, poses a major challenge when it comes to understanding the intrinsic properties of these biomolecules. To address these challenges, incorporating supervised and unsupervised machine learning techniques into the molecular dynamics simulation toolbox makes it easier to decipher the complex interplay of interatomic and intermolecular interactions and paves the way for predicting various properties of these systems. This dissertation deals with features and unsupervised machine learning techniques (e.g. clustering and dimensionality reduction) applied to atomistic datasets, in order to investigate how these techniques can illuminate the complex structural landscape of a model peptide. Furthermore, this thesis considers graph neural networks as a powerful and efficient approach to deciphering the intricate [...]
536 _ _ |a 633 - Life Sciences – Building Blocks of Life: Structure and Function (POF4-633)
|0 G:(DE-HGF)POF4-633
|c POF4-633
|f POF IV
|x 0
536 _ _ |a HIDSS-0002 - DASHH: Data Science in Hamburg - Helmholtz Graduate School for the Structure of Matter (2019_IVF-HIDSS-0002)
|0 G:(DE-HGF)2019_IVF-HIDSS-0002
|c 2019_IVF-HIDSS-0002
|x 1
536 _ _ |a PHGS, VH-GS-500 - PIER Helmholtz Graduate School (2015_IFV-VH-GS-500)
|0 G:(DE-HGF)2015_IFV-VH-GS-500
|c 2015_IFV-VH-GS-500
|x 2
588 _ _ |a Dataset connected to DataCite
650 _ 7 |a Machine learning
|2 Other
650 _ 7 |a Infrared (IR)
|2 Other
650 _ 7 |a X-ray absorption spectroscopy (XAS)
|2 Other
650 _ 7 |a Graph neural networks (GNN)
|2 Other
650 _ 7 |a Explainable AI
|2 Other
650 _ 7 |a Natural Sciences and Mathematics::540: Chemistry
|2 Other
650 _ 7 |a Natural Sciences and Mathematics::570: Life Sciences, Biology
|2 Other
650 _ 7 |a Natural Sciences and Mathematics::510: Mathematics
|2 Other
693 _ _ |0 EXP:(DE-MLZ)NOSPEC-20140101
|5 EXP:(DE-MLZ)NOSPEC-20140101
|e No specific instrument
|x 0
693 _ _ |0 EXP:(DE-MLZ)External-20140101
|5 EXP:(DE-MLZ)External-20140101
|e Measurement at external facility
|x 1
700 1 _ |a Meissner, Robert
|0 P:(DE-H253)PIP1093118
|b 1
|e Thesis advisor
700 1 _ |a Bari, Sadia
|0 P:(DE-H253)PIP1014119
|b 2
|e Thesis advisor
700 1 _ |a Huber, Patrick
|0 P:(DE-HGF)0
|b 3
|e Thesis advisor
773 _ _ |a 10.15480/882.9689
856 4 _ |u https://hdl.handle.net/11420/47867
856 4 _ |u https://bib-pubdb1.desy.de/record/619753/files/zsWP40Kv.pdf.part
|y OpenAccess
909 C O |o oai:bib-pubdb1.desy.de:619753
|p openaire
|p open_access
|p VDB
|p driver
|p dnbdelivery
910 1 _ |a Deutsches Elektronen-Synchrotron
|0 I:(DE-588b)2008985-5
|k DESY
|b 0
|6 P:(DE-H253)PIP1092133
910 1 _ |a External Institute
|0 I:(DE-HGF)0
|k Extern
|b 0
|6 P:(DE-H253)PIP1092133
910 1 _ |a External Institute
|0 I:(DE-HGF)0
|k Extern
|b 1
|6 P:(DE-H253)PIP1093118
910 1 _ |a Deutsches Elektronen-Synchrotron
|0 I:(DE-588b)2008985-5
|k DESY
|b 2
|6 P:(DE-H253)PIP1014119
913 1 _ |a DE-HGF
|b Forschungsbereich Materie
|l Von Materie zu Materialien und Leben
|1 G:(DE-HGF)POF4-630
|0 G:(DE-HGF)POF4-633
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-600
|4 G:(DE-HGF)POF
|v Life Sciences – Building Blocks of Life: Structure and Function
|x 0
914 1 _ |y 2024
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
915 _ _ |a Creative Commons Attribution CC BY 4.0
|0 LIC:(DE-HGF)CCBY4
|2 HGFVOC
920 _ _ |l yes
920 1 _ |0 I:(DE-H253)FS-BIG-20220318
|k FS-BIG
|l Biomoleküle in Gasphase
|x 0
980 _ _ |a phd
980 _ _ |a VDB
980 _ _ |a I:(DE-H253)FS-BIG-20220318
980 _ _ |a UNRESTRICTED
980 1 _ |a FullTexts

