000619753 001__ 619753
000619753 005__ 20250407222511.0
000619753 0247_ $$2doi$$a10.15480/882.9689
000619753 0247_ $$2datacite_doi$$a10.3204/PUBDB-2024-07888
000619753 037__ $$aPUBDB-2024-07888
000619753 041__ $$aEnglish
000619753 1001_ $$0P:(DE-H253)PIP1092133$$aKotobi, Amir$$b0$$eCorresponding author$$gmale
000619753 245__ $$aDynamic structure investigation and spectra prediction of biomolecules using machine learning techniques$$f2020-02-01 - 2024-06-06
000619753 260__ $$bTUHH Universitätsbibliothek$$c2024
000619753 300__ $$a141
000619753 3367_ $$2DataCite$$aOutput Types/Dissertation
000619753 3367_ $$2ORCID$$aDISSERTATION
000619753 3367_ $$2BibTeX$$aPHDTHESIS
000619753 3367_ $$02$$2EndNote$$aThesis
000619753 3367_ $$0PUB:(DE-HGF)11$$2PUB:(DE-HGF)$$aDissertation / PhD Thesis$$bphd$$mphd$$s1744012115_1668329
000619753 3367_ $$2DRIVER$$adoctoralThesis
000619753 502__ $$aDissertation, Technische Universität Hamburg, 2024$$bDissertation$$cTechnische Universität Hamburg$$d2024
000619753 520__ $$aThe investigation of biomolecular structures and the prediction of their spectra through experimental and theoretical gas-phase studies are fundamental steps toward comprehending their intrinsic properties and biological functions. Nonetheless, the complexity of the potential energy surface of biomolecules, combined with limited computational resources, constrains the interpretation of experimental observations. Integrating supervised and unsupervised machine learning (ML) techniques into theoretical calculations is considered an effective way to address these challenges. Infrared (IR) and X-ray absorption spectroscopy (XAS) have proven to be powerful experimental techniques for studying the electronic and spatial structure of biomolecules such as peptides and proteins. Reproducing and validating the features observed in the resulting spectra often requires sophisticated ab initio calculations and a comprehensive understanding of the biomolecules’ configurational space. In this thesis, I introduce a novel approach to interpreting the experimental IR spectrum of a peptide, which aims to enhance the exploration of configurational space by combining replica exchange molecular dynamics (REMD) simulations, unsupervised machine learning, and ab initio calculations. This scheme relies on a set of structural descriptors and a data-driven clustering technique that accounts for the canonical ensemble under the real experimental conditions to obtain an accurate computed spectrum. We show that by partitioning the configurational space into subensembles of similar conformations, i.e. clusters, an accurate IR spectrum can be calculated by averaging the IR contributions of the representative conformers of the clusters, each weighted by the population of its cluster. 
While this approach unravels important fingerprints of experimental spectroscopic data, the calculation of IR and particularly XAS spectra is, owing to its inherently expensive theoretical computation, often computationally prohibitive even for medium-sized molecules. To remedy the computational obstacles associated with spectra prediction, we develop a data-driven supervised ML framework, i.e. graph neural networks (GNNs), trained on a custom-generated XAS dataset to find a mapping between structures and spectroscopic signals, thus bypassing the need for expensive ab initio quantum chemistry calculations. To ensure the interpretability of the GNN models’ predictions, we employ feature attribution to determine the respective contributions of the various atoms in the molecules to the peaks observed in the XAS spectrum. Within this approach, we show that it is possible to link the peaks observed in the spectra to certain core and virtual orbitals from the quantum chemical calculations and to obtain an in-depth understanding of the ML-predicted XAS spectrum. The results presented in this thesis show that integrating supervised and unsupervised ML techniques can effectively enhance the interpretation of spectroscopic data and make efficient use of expensive ab initio calculations.
Infrared and X-ray absorption spectroscopy have proven to be powerful experimental tools for elucidating the electronic and structural details of biomolecules, in particular peptides and proteins. In parallel, remarkable advances in computing capacity have accelerated the ability to combine chemistry, physics, and machine learning in a true symbiosis, enabling the precise modeling and understanding of complex biomolecular processes at the atomic level and the validation of experimentally observed spectral features. 
However, the inherent complexity of peptides and proteins, coupled with the computational demands of quantum mechanical methods for large systems, poses a major challenge when it comes to understanding the inherent properties of these biomolecules. To meet these challenges, incorporating supervised and unsupervised machine learning techniques into the molecular dynamics simulation toolbox makes it easier to decipher the complex interplay of interatomic and intermolecular interactions and paves the way for predicting various properties of these systems. This dissertation deals with features and unsupervised machine learning techniques (e.g. clustering and dimensionality reduction) applied to atomistic datasets in order to investigate how these techniques can illuminate the complex structural landscape of a model peptide. In addition, this work presents graph neural networks as a powerful and efficient approach to deciphering the complicated.
000619753 536__ $$0G:(DE-HGF)POF4-633$$a633 - Life Sciences – Building Blocks of Life: Structure and Function (POF4-633)$$cPOF4-633$$fPOF IV$$x0
000619753 536__ $$0G:(DE-HGF)2019_IVF-HIDSS-0002$$aHIDSS-0002 - DASHH: Data Science in Hamburg - Helmholtz Graduate School for the Structure of Matter (2019_IVF-HIDSS-0002)$$c2019_IVF-HIDSS-0002$$x1
000619753 536__ $$0G:(DE-HGF)2015_IFV-VH-GS-500$$aPHGS, VH-GS-500 - PIER Helmholtz Graduate School (2015_IFV-VH-GS-500)$$c2015_IFV-VH-GS-500$$x2
000619753 588__ $$aDataset connected to DataCite
000619753 650_7 $$2Other$$aMachine learning
000619753 650_7 $$2Other$$aInfrared (IR)
000619753 650_7 $$2Other$$aX-ray absorption spectroscopy (XAS)
000619753 650_7 $$2Other$$aGraph neural networks (GNN)
000619753 650_7 $$2Other$$aExplainable AI
000619753 650_7 $$2Other$$aNatural Sciences and Mathematics::540: Chemistry
000619753 650_7 $$2Other$$aNatural Sciences and Mathematics::570: Life Sciences, Biology
000619753 650_7 $$2Other$$aNatural Sciences and Mathematics::510: Mathematics
000619753 693__ $$0EXP:(DE-MLZ)NOSPEC-20140101$$5EXP:(DE-MLZ)NOSPEC-20140101$$eNo specific instrument$$x0
000619753 693__ $$0EXP:(DE-MLZ)External-20140101$$5EXP:(DE-MLZ)External-20140101$$eMeasurement at external facility$$x1
000619753 7001_ $$0P:(DE-H253)PIP1093118$$aMeissner, Robert$$b1$$eThesis advisor
000619753 7001_ $$0P:(DE-H253)PIP1014119$$aBari, Sadia$$b2$$eThesis advisor
000619753 7001_ $$0P:(DE-HGF)0$$aHuber, Patrick$$b3$$eThesis advisor
000619753 773__ $$a10.15480/882.9689
000619753 8564_ $$uhttps://hdl.handle.net/11420/47867
000619753 8564_ $$uhttps://bib-pubdb1.desy.de/record/619753/files/zsWP40Kv.pdf.part$$yOpenAccess
000619753 909CO $$ooai:bib-pubdb1.desy.de:619753$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000619753 9101_ $$0I:(DE-588b)2008985-5$$6P:(DE-H253)PIP1092133$$aDeutsches Elektronen-Synchrotron$$b0$$kDESY
000619753 9101_ $$0I:(DE-HGF)0$$6P:(DE-H253)PIP1092133$$aExternal Institute$$b0$$kExtern
000619753 9101_ $$0I:(DE-HGF)0$$6P:(DE-H253)PIP1093118$$aExternal Institute$$b1$$kExtern
000619753 9101_ $$0I:(DE-588b)2008985-5$$6P:(DE-H253)PIP1014119$$aDeutsches Elektronen-Synchrotron$$b2$$kDESY
000619753 9131_ $$0G:(DE-HGF)POF4-633$$1G:(DE-HGF)POF4-630$$2G:(DE-HGF)POF4-600$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bForschungsbereich Materie$$lVon Materie zu Materialien und Leben$$vLife Sciences – Building Blocks of Life: Structure and Function$$x0
000619753 9141_ $$y2024
000619753 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000619753 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
000619753 920__ $$lyes
000619753 9201_ $$0I:(DE-H253)FS-BIG-20220318$$kFS-BIG$$lBiomoleküle in Gasphase$$x0
000619753 980__ $$aphd
000619753 980__ $$aVDB
000619753 980__ $$aI:(DE-H253)FS-BIG-20220318
000619753 980__ $$aUNRESTRICTED
000619753 9801_ $$aFullTexts