001     623229
005     20250810054026.0
024 7 _ |a 10.1002/pmic.202400100
|2 doi
024 7 _ |a 1615-9853
|2 ISSN
024 7 _ |a 1615-9861
|2 ISSN
024 7 _ |a 10.3204/PUBDB-2025-00661
|2 datacite_doi
024 7 _ |a altmetric:172607406
|2 altmetric
024 7 _ |a pmid:39740174
|2 pmid
037 _ _ |a PUBDB-2025-00661
041 _ _ |a English
082 _ _ |a 540
100 1 _ |a Schumann, Yannis
|0 P:(DE-H253)PIP1086181
|b 0
|e Corresponding author
|u desy
245 _ _ |a Computational Methods for Data Integration and Imputation of Missing Values in Omics Datasets
260 _ _ |a Weinheim
|c 2025
|b Wiley VCH
336 7 _ |a article
|2 DRIVER
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1754640873_2558065
|2 PUB:(DE-HGF)
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a Journal Article
|0 0
|2 EndNote
500 _ _ |a J.E.N. is funded by the DFG (Emmy Noether program).
520 _ _ |a Molecular profiling of different omic-modalities (e.g., DNA methylomics, transcriptomics, proteomics) in biological systems represents the basis for research and clinical decision-making. Measurement-specific biases, so-called batch effects, often hinder the integration of independently acquired datasets, and missing values further hamper the applicability of typical data processing algorithms. In addition to careful experimental design, well-defined standards in data acquisition and data exchange, the alleviation of these phenomena particularly requires a dedicated data integration and preprocessing pipeline. This review aims to give a comprehensive overview of computational methods for data integration and missing value imputation for omic data analyses. We provide formal definitions for missing value mechanisms and propose a novel statistical taxonomy for batch effects, especially in the presence of missing data. Based on an automated document search and systematic literature review, we describe 32 distinct data integration methods from five main methodological categories, as well as 37 algorithms for missing value imputation from five separate categories. Additionally, this review highlights multiple quantitative evaluation methods to aid researchers in selecting a suitable set of methods for their work. Finally, this work provides an integrated discussion of the relevance of batch effects and missing values in omics with corresponding method recommendations. We then propose a comprehensive three-step workflow from the study conception to final data analysis and deduce perspectives for future research. Eventually, we present a comprehensive flow chart as well as exemplary decision trees to aid practitioners in the selection of specific approaches for imputation and data integration in their studies.
536 _ _ |a 623 - Data Management and Analysis (POF4-623)
|0 G:(DE-HGF)POF4-623
|c POF4-623
|f POF IV
|x 0
588 _ _ |a Dataset connected to CrossRef, Journals: bib-pubdb1.desy.de
693 _ _ |0 EXP:(DE-MLZ)NOSPEC-20140101
|5 EXP:(DE-MLZ)NOSPEC-20140101
|e No specific instrument
|x 0
700 1 _ |a Gocke, Antonia
|b 1
700 1 _ |a Neumann, Julia E.
|0 P:(DE-HGF)0
|b 2
|e Corresponding author
773 _ _ |a 10.1002/pmic.202400100
|g Vol. 25, no. 1-2, p. e202400100
|0 PERI:(DE-600)2037674-1
|n 1-2
|p e202400100
|t Proteomics
|v 25
|y 2025
|x 1615-9853
856 4 _ |y OpenAccess
|u https://bib-pubdb1.desy.de/record/623229/files/Proteomics%20-%202025%20-%20Schumann%20-%20Computational%20Methods%20for%20Data%20Integration%20and%20Imputation%20of%20Missing%20Values%20in%20Omics.pdf
856 4 _ |y OpenAccess
|x pdfa
|u https://bib-pubdb1.desy.de/record/623229/files/Proteomics%20-%202025%20-%20Schumann%20-%20Computational%20Methods%20for%20Data%20Integration%20and%20Imputation%20of%20Missing%20Values%20in%20Omics.pdf?subformat=pdfa
909 C O |o oai:bib-pubdb1.desy.de:623229
|p openaire
|p open_access
|p VDB
|p driver
|p dnbdelivery
910 1 _ |a Deutsches Elektronen-Synchrotron
|0 I:(DE-588b)2008985-5
|k DESY
|b 0
|6 P:(DE-H253)PIP1086181
913 1 _ |a DE-HGF
|b Forschungsbereich Materie
|l Materie und Technologie
|1 G:(DE-HGF)POF4-620
|0 G:(DE-HGF)POF4-623
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-600
|4 G:(DE-HGF)POF
|v Data Management and Analysis
|x 0
914 1 _ |y 2025
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0200
|2 StatID
|b SCOPUS
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0300
|2 StatID
|b Medline
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1050
|2 StatID
|b BIOSIS Previews
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1190
|2 StatID
|b Biological Abstracts
|d 2025-01-02
915 _ _ |a JCR
|0 StatID:(DE-HGF)0100
|2 StatID
|b PROTEOMICS : 2022
|d 2025-01-02
915 _ _ |a Creative Commons Attribution-NonCommercial CC BY-NC 4.0
|0 LIC:(DE-HGF)CCBYNC4
|2 HGFVOC
915 _ _ |a DEAL Wiley
|0 StatID:(DE-HGF)3001
|2 StatID
|d 2025-01-02
|w ger
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1030
|2 StatID
|b Current Contents - Life Sciences
|d 2025-01-02
915 _ _ |a WoS
|0 StatID:(DE-HGF)0113
|2 StatID
|b Science Citation Index Expanded
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0150
|2 StatID
|b Web of Science Core Collection
|d 2025-01-02
915 _ _ |a IF < 5
|0 StatID:(DE-HGF)9900
|2 StatID
|d 2025-01-02
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0160
|2 StatID
|b Essential Science Indicators
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0199
|2 StatID
|b Clarivate Analytics Master Journal List
|d 2025-01-02
920 1 _ |0 I:(DE-H253)IT-20120731
|k IT
|l Informationstechnologie
|x 0
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-H253)IT-20120731
980 1 _ |a FullTexts

