Journal Article PUBDB-2025-00661

Computational Methods for Data Integration and Imputation of Missing Values in Omics Datasets


2025
Wiley VCH Weinheim

Proteomics 25(1-2), e202400100 (2025) [10.1002/pmic.202400100]

Please use a persistent id in citations: doi:10.1002/pmic.202400100

Abstract: Molecular profiling of different omic-modalities (e.g., DNA methylomics, transcriptomics, proteomics) in biological systems represents the basis for research and clinical decision-making. Measurement-specific biases, so-called batch effects, often hinder the integration of independently acquired datasets, and missing values further hamper the applicability of typical data processing algorithms. In addition to careful experimental design, well-defined standards in data acquisition and data exchange, the alleviation of these phenomena particularly requires a dedicated data integration and preprocessing pipeline. This review aims to give a comprehensive overview of computational methods for data integration and missing value imputation for omic data analyses. We provide formal definitions for missing value mechanisms and propose a novel statistical taxonomy for batch effects, especially in the presence of missing data. Based on an automated document search and systematic literature review, we describe 32 distinct data integration methods from five main methodological categories, as well as 37 algorithms for missing value imputation from five separate categories. Additionally, this review highlights multiple quantitative evaluation methods to aid researchers in selecting a suitable set of methods for their work. Finally, this work provides an integrated discussion of the relevance of batch effects and missing values in omics with corresponding method recommendations. We then propose a comprehensive three-step workflow from the study conception to final data analysis and deduce perspectives for future research. Eventually, we present a comprehensive flow chart as well as exemplary decision trees to aid practitioners in the selection of specific approaches for imputation and data integration in their studies.


Note: J.E.N is funded by the DFG (Emmy Noether program).

Contributing Institute(s):
  1. Informationstechnologie (IT)
Research Program(s):
  1. 623 - Data Management and Analysis (POF4-623)
Experiment(s):
  1. No specific instrument

Appears in the scientific report 2025
Database coverage:
Medline ; Creative Commons Attribution-NonCommercial CC BY-NC 4.0 ; OpenAccess ; BIOSIS Previews ; Biological Abstracts ; Clarivate Analytics Master Journal List ; Current Contents - Life Sciences ; DEAL Wiley ; Essential Science Indicators ; IF < 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection


 Record created 2025-02-06, last modified 2025-08-10

