High Performance Data Integration for Large-Scale Analyses of Incomplete Omic Profiles Using Batch-Effect Reduction Trees (BERT)

Neumann, Philipp; Neumann, Julia; Schumann, Yannis; Schlumbohm, Simon
doi:10.1038/s41467-025-62237-4
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@ARTICLE{Neumann:623116,
      author       = {Neumann, Philipp and Schumann, Yannis and Schlumbohm, Simon
                      and Neumann, Julia},
      title        = {{H}igh {P}erformance {D}ata {I}ntegration for
                      {L}arge-{S}cale {A}nalyses of {I}ncomplete {O}mic {P}rofiles
                      {U}sing {B}atch-{E}ffect {R}eduction {T}rees ({BERT})},
      journal      = {Nature Communications},
      volume       = {16},
      number       = {1},
      issn         = {2041-1723},
      address      = {[London]},
      publisher    = {Springer Nature},
      reportid     = {PUBDB-2025-00603},
      pages        = {7104},
      year         = {2025},
      abstract     = {Data from high-throughput technologies assessing global
                      patterns of biomolecules (omic data) is often afflicted
                      with missing values and with measurement-specific biases
                      (batch-effects) that hinder the quantitative comparison of
                      independently acquired datasets. This work introduces
                      batch-effect reduction trees (BERT), a high-performance
                      method for data integration of incomplete omic profiles. We
                      characterize BERT on large-scale data integration tasks with
                      up to 5000 datasets from simulated and experimental data of
                      different quantification techniques and omic types
                      (proteomics, transcriptomics, metabolomics) as well as other
                      data types, e.g., clinical data, emphasizing the broad scope
                      of the algorithm. Compared to the only available method for
                      integration of incomplete omic data, HarmonizR, our method 1)
                      retains up to five orders of magnitude more numeric
                      values, 2) leverages multi-core and distributed-memory
                      systems for up to 11x runtime improvement, and 3) considers
                      covariates and reference measurements to account for
                      severely imbalanced or sparsely distributed conditions (up
                      to 2x improvement of average-silhouette-width).},
      cin          = {IT},
      ddc          = {500},
      cid          = {I:(DE-H253)IT-20120731},
      pnm          = {623 - Data Management and Analysis (POF4-623)},
      pid          = {G:(DE-HGF)POF4-623},
      experiment   = {EXP:(DE-MLZ)NOSPEC-20140101},
      typ          = {PUB:(DE-HGF)16},
      doi          = {10.1038/s41467-025-62237-4},
      url          = {https://bib-pubdb1.desy.de/record/623116},
}