001     639283
005     20251019055719.0
024 7 _ |a 10.1186/s12859-025-06073-9
|2 doi
024 7 _ |a 1471-2105
|2 ISSN
024 7 _ |a 10.3204/PUBDB-2025-04384
|2 datacite_doi
024 7 _ |a altmetric:174083920
|2 altmetric
024 7 _ |a pmid:39934730
|2 pmid
037 _ _ |a PUBDB-2025-04384
041 _ _ |a English
082 _ _ |a 610
100 1 _ |a Schlumbohm, Simon
|0 P:(DE-HGF)0
|b 0
|e Corresponding author
245 _ _ |a HarmonizR: blocking and singular feature data adjustment improve runtime efficiency and data preservation
260 _ _ |a London
|c 2025
|b BioMed Central
336 7 _ |a article
|2 DRIVER
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1760617968_2110634
|2 PUB:(DE-HGF)
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a Journal Article
|0 0
|2 EndNote
520 _ _ |a Data adjustment is an essential tool for increasing statistical power during analysis, for example in the case of complex multi-experiment data from (single-cell) RNA, proteomics and other omics data. Despite its benefits, data integration introduces internal biases, so-called batch effects. Because such methods inherently produce missing values, and data integration introduces additional ones, renowned algorithms such as ComBat and limma are unable to perform batch effect adjustment. Recently, the HarmonizR framework, a tool for missing-value-tolerant data adjustment, was presented for these cases. In this contribution, we provide significant improvements to the HarmonizR approach. A novel blocking strategy is introduced to substantially reduce runtime while still supporting parallel architectures. Additionally, a “unique removal” strategy has been integrated into HarmonizR to retain even more features for adjustment. In this work, we show (1) substantially improved runtime for both small and large real datasets and (2) the ability to retain more features from the integrated dataset during adjustment, showing a feature rescue of up to 103.9% for our tested datasets. The proposed improvements tackle the previous shortcomings of the published HarmonizR version. Since HarmonizR was mainly developed for dataset integration on rare tumor entities, it did not include runtime improvements beyond parallelization, a limitation addressed in this update. An additional welcome update regarding improved feature rescue further enhances the algorithm's ability to quickly and robustly perform batch effect reduction.
536 _ _ |a 623 - Data Management and Analysis (POF4-623)
|0 G:(DE-HGF)POF4-623
|c POF4-623
|f POF IV
|x 0
588 _ _ |a Dataset connected to CrossRef, Journals: bib-pubdb1.desy.de
693 _ _ |0 EXP:(DE-MLZ)NOSPEC-20140101
|5 EXP:(DE-MLZ)NOSPEC-20140101
|e No specific instrument
|x 0
700 1 _ |a Neumann, Julia E.
|b 1
700 1 _ |a Neumann, Philipp
|0 P:(DE-H253)PIP1106404
|b 2
|u desy
773 _ _ |a 10.1186/s12859-025-06073-9
|g Vol. 26, no. 1, p. 47
|0 PERI:(DE-600)2041484-5
|n 1
|p 47
|t BMC bioinformatics
|v 26
|y 2025
|x 1471-2105
856 4 _ |y OpenAccess
|u https://bib-pubdb1.desy.de/record/639283/files/s12859-025-06073-9.pdf
856 4 _ |y OpenAccess
|x pdfa
|u https://bib-pubdb1.desy.de/record/639283/files/s12859-025-06073-9.pdf?subformat=pdfa
909 C O |o oai:bib-pubdb1.desy.de:639283
|p openaire
|p open_access
|p VDB
|p driver
|p dnbdelivery
910 1 _ |a Deutsches Elektronen-Synchrotron
|0 I:(DE-588b)2008985-5
|k DESY
|b 2
|6 P:(DE-H253)PIP1106404
910 1 _ |a External Institute
|0 I:(DE-HGF)0
|k Extern
|b 2
|6 P:(DE-H253)PIP1106404
913 1 _ |a DE-HGF
|b Forschungsbereich Materie
|l Materie und Technologie
|1 G:(DE-HGF)POF4-620
|0 G:(DE-HGF)POF4-623
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-600
|4 G:(DE-HGF)POF
|v Data Management and Analysis
|x 0
914 1 _ |y 2025
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0200
|2 StatID
|b SCOPUS
|d 2025-01-01
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0160
|2 StatID
|b Essential Science Indicators
|d 2025-01-01
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1050
|2 StatID
|b BIOSIS Previews
|d 2025-01-01
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1190
|2 StatID
|b Biological Abstracts
|d 2025-01-01
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0600
|2 StatID
|b Ebsco Academic Search
|d 2025-01-01
915 _ _ |a JCR
|0 StatID:(DE-HGF)0100
|2 StatID
|b BMC BIOINFORMATICS : 2022
|d 2025-01-01
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0501
|2 StatID
|b DOAJ Seal
|d 2024-04-10T15:34:04Z
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0500
|2 StatID
|b DOAJ
|d 2024-04-10T15:34:04Z
915 _ _ |a WoS
|0 StatID:(DE-HGF)0113
|2 StatID
|b Science Citation Index Expanded
|d 2025-01-01
915 _ _ |a Fees
|0 StatID:(DE-HGF)0700
|2 StatID
|d 2025-01-01
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0150
|2 StatID
|b Web of Science Core Collection
|d 2025-01-01
915 _ _ |a IF < 5
|0 StatID:(DE-HGF)9900
|2 StatID
|d 2025-01-01
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
915 _ _ |a Peer Review
|0 StatID:(DE-HGF)0030
|2 StatID
|b ASC
|d 2025-01-01
915 _ _ |a Article Processing Charges
|0 StatID:(DE-HGF)0561
|2 StatID
|d 2025-01-01
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0300
|2 StatID
|b Medline
|d 2025-01-01
915 _ _ |a Creative Commons Attribution CC BY 4.0
|0 LIC:(DE-HGF)CCBY4
|2 HGFVOC
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0199
|2 StatID
|b Clarivate Analytics Master Journal List
|d 2025-01-01
920 1 _ |0 I:(DE-H253)IT-20120731
|k IT
|l Informationstechnologie
|x 0
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-H253)IT-20120731
980 1 _ |a FullTexts

