000639283 001__ 639283
000639283 005__ 20251019055719.0
000639283 0247_ $$2doi$$a10.1186/s12859-025-06073-9
000639283 0247_ $$2ISSN$$a1471-2105
000639283 0247_ $$2datacite_doi$$a10.3204/PUBDB-2025-04384
000639283 0247_ $$2altmetric$$aaltmetric:174083920
000639283 0247_ $$2pmid$$apmid:39934730
000639283 037__ $$aPUBDB-2025-04384
000639283 041__ $$aEnglish
000639283 082__ $$a610
000639283 1001_ $$0P:(DE-HGF)0$$aSchlumbohm, Simon$$b0$$eCorresponding author
000639283 245__ $$aHarmonizR: blocking and singular feature data adjustment improve runtime efficiency and data preservation
000639283 260__ $$aLondon$$bBioMed Central$$c2025
000639283 3367_ $$2DRIVER$$aarticle
000639283 3367_ $$2DataCite$$aOutput Types/Journal article
000639283 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1760617968_2110634
000639283 3367_ $$2BibTeX$$aARTICLE
000639283 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000639283 3367_ $$00$$2EndNote$$aJournal Article
000639283 520__ $$aData adjustment is an essential tool for increasing statistical power during analysis, for example in case of complex multi-experiment data from (single-cell) RNA, proteomics and other omics data. Despite its benefits, data integration introduces internal biases, so-called batch effects. Due to the inherent presence of missing values by such methods and their additional introduction by means of data integration, renowned algorithms such as ComBat and limma are unable to perform batch effect adjustment. Recently, the HarmonizR framework was presented for these cases, which is a tool for missing value tolerant data adjustment. In this contribution, we provide significant improvements to the HarmonizR approach. A novel blocking strategy is introduced to severely reduce runtime, while still supporting parallel architectures. Additionally, a “unique removal” strategy has been integrated into HarmonizR to maintain even more features for adjustment in datasets, showing a feature rescue of up to 103.9% for our tested datasets. In this work, we show (1) severely improved runtime for both small and large, real datasets and (2) the ability to retain more features from the integrated dataset during adjustment, showing a feature rescue of up to 103.9% for our tested datasets. The proposed improvements tackle the previous shortcomings of the published HarmonizR version. Since HarmonizR was mainly developed for dataset integration on rare tumor entities, it did not include runtime improvements beyond parallelization, which has been addressed in this update. An additionally welcome update regarding improved feature rescue furthermore enhances the algorithm's ability to quickly and robustly perform batch effect reduction.
000639283 536__ $$0G:(DE-HGF)POF4-623$$a623 - Data Management and Analysis (POF4-623)$$cPOF4-623$$fPOF IV$$x0
000639283 588__ $$aDataset connected to CrossRef, Journals: bib-pubdb1.desy.de
000639283 693__ $$0EXP:(DE-MLZ)NOSPEC-20140101$$5EXP:(DE-MLZ)NOSPEC-20140101$$eNo specific instrument$$x0
000639283 7001_ $$aNeumann, Julia E.$$b1
000639283 7001_ $$0P:(DE-H253)PIP1106404$$aNeumann, Philipp$$b2$$udesy
000639283 773__ $$0PERI:(DE-600)2041484-5$$a10.1186/s12859-025-06073-9$$gVol. 26, no. 1, p. 47$$n1$$p47$$tBMC bioinformatics$$v26$$x1471-2105$$y2025
000639283 8564_ $$uhttps://bib-pubdb1.desy.de/record/639283/files/s12859-025-06073-9.pdf$$yOpenAccess
000639283 8564_ $$uhttps://bib-pubdb1.desy.de/record/639283/files/s12859-025-06073-9.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000639283 909CO $$ooai:bib-pubdb1.desy.de:639283$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000639283 9101_ $$0I:(DE-588b)2008985-5$$6P:(DE-H253)PIP1106404$$aDeutsches Elektronen-Synchrotron$$b2$$kDESY
000639283 9101_ $$0I:(DE-HGF)0$$6P:(DE-H253)PIP1106404$$aExternal Institute$$b2$$kExtern
000639283 9131_ $$0G:(DE-HGF)POF4-623$$1G:(DE-HGF)POF4-620$$2G:(DE-HGF)POF4-600$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bForschungsbereich Materie$$lMaterie und Technologie$$vData Management and Analysis$$x0
000639283 9141_ $$y2025
000639283 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)1190$$2StatID$$aDBCoverage$$bBiological Abstracts$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bBMC BIOINFORMATICS : 2022$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal$$d2024-04-10T15:34:04Z
000639283 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ$$d2024-04-10T15:34:04Z
000639283 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)9900$$2StatID$$aIF < 5$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000639283 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2025-01-01
000639283 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2025-01-01
000639283 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
000639283 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2025-01-01
000639283 9201_ $$0I:(DE-H253)IT-20120731$$kIT$$lInformationstechnologie$$x0
000639283 980__ $$ajournal
000639283 980__ $$aVDB
000639283 980__ $$aUNRESTRICTED
000639283 980__ $$aI:(DE-H253)IT-20120731
000639283 9801_ $$aFullTexts