000626051 001__ 626051
000626051 005__ 20250722113403.0
000626051 0247_ $$2doi$$a10.1098/rsta.2024.0233
000626051 0247_ $$2ISSN$$a1364-503X
000626051 0247_ $$2ISSN$$a0080-4614
000626051 0247_ $$2ISSN$$a0264-3820
000626051 0247_ $$2ISSN$$a0264-3952
000626051 0247_ $$2ISSN$$a1471-2962
000626051 0247_ $$2ISSN$$a2053-9231
000626051 0247_ $$2ISSN$$a2053-9258
000626051 0247_ $$2ISSN$$a2054-0272
000626051 0247_ $$2datacite_doi$$a10.3204/PUBDB-2025-01273
000626051 0247_ $$2openalex$$aopenalex:W4411077997
000626051 037__ $$aPUBDB-2025-01273
000626051 041__ $$aEnglish
000626051 082__ $$a510
000626051 1001_ $$0P:(DE-H253)PIP1103953$$aBurger, Martin$$b0$$eCorresponding author
000626051 245__ $$aAnalysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization
000626051 260__ $$aLondon$$bRoyal Soc.$$c2025
000626051 3367_ $$2DRIVER$$aarticle
000626051 3367_ $$2DataCite$$aOutput Types/Journal article
000626051 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1752151717_176320
000626051 3367_ $$2BibTeX$$aARTICLE
000626051 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000626051 3367_ $$00$$2EndNote$$aJournal Article
000626051 500__ $$aISSN 1471-2962 not unique: **2 hits**.
000626051 520__ $$aThe aim of this paper is to provide a mathematical analysis of transformer architectures using a self-attention mechanism with layer normalization. In particular, observed patterns in such architectures resembling either clusters or uniform distributions pose a number of challenging mathematical questions. We focus on a special case that admits a gradient flow formulation in the spaces of probability measures on the unit sphere under a special metric, which allows us to give at least partial answers in a rigorous way. The arising mathematical problems resemble those recently studied in aggregation equations, but with additional challenges emerging from restricting the dynamics to the sphere and the particular form of the interaction energy. We provide a rigorous framework for studying the gradient flow, which also suggests a possible metric geometry to study the general case (i.e. one that is not described by a gradient flow). We further analyze the stationary points of the induced self-attention dynamics. The latter are related to stationary points of the interaction energy in the Wasserstein geometry, and we further discuss energy minimizers and maximizers in different parameter settings.
000626051 536__ $$0G:(DE-HGF)POF4-623$$a623 - Data Management and Analysis (POF4-623)$$cPOF4-623$$fPOF IV$$x0
000626051 536__ $$0G:(GEPRIS)464101359$$aDFG project G:(GEPRIS)464101359 - Deep-Learning basierte Regularisierung inverser Probleme (464101359)$$c464101359$$x1
000626051 536__ $$0G:(GEPRIS)464101190$$aDFG project G:(GEPRIS)464101190 - Theoretischer Grundlagen des Unsicherheits-robusten Deep Learning für Inverse Probleme (464101190)$$c464101190$$x2
000626051 588__ $$aDataset connected to CrossRef, Journals: bib-pubdb1.desy.de
000626051 693__ $$0EXP:(DE-MLZ)NOSPEC-20140101$$5EXP:(DE-MLZ)NOSPEC-20140101$$eNo specific instrument$$x0
000626051 7001_ $$0P:(DE-H253)PIP1106483$$aKabri, Samira$$b1
000626051 7001_ $$0P:(DE-HGF)0$$aKorolev, Yury$$b2
000626051 7001_ $$0P:(DE-H253)PIP1106486$$aRoith, Tim$$b3
000626051 7001_ $$0P:(DE-H253)PIP1106485$$aWeigand, Lukas$$b4
000626051 770__ $$aPartial differential equations in data science
000626051 773__ $$0PERI:(DE-600)2012985-3$$a10.1098/rsta.2024.0233$$gVol. 383, no. 2298, p. 20240233$$n2298$$p20240233$$tPhilosophical transactions of the Royal Society of London / Series A$$v383$$x1364-503X$$y2025
000626051 8564_ $$uhttps://royalsocietypublishing.org/doi/10.1098/rsta.2024.0233
000626051 8564_ $$uhttps://bib-pubdb1.desy.de/record/626051/files/HTML-Approval_of_scientific_publication.html
000626051 8564_ $$uhttps://bib-pubdb1.desy.de/record/626051/files/Institution%20Portal.pdf
000626051 8564_ $$uhttps://bib-pubdb1.desy.de/record/626051/files/PDF-Approval_of_scientific_publication.pdf
000626051 8564_ $$uhttps://bib-pubdb1.desy.de/record/626051/files/Institution%20Portal.pdf?subformat=pdfa$$xpdfa
000626051 8564_ $$uhttps://bib-pubdb1.desy.de/record/626051/files/Manuscript.pdf$$yRestricted
000626051 8564_ $$uhttps://bib-pubdb1.desy.de/record/626051/files/Publisher%27s%20PDF.pdf$$yOpenAccess
000626051 8564_ $$uhttps://bib-pubdb1.desy.de/record/626051/files/Manuscript.pdf?subformat=pdfa$$xpdfa$$yRestricted
000626051 8564_ $$uhttps://bib-pubdb1.desy.de/record/626051/files/Publisher%27s%20PDF.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000626051 8767_ $$92025-04-16$$d2025-04-16$$eHybrid-OA$$jPublish and Read$$lRoyal Society, London$$zQuarterly report 1.7.25
000626051 8767_ $$92025-04-16$$d2025-04-16$$eHybrid-OA$$jStorniert$$lRoyal Society, London$$zDFG OAPK (Projekt) verrechnet durch V3
000626051 8767_ $$92025-04-16$$d2025-04-16$$eHybrid-OA$$jZahlung erfolgt$$lRoyal Society, London$$zDFG OAPK (Projekt) verrechnet durch V3
000626051 909CO $$ooai:bib-pubdb1.desy.de:626051$$pdnbdelivery$$popenCost$$pVDB$$pdriver$$pOpenAPC$$popen_access$$popenaire
000626051 9101_ $$0I:(DE-588b)2008985-5$$6P:(DE-H253)PIP1103953$$aDeutsches Elektronen-Synchrotron$$b0$$kDESY
000626051 9101_ $$0I:(DE-588b)2008985-5$$6P:(DE-H253)PIP1106483$$aDeutsches Elektronen-Synchrotron$$b1$$kDESY
000626051 9101_ $$0I:(DE-588b)2008985-5$$6P:(DE-H253)PIP1106486$$aDeutsches Elektronen-Synchrotron$$b3$$kDESY
000626051 9101_ $$0I:(DE-HGF)0$$6P:(DE-H253)PIP1106486$$aExternal Institute$$b3$$kExtern
000626051 9101_ $$0I:(DE-588b)2008985-5$$6P:(DE-H253)PIP1106485$$aDeutsches Elektronen-Synchrotron$$b4$$kDESY
000626051 9131_ $$0G:(DE-HGF)POF4-623$$1G:(DE-HGF)POF4-620$$2G:(DE-HGF)POF4-600$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bForschungsbereich Materie$$lMaterie und Technologie$$vData Management and Analysis$$x0
000626051 9141_ $$y2025
000626051 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2024-02-05
000626051 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2024-02-05
000626051 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
000626051 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search$$d2024-02-05
000626051 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bPHILOS T R SOC A : 2022$$d2024-02-05
000626051 915__ $$0StatID:(DE-HGF)9905$$2StatID$$aIF >= 5$$bPHILOS T R SOC A : 2022$$d2024-02-05
000626051 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2024-02-05
000626051 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2024-02-05
000626051 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000626051 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC$$d2024-02-05
000626051 915__ $$0StatID:(DE-HGF)1150$$2StatID$$aDBCoverage$$bCurrent Contents - Physical, Chemical and Earth Sciences$$d2024-02-05
000626051 915__ $$0StatID:(DE-HGF)0430$$2StatID$$aNational-Konsortium$$d2024-02-05$$wger
000626051 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2024-02-05
000626051 915__ $$0StatID:(DE-HGF)0320$$2StatID$$aDBCoverage$$bPubMed Central$$d2024-02-05
000626051 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2024-02-05
000626051 915pc $$0PC:(DE-HGF)0000$$2APC$$aAPC keys set
000626051 915pc $$0PC:(DE-HGF)0001$$2APC$$aLocal Funding
000626051 915pc $$0PC:(DE-HGF)0002$$2APC$$aDFG OA Publikationskosten
000626051 9201_ $$0I:(DE-H253)FS-CI-20230420$$kFS-CI$$lComputational Imaging$$x0
000626051 980__ $$ajournal
000626051 980__ $$aVDB
000626051 980__ $$aUNRESTRICTED
000626051 980__ $$aI:(DE-H253)FS-CI-20230420
000626051 980__ $$aAPC
000626051 9801_ $$aAPC
000626051 9801_ $$aFullTexts