Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization
Journal Article | PUBDB-2025-01273
Royal Soc., London, 2025
Please use a persistent ID in citations: doi:10.1098/rsta.2024.0233; doi:10.3204/PUBDB-2025-01273
Abstract: The aim of this paper is to provide a mathematical analysis of transformer architectures using a self-attention mechanism with layer normalization. In particular, observed patterns in such architectures resembling either clusters or uniform distributions pose a number of challenging mathematical questions. We focus on a special case that admits a gradient flow formulation in the space of probability measures on the unit sphere under a special metric, which allows us to give at least partial answers in a rigorous way. The arising mathematical problems resemble those recently studied in aggregation equations, but with additional challenges emerging from restricting the dynamics to the sphere and the particular form of the interaction energy.
We provide a rigorous framework for studying the gradient flow, which also suggests a possible metric geometry to study the general case (i.e. one that is not described by a gradient flow). We further analyze the stationary points of the induced self-attention dynamics. The latter are related to stationary points of the interaction energy in the Wasserstein geometry, and we further discuss energy minimizers and maximizers in different parameter settings.
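The abstract does not state the model's equations, so the following is a minimal numerical sketch, for orientation only, of one formulation of normalized self-attention dynamics commonly used in this literature: softmax attention weights with all projection matrices taken as the identity, and re-projection onto the unit sphere standing in for layer normalization. All parameter names and values (n, d, beta, dt, steps) are illustrative assumptions; the precise mean-field model analyzed in the paper may differ.

```python
# Hypothetical sketch (not the paper's code): a particle version of
# normalized self-attention dynamics on the unit sphere.
import numpy as np

rng = np.random.default_rng(0)
n, d, beta, dt, steps = 64, 3, 4.0, 0.05, 400  # assumed toy parameters

# n particles on the unit sphere S^{d-1}
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

for _ in range(steps):
    # attention weights: row-wise softmax of beta * <x_i, x_j>
    W = np.exp(beta * (X @ X.T))
    W /= W.sum(axis=1, keepdims=True)
    V = W @ X                                   # attention output
    # project the velocity onto the tangent space of the sphere at each x_i
    V -= np.sum(V * X, axis=1, keepdims=True) * X
    X += dt * V                                 # explicit Euler step
    # "layer normalization": re-project onto the sphere
    X /= np.linalg.norm(X, axis=1, keepdims=True)

# rough clustering statistic: mean pairwise inner product -> 1 for one cluster
print("mean <x_i, x_j>:", float((X @ X.T).mean()))
```

For beta > 0 the printed statistic typically tends toward 1, i.e. the particles collapse to a single cluster, which illustrates the cluster-versus-uniform dichotomy the abstract refers to; this behavior is a property of the sketched toy model, not a statement of the paper's results.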