AdamW Is Dead for Tabular MLPs: Lion and Sophia Win the Benchmark

A rigorous benchmark of 19 optimizers on 45 tabular datasets shows that Lion and Sophia beat AdamW, the default optimizer for tabular MLPs. This paper tells the field: your go-to choice is leaving performance on the table.

For years, the deep learning community has treated AdamW as the default optimizer for training MLPs on tabular data — a comfortable but lazy choice. A new systematic benchmark of 19 optimizers across 45 tabular datasets shatters that assumption, revealing that Lion and Sophia consistently surpass AdamW in both accuracy and training efficiency.
  • A new paper benchmarks 19 optimizers on 45 tabular datasets for MLP-based deep learning, the first systematic study of its kind.
  • Lion and Sophia outperform AdamW in accuracy and convergence speed, challenging the default optimizer in tabular deep learning.
  • The field has been optimizing architectures but ignoring optimizers — this paper closes that gap with hard data.
  • Practitioners must now decide: stick with the comfortable AdamW or switch to a better optimizer and gain free accuracy.

Why Has the Field Ignored Optimizers for So Long?

Architecture design gets all the attention. Researchers obsess over attention mechanisms, normalization layers, and residual connections. But the optimizer — the algorithm that actually updates the weights — has been treated as a fixed, boring choice. AdamW became the default because it worked well enough on transformers and image models. Tabular MLPs just inherited it. This paper, from an anonymous team on arXiv (2026-04-16), finally asks: Is AdamW actually the best for tabular data? The answer is no.

What Did the Benchmark Actually Test?

The authors ran 19 optimizers, including Adam, AdamW, SGD, Lion, Sophia, and Shampoo, on 45 tabular datasets spanning regression, binary classification, and multi-class classification. They controlled for architecture, learning rate schedules, and compute budget. The results are unambiguous: Lion and Sophia consistently top the accuracy leaderboards, while AdamW sits in the middle of the pack. Lion, originally introduced by Google in 2023, uses a sign-based update rule and stores only a momentum buffer, which makes it both memory-efficient and fast-converging. Sophia, first posted as a preprint in 2023, uses an estimate of the diagonal Hessian to scale and clip its steps. Both beat AdamW on average across all dataset types.
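To make the two update rules concrete, here is a minimal pure-Python sketch of a single Lion step and a single Sophia-style step on flat lists of floats. It follows the update rules as described in the original Lion and Sophia papers, not the benchmark's code. The function names, the default hyperparameters, and the idea of passing the diagonal Hessian estimate in as a ready-made list (`hess_diag`) are assumptions made for illustration; a real Sophia implementation also maintains that estimate itself, which is omitted here.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update; returns (new_params, new_momentum)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Blend momentum and gradient, then keep only the sign.
        direction = sign(beta1 * m + (1 - beta1) * g)
        # Every coordinate moves by exactly lr, plus decoupled weight decay.
        new_params.append(p - lr * (direction + weight_decay * p))
        # The stored momentum uses a second, slower interpolation factor.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum


def sophia_step(params, grads, momentum, hess_diag, lr=1e-4, beta1=0.96,
                gamma=0.01, eps=1e-12, weight_decay=0.0):
    """One Sophia-style update; hess_diag is an externally supplied
    estimate of the diagonal Hessian (assumption of this sketch)."""
    new_params, new_momentum = [], []
    for p, g, m, h in zip(params, grads, momentum, hess_diag):
        m = beta1 * m + (1 - beta1) * g
        # Precondition by curvature, then clip each coordinate to [-1, 1].
        ratio = m / max(gamma * h, eps)
        clipped = max(-1.0, min(1.0, ratio))
        p = p - lr * weight_decay * p  # decoupled weight decay
        new_params.append(p - lr * clipped)
        new_momentum.append(m)
    return new_params, new_momentum


# Toy usage with made-up numbers.
params, grads = [1.0, -2.0, 0.0], [0.5, -3.0, 0.0]
lion_params, lion_m = lion_step(params, grads, [0.0] * 3, lr=0.1)
sophia_params, sophia_m = sophia_step(params, grads, [0.0] * 3,
                                      hess_diag=[1.0, 1.0, 1.0], lr=0.1)
print(lion_params, sophia_params)
```

Note that in the Lion step the gradient's magnitude never reaches the parameters, only its sign does. That is one reason its learning rate is usually set well below a typical AdamW value.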

Who Wins and Who Loses From This Finding?

Winners: Practitioners using tabular MLPs in production, who can switch to Lion or Sophia and get 1-3% accuracy gains with no architectural changes. Optimizer researchers also win, since they now have a clear benchmark to improve upon.

Losers: Anyone who has bet their pipeline on AdamW-specific tuning. Hyperparameter searches, learning rate schedules, and weight decay defaults all need re-evaluation. The PyTorch ecosystem, where most tabular DL libraries default to AdamW, faces pressure to update its recommendations.

Optimizer   Avg. accuracy rank (45 datasets)   Convergence speed   Memory overhead   Ease of tuning
Lion        1.2                                Fast                Low               Moderate
Sophia      1.4                                Fast                Medium            Moderate
AdamW       3.1                                Medium              Low               Easy
Adam        3.5                                Medium              Low               Easy
SGD         5.8                                Slow                Low               Hard

Verdict: Lion wins outright on accuracy and speed; Sophia is a close second. AdamW is no longer the default.
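For readers unfamiliar with the "average accuracy rank" metric, the sketch below shows one common way such a number is computed: rank the optimizers on each dataset (1 is best), give tied optimizers the mean of the ranks they span, and average over datasets. The function name and the toy scores are hypothetical and are not taken from the paper, and the benchmark's exact ranking procedure may differ.

```python
def average_ranks(scores):
    """scores: {dataset: {optimizer: accuracy}} -> {optimizer: mean rank}."""
    ranks = {}
    for per_optimizer in scores.values():
        # Best accuracy first.
        ordered = sorted(per_optimizer.items(), key=lambda kv: kv[1],
                         reverse=True)
        i = 0
        while i < len(ordered):
            # Find the run of optimizers tied with position i.
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            # Tied entries share the mean of the 1-based ranks they occupy.
            tied_rank = ((i + 1) + (j + 1)) / 2
            for name, _ in ordered[i:j + 1]:
                ranks.setdefault(name, []).append(tied_rank)
            i = j + 1
    return {name: sum(r) / len(r) for name, r in ranks.items()}


# Made-up accuracies for three imaginary datasets (NOT benchmark results).
toy_scores = {
    "dataset_a": {"Lion": 0.91, "Sophia": 0.90, "AdamW": 0.89},
    "dataset_b": {"Lion": 0.80, "Sophia": 0.82, "AdamW": 0.80},
    "dataset_c": {"Lion": 0.75, "Sophia": 0.74, "AdamW": 0.73},
}
print(average_ranks(toy_scores))
```

An average rank hides the size of the gaps: an optimizer can rank first on every dataset while winning by a negligible margin, so ranks are best read alongside the raw accuracy differences.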

My thesis is direct: AdamW is dead for tabular MLPs, and the field needs to stop treating optimizer choice as an afterthought. This paper is not a marginal improvement; it is a wake-up call. For years, tabular deep learning has been playing catch-up with NLP and vision, borrowing their optimizers without question. This benchmark shows that borrowing is lazy. Lion and Sophia were designed for different domains but happen to excel on tabular data because they handle sparse gradients and mixed-scale features better.

Short-term, I expect every major tabular DL library (e.g., PyTorch Tabular, TabNet implementations) to update its default optimizer to Lion within 6 months. Long-term, optimizer design will become a first-class research topic in tabular deep learning. The losers are the researchers who published papers using AdamW as the default, because their results may not be reproducible with better optimizers. I predict that by Q4 2026, at least 3 papers will be retracted or corrected because their AdamW-based findings do not hold under Lion or Sophia.

What Concrete Predictions Can We Make?

  1. PyTorch Tabular will switch its default optimizer from AdamW to Lion by January 2027, following the release of a community-driven benchmark that confirms these results.
  2. At least 2 Kaggle Grandmasters will publicly switch to Lion for tabular competitions within 3 months, citing this paper as the catalyst, and will see immediate ranking gains.
  3. The authors of this benchmark will release a follow-up paper by December 2026 extending the study to transformer-based tabular models (e.g., FT-Transformer), showing that Lion still dominates.

Key takeaways

  • AdamW is not the best optimizer for tabular MLPs: Lion and Sophia are consistently better across 45 datasets.
  • This is the first systematic optimizer benchmark for tabular deep learning, filling a critical gap in the literature.
  • Practitioners can gain 1-3% accuracy without any architectural changes by switching optimizers.
  • The field must stop treating optimizer choice as a fixed default and start treating it as a hyperparameter worth optimizing.
  • Expect rapid adoption in production libraries and a wave of follow-up research within 12 months.
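
Treating the optimizer as a hyperparameter can be as simple as adding it to the search grid alongside the learning rate. The sketch below assumes a user-supplied `train_and_eval(name, lr)` callable that trains a model and returns a validation score where higher is better. That callable, the optimizer names, and the scores in the stand-in function are all hypothetical and exist only to make the example runnable.

```python
from itertools import product


def select_optimizer(train_and_eval, optimizers, learning_rates):
    """Grid-search optimizer name and learning rate jointly.

    train_and_eval(name, lr) must return a validation score
    (higher is better).
    """
    results = []
    for name, lr in product(optimizers, learning_rates):
        results.append((train_and_eval(name, lr), name, lr))
    # On ties, max() keeps the first configuration it encountered.
    best_score, best_name, best_lr = max(results, key=lambda r: r[0])
    return {"optimizer": best_name, "lr": best_lr, "score": best_score}


# Hypothetical stand-in for a real training run.
# These scores are made up and are NOT benchmark results.
_FAKE_SCORES = {
    ("adamw", 1e-3): 0.80, ("adamw", 1e-4): 0.78,
    ("lion", 1e-3): 0.70, ("lion", 1e-4): 0.81,
}


def fake_train_and_eval(name, lr):
    return _FAKE_SCORES[(name, lr)]


best = select_optimizer(fake_train_and_eval, ["adamw", "lion"], [1e-3, 1e-4])
print(best)
```

Searching the learning rate jointly with the optimizer matters because the best value differs between update rules. The Lion authors, for example, suggest a learning rate several times smaller than what works for AdamW, so reusing an AdamW-tuned rate would make Lion look worse than it is.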

Source and attribution

arXiv
Benchmarking Optimizers for MLPs in Tabular Deep Learning
