Research - Latest News & Updates

Research Desk

02.07.2026 00:08

Persona-Pruner: Slimming Role-Playing LMs by 80%

Persona-Pruner introduces a structured pruning method that reduces role-playing LMs by over 80% with minimal performance loss, targeting the computational bottleneck of deploying multiple persona-driven agents simultaneously. The findings suggest that task-specific pruning can replace full-model fine-tuning for many role-playing applications.

01.07.2026 00:52

AdaSR: RL Beats Supervised Learning for Streaming Reasoning

AdaSR introduces HRPO, a reinforcement learning framework for streaming reasoning that outperforms supervised methods on synthetic benchmarks. The paper highlights a fundamental shift from static 'read-then-think' to adaptive 'think-while-read' paradigms, but open questions remain about real-world validation.

01.07.2026 00:32

PCMA: Coordinated Preferences Reshape Multi-Agent RL

A new arXiv paper introduces PCMA, a method that learns coordinated, agent-specific preferences for multi-objective multi-agent RL, enabling complementary trade-offs across agents. This approach outperforms uniform preference baselines in cooperative scenarios with conflicting objectives.

01.07.2026 00:32

CORA Exposes Thinking-Answer Gap in Multimodal RLVR

CORA identifies a previously underestimated semantic inconsistency in multimodal RLVR and introduces a consistency-oriented alignment method. The findings are promising but require broader validation beyond current benchmarks.

30.06.2026 00:11

ModSleuth Exposes the Hidden Dependency Crisis in AI

ModSleuth reveals that modern LLMs depend on a recursive web of undocumented upstream models, creating a systemic transparency risk. The paper argues for mandatory dependency manifests, threatening to expose the opaque practices of major AI labs.

30.06.2026 00:11

Token Removal Is Dead: Recoverable Routing Wins for VLMs

The paper introduces a recoverable token routing mechanism that allows VLMs to dynamically re-access discarded tokens, promising significant accuracy gains without increasing inference cost. This changes the optimization playbook for developers deploying VLMs in production.

28.06.2026 01:00

Perplexity Agents Cut Knowledge Work Time 70% — But Miss the Big Picture

Perplexity's production data shows AI agents slash task completion time but reduce source diversity. This research brief unpacks what the numbers actually support and what remains uncertain.

27.06.2026 00:37

TempoVLA: Speed-Controllable Robots Still Stuck in Lab

TempoVLA offers a novel approach to speed control in robot manipulation, but the paper's limitations—single demonstration training and lack of real-world testing—raise questions about its practical applicability. This article breaks down the findings, evidence, and uncertainties.

27.06.2026 00:17

Discarded Tokens Are Gold: Diffusion LM Retrieval Breakthrough

Discrete diffusion language models discard low-confidence tokens during generation. A new paper shows those tokens are a goldmine for retrieval-augmented generation, enabling self-augmenting retrieval without external query generation.

27.06.2026 00:17

Failed LLM Traces Reveal Fixable vs. Structural Flaws

A new research paper reveals that failed reasoning traces from LLMs encode a 'recoverability' signature that distinguishes between unlucky sampling errors and structural failures. This insight could reshape how AI developers allocate test-time compute and benchmark model robustness, helping them save money and focus on real improvements.

25.06.2026 00:39

LongTraceRL: Dense Rewards Beat Sparse in Long-Context Reasoning

LongTraceRL proposes a new RL framework that replaces sparse outcome rewards with dense rubric-based rewards derived from search agent trajectories, achieving state-of-the-art results on long-context benchmarks. While promising, the approach's dependency on high-quality trajectory data may limit its adoption outside of well-resourced labs.

24.06.2026 00:16

PEFT-Arena: The Benchmark That Exposes LLM Forgetting

PEFT-Arena is the first benchmark to systematically measure the stability-plasticity trade-off in parameter-efficient finetuning. The findings challenge the dominance of methods like LoRA and AdaLoRA, revealing that they sacrifice pretrained capability retention for task adaptation, and open a new front in the PEFT optimization race.
