OceanCBM: A Concept Bottleneck Model for Mechanistic Interpretability in Ocean Forecasting
arXiv:2605.12639v1 Announce Type: new
Abstract: Extreme ocean phenomena are challenging not only to predict but to diagnose, as accurate forecasts alone do not reveal the underlying physical drivers. While recent machine learning approaches achieve strong predictive skill, they remain largely opaqu...
Vertex-Softmax: Tight Transformer Verification via Exact Softmax Optimization
arXiv:2605.10974v1 Announce Type: new
Abstract: Certified verification of transformer attention requires bounding the softmax function over interval constraints on the pre-softmax scores. Existing verifiers relax softmax independently of the downstream objective, leaving avoidable slack. We prove th...
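To make "bounding the softmax function over interval constraints" concrete, here is a minimal Python sketch of the standard elementwise bound. It relies only on the fact that softmax output i increases with score i and decreases with every other score. This is not the Vertex-Softmax method from the abstract, whose details are truncated here, and the function name `softmax_interval_bounds` is hypothetical.

```python
import math


def softmax_interval_bounds(lower, upper):
    """Elementwise bounds on softmax(z) when lower[i] <= z[i] <= upper[i].

    Illustrative baseline only (assumption: plain per-coordinate bounds,
    not the objective-aware method of the paper).
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    if any(l > u for l, u in zip(lower, upper)):
        raise ValueError("each lower bound must not exceed its upper bound")

    # Shift by the largest score for numerical stability; the ratios are unchanged.
    shift = max(upper)
    exp_lo = [math.exp(l - shift) for l in lower]
    exp_hi = [math.exp(u - shift) for u in upper]
    sum_lo, sum_hi = sum(exp_lo), sum(exp_hi)

    lo_bounds, hi_bounds = [], []
    for i in range(len(lower)):
        # Smallest output i: own score at its minimum, all others at their maximum.
        lo_bounds.append(exp_lo[i] / (exp_lo[i] + (sum_hi - exp_hi[i])))
        # Largest output i: own score at its maximum, all others at their minimum.
        hi_bounds.append(exp_hi[i] / (exp_hi[i] + (sum_lo - exp_lo[i])))
    return lo_bounds, hi_bounds
```

Each bound is attained at a corner of the score box, but the corners differ from one output coordinate to the next, so the bounds cannot all hold with equality at once. That is one source of the slack left by relaxing softmax without regard to the downstream objective.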
QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization
arXiv:2605.10959v1 Announce Type: new
Abstract: There is currently no unified metric for evaluating the efficiency of quantized neural networks. We propose QuIDE, built around the Intelligence Index I = (C x P)/log_2(T+1), which collapses the compression-accuracy-latency trade-off into a single sco...
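The Intelligence Index formula quoted above can be evaluated directly. The sketch below is a minimal Python version under stated assumptions: the visible part of the abstract does not define C, P, or T, so it assumes C is a compression ratio, P is an accuracy-style performance score, and T is a positive latency in some fixed unit. The function name `intelligence_index` is hypothetical.

```python
import math


def intelligence_index(c: float, p: float, t: float) -> float:
    """Compute I = (C x P) / log_2(T + 1).

    Assumed meanings (not confirmed by the truncated abstract):
    c = compression ratio, p = performance score, t = latency.
    """
    if t <= 0:
        # log_2(T + 1) is zero at T = 0, so the index is undefined there.
        raise ValueError("t must be positive")
    return (c * p) / math.log2(t + 1)


# Example: C = 4, P = 0.5, T = 3 gives (4 * 0.5) / log_2(4) = 2 / 2 = 1.0
score = intelligence_index(4.0, 0.5, 3.0)
```

Because latency enters through a logarithm, doubling T lowers the score far less than halving C or P does, which is worth keeping in mind when comparing configurations.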
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models
arXiv:2605.10971v1 Announce Type: new
Abstract: Discrete diffusion language models (DLMs) generate text by iteratively denoising all positions in parallel, offering an alternative to autoregressive models. Controlled generation methods for DLMs, imported from autoregressive models, apply uniform in...
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
arXiv:2605.11169v1 Announce Type: new
Abstract: Large language model agents interleave reasoning, action selection, and observation to solve sequential decision-making tasks. In deployed settings where agents repeatedly handle related multi-step tasks, small action-selection errors can accumulate i...
EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales
arXiv:2605.11136v1 Announce Type: new
Abstract: We argue that multi-agent test-time evolution is not single-agent evolution replicated N times. A single-agent learner can only evolve its own context and memory. A multi-agent system additionally evolves who collaborates, how they collaborate, and ho...
The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes
arXiv:2605.11182v1 Announce Type: new
Abstract: On-policy distillation (OPD) and on-policy self-distillation (OPSD) have emerged as promising post-training methods for large language models, offering dense token-level supervision on trajectories sampled from the model's own policy. However, existin...
Geometry-free prediction of inertial lift forces in microfluidic devices using deep learning
arXiv:2605.08109v1 Announce Type: new
Abstract: Inertial microfluidic devices (IMDs) offer low-cost, high-throughput alternative techniques for many traditional particle- (or cell-) manipulation tasks, but simulating them requires being able to predict particle migration, and thus particle lift for...
Path-Based Gradient Boosting for Graph-Level Prediction
arXiv:2605.08102v1 Announce Type: new
Abstract: We propose PathBoost, a gradient tree boosting method for graph-level classification and regression that learns discriminative path-based features directly from the input graph structure. Building on previous work, which was tailored to a specific c...
Reinforcement learning for inverse structural design and rapid laser cutting of kirigami prototypes
arXiv:2605.08098v1 Announce Type: new
Abstract: Kirigami is an increasingly useful fabrication method to produce shape-programmable metamaterial structures. However, inverse design remains difficult because deployment is nonlinear, and feasible cut layouts must satisfy discrete compatibility rules,...
BaLoRA: Bayesian Low-Rank Adaptation of Large Scale Models
arXiv:2605.08110v1 Announce Type: new
Abstract: Low-Rank Adaptation (LoRA) has become the standard for fine-tuning large pre-trained models at reduced computational cost. However, its low-rank point-estimate updates limit expressiveness, leave a persistent gap relative to full fine-tuning accuracy,...
On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective
arXiv:2605.08368v1 Announce Type: new
Abstract: Debates about large language model post-training often treat supervised fine-tuning (SFT) as imitation and reinforcement learning (RL) as discovery. But this distinction is too coarse. What matters is whether a training procedure increases the probabi...
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria
arXiv:2605.08354v1 Announce Type: new
Abstract: Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment. Prevailing RLHF approaches reduce this structure to scalar or pairwise labels, collapsin...
Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction
arXiv:2605.08220v1 Announce Type: new
Abstract: The automated extraction of data from scientific charts is a critical task for large-scale literature analysis. While multimodal Large Language Models (LLMs) show promise, their accuracy on non-standardized charts remains a challenge. This raises a ke...
Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding
arXiv:2605.06679v1 Announce Type: new
Abstract: Vision-Language Models (VLMs) are frequently undermined by object hallucination, generating content that contradicts visual reality, due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a training-free infer...
RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory
arXiv:2605.06675v1 Announce Type: new
Abstract: Large language models cache all previously computed key-value (KV) pairs during generation, and this KV cache grows linearly with sequence length, making it a primary memory bottleneck for serving. Quantizing the KV cache to fewer bits reduces this co...
A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
arXiv:2605.06678v1 Announce Type: new
Abstract: According to the United Nations Office for Disaster Risk Reduction (2025), the average annual cost of natural catastrophes increased from 70–80 billion USD between 1970 and 2000 to 180–200 billion USD between 2001 and 2020. Reports from organization...
LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction
arXiv:2605.06676v1 Announce Type: new
Abstract: Long-context inference in Large Language Models (LLMs) is bottlenecked by the linear growth of Key-Value (KV) cache memory. Existing KV cache compression paradigms are fundamentally limited by heuristics: heuristic budgeting relies on statistical prio...
More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models
arXiv:2605.06672v1 Announce Type: new
Abstract: Chain-of-thought (CoT) reasoning and reasoning-tuned models such as DeepSeek-R1 are commonly assumed to reduce shallow heuristic biases by thinking carefully. We test this on position bias in multiple-choice QA and find a different story: within any r...
GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning
arXiv:2605.06671v1 Announce Type: new
Abstract: Large Language Models (LLMs) have demonstrated strong potential for many mathematical problems. However, their performance on graph algorithmic tasks is still unsatisfactory, since graphs are naturally more complex in topology and often require systemat...
Fast and Effective Redistricting Optimization via Composite-Move Tabu Search
arXiv:2605.06682v1 Announce Type: new
Abstract: Spatial redistricting is a practical combinatorial optimization problem that demands high-quality solutions, rapid turnaround, and flexibility to accommodate multi-criteria objectives and interactive refinement. A central challenge is the contiguity c...
Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations
arXiv:2605.06696v1 Announce Type: new
Abstract: Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupl...
State Representation and Termination for Recursive Reasoning Systems
arXiv:2605.06690v1 Announce Type: new
Abstract: Recursive reasoning systems alternate between acquiring new evidence and refining an accumulated understanding. Two design choices are typically left implicit: how to represent the evolving reasoning state, and when to stop iterating. This paper addre...
Understanding Annotator Safety Policy with Interpretability
arXiv:2605.05329v1 Announce Type: new
Abstract: Safety policies define what constitutes safe and unsafe AI outputs, guiding data annotation and model development. However, annotation disagreement is pervasive and can stem from multiple sources such as operational failures (annotators misunderstand ...