How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning
arXiv:2605.23926v1 Announce Type: new
Abstract: Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and energy. Casual inspection of their traces reveals extensive reformulation, verification, and circular self-reflect...
arXiv:2605.23984v1 Announce Type: new
Abstract: Industrial anomaly detection has attracted significant attention as a fundamental challenge in industrial systems. The rapid advancement of heterogeneous industrial sensors has driven industrial anomaly detection from unimodal to multimodal paradigms....
Iterative Refinement Neural Operators are Learned Fixed-Point Solvers: A Principled Approach to Spectral Bias Mitigation
arXiv:2605.24041v1 Announce Type: new
Abstract: Neural operators serve as fast, data-driven surrogates for scientific modeling but typically rely on a monolithic, single-pass inference procedure that struggles to resolve high-frequency details, a limitation known as spectral bias. We introduce the ...
Towards Verifiable Transformers: Solver-Checkable Circuit Explanations
arXiv:2605.24033v1 Announce Type: new
Abstract: Mechanistic interpretability often identifies circuits inside Transformer models, but explanations of those circuits are usually validated through examples, ablations, and manual reasoning. This leaves a gap between finding a plausible circuit and pro...
Algometrics: Forecasting Under Algorithmic Feedback
arXiv:2605.23978v1 Announce Type: new
Abstract: In algorithmic markets, predictive models become part of the data-generating process they aim to forecast. Once their outputs are converted into trades, allocations, execution schedules, or risk controls, they change the future data on which they are ...
Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems
arXiv:2605.22883v1 Announce Type: new
Abstract: Current AI energy benchmarks measure consumption at the granularity of a single model invocation or training run. For classical single-turn workloads this unit remains coherent. For agentic systems - where a single user goal may trigger multi-step orc...
SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research
arXiv:2605.22878v1 Announce Type: new
Abstract: The exponential growth of global academic output has confronted researchers and AI agents with an unprecedented "information explosion," where fragmented and unstructured knowledge organization impedes deep interdisciplinary integration. Current aca...
BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems
arXiv:2605.22866v1 Announce Type: new
Abstract: Compound AI systems route tasks through hierarchies of specialised components. Attribution is dominated by Shapley-based methods (SHAP), which decompose a coalition value function into per-component marginal contributions and require evaluation of the...
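For readers unfamiliar with the Shapley decomposition the abstract refers to, here is a minimal sketch of exact Shapley attribution on a toy coalition value function. It is not the BOHM method. The component names (`retriever`, `planner`, `solver`) and the scores in `SCORES` are hypothetical values chosen only for illustration.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering in which the coalition can be assembled."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical coalition value function for a three-component pipeline.
# Every number here is made up for the example.
SCORES = {
    frozenset(): 0.0,
    frozenset({"retriever"}): 0.2,
    frozenset({"planner"}): 0.1,
    frozenset({"solver"}): 0.3,
    frozenset({"retriever", "planner"}): 0.4,
    frozenset({"retriever", "solver"}): 0.6,
    frozenset({"planner", "solver"}): 0.5,
    frozenset({"retriever", "planner", "solver"}): 1.0,
}

phi = shapley_values(["retriever", "planner", "solver"], SCORES.__getitem__)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.4f}")
```

The per-component values sum to the value of the full coalition minus the value of the empty one. Exact computation needs a score for every subset of components, so the number of evaluations grows exponentially with the number of components.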
RMA: an Agentic System for Research-Level Mathematical Problems
arXiv:2605.22875v1 Announce Type: new
Abstract: We present Research Math Agents (RMA), an agentic framework for automated reasoning on research-level mathematical problems. Unlike prior studies centered on competition mathematics or formal theorem proving, RMA targets research-level math...
NeuroNL2LTL: A Neurosymbolic Framework for Natural Language Translation of Linear Temporal Logic
arXiv:2605.22874v1 Announce Type: new
Abstract: Effectively translating between natural language (NL) and formal logics like Linear Temporal Logic (LTL) requires expertise that limits formal verification's reach in safety-critical development. Template-based approaches sacrifice expressiveness for ...
The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models
arXiv:2605.22870v1 Announce Type: new
Abstract: Chain-of-thought (CoT) prompting is necessary for arithmetic in small language models, yet shuffling its steps preserves most performance. What does CoT contribute if not logical sequencing? In three 1-3B instruction-tuned LMs on GSM8K, we isolate the...
Latent Cache Flow: Model-to-Model Communication Without Text
arXiv:2605.22863v1 Announce Type: new
Abstract: LLM agents today communicate via text, which incurs considerable latency and information loss due to the need to autoregressively decode the sharer model's state and encode at the receiver model. Recent work such as Cache-to-Cache (C2C; Fu et al., 202...
Reading Calibrated Uncertainty from Language Model Trajectories
arXiv:2605.22864v1 Announce Type: new
Abstract: The maximum softmax probability (MSP) represents a default approach when evaluating uncertainty quantification for language model generation with structured output. Although cheap, it is often miscalibrated. Methods that probe the model's internal act...
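As a point of reference for the MSP baseline named in the abstract, the following is a minimal sketch of how a maximum softmax probability is typically computed from a vector of logits. The function name and the example logits are illustrative assumptions. The sketch does not reproduce the paper's trajectory-based method.

```python
import math


def max_softmax_probability(logits):
    """Return (index, probability) of the most likely class under softmax."""
    m = max(logits)
    # Subtract the maximum logit before exponentiating for numerical stability.
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Illustrative logits for one prediction over three candidate outputs.
idx, msp = max_softmax_probability([2.0, 1.0, 0.1])
print(idx, round(msp, 4))
```

The returned probability is then used directly as a confidence score. That is cheap, but nothing in the computation guarantees that the score matches the empirical accuracy.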
FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning
arXiv:2605.22869v1 Announce Type: new
Abstract: Both full fine-tuning (Full FT) and parameter-efficient fine-tuning methods such as LoRA introduce weight updates without accounting for the spectral structure established during pretraining. As a result, noisy gradients from limited fine-tuning data ...
Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs
arXiv:2605.21602v1 Announce Type: new
Abstract: Many safety and alignment failures of large language models (LLMs) occur due to out-of-distribution (OOD) situations: unusual prompt or response patterns that are unforeseen by model developers. We systematically study whether LLM monitoring pipelines...
AOP-Wiki EMOD 3.0: Data Model Expansions and Content Evaluation Framework for Using Agentic AI to Improve Integration between AOPs and New Approach Methodologies (NAMs)
arXiv:2605.21645v1 Announce Type: new
Abstract: Adverse Outcome Pathways (AOP) are logic models that causally link biological mechanisms that can be measured in a lab to adverse outcomes, relevant to chemical regulatory endpoints. AOPs contextualize new approach methodologies (NAMs), in vitro and i...
The Shape of Testimony: A Scalable Framework for Oral History Archive Comparison
arXiv:2605.21623v1 Announce Type: new
Abstract: Researchers in Holocaust studies have often distinguished between two styles of oral survivor testimony: the USC Shoah Foundation's interviews tend to follow a structured, interviewer-guided format, whereas the Yale Fortunoff Video Archive generally f...
MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis
arXiv:2605.21630v1 Announce Type: new
Abstract: Although LLMs have made substantial progress in reasoning, systematically producing frontier-level reasoning data remains difficult. Existing synthesis methods often have limited visibility into the structural factors that govern problem difficulty, w...
Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation
arXiv:2605.21491v1 Announce Type: new
Abstract: As language models accelerate scientific research by automating hypothesis generation and implementation, a new bottleneck emerges: evaluating and filtering hundreds of AI-generated ideas without exhaustive experimentation. We ask whether LMs can lear...
Temporal Contrastive Transformer for Financial Crime Detection: Self-Supervised Sequence Embeddings via Predictive Contrastive Coding
arXiv:2605.21490v1 Announce Type: new
Abstract: We introduce the Temporal Contrastive Transformer (TCT), a representation learning framework designed to capture contextual temporal dynamics in sequences of financial transactions. The model is trained using a self-supervised contrastive objective to...
The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity
arXiv:2605.21492v1 Announce Type: new
Abstract: No feature ranking can be simultaneously faithful, stable, and complete when features are collinear. For collinear pairs, ranking reduces to a coin flip. We prove this impossibility, quantify it for four model classes, resolve it via ensemble averagin...
GraphDiffMed: Knowledge-Constrained Differential Attention with Pharmacological Graph Priors for Medication Recommendation
arXiv:2605.20188v1 Announce Type: new
Abstract: Recommending safe and effective medication combinations from electronic health records (EHRs) is a core clinical AI problem, yet it remains difficult because patient trajectories are long, noisy, and clinically heterogeneous. Existing methods typicall...
Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models
arXiv:2605.20187v1 Announce Type: new
Abstract: Understanding dependencies between variables is critical for interpretability and efficient generation in masked diffusion models (MDMs), yet these models primarily expose marginal conditional distributions and do not explicitly represent inter-variab...
Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine
arXiv:2605.20235v1 Announce Type: new
Abstract: Diffusion models generate high-dimensional data with remarkable quality, yet how their training efficiently learns the score function, bypassing the curse of dimensionality when data is supported on low-dimensional manifolds, remains theoretically une...