ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics
arXiv:2603.20260v1 Announce Type: new
Abstract: The integration of Large Language Models into Multi-Agent Systems (MAS) has enabled the so-lution of complex, long-horizon tasks through collaborative reasoning. However, this collec-tive intelligence is inherently fragile, as a single logical fallacy...
AgenticGEO: A Self-Evolving Agentic System for Generative Engine Optimization
arXiv:2603.20213v1 Announce Type: new
Abstract: Generative search engines represent a transition from traditional ranking-based retrieval to Large Language Model (LLM)-based synthesis, transforming optimization goals from ranking prominence towards content inclusion. Generative Engine Optimization ...
Me, Myself, and $\pi$ : Evaluating and Explaining LLM Introspection
arXiv:2603.20276v1 Announce Type: new
Abstract: A hallmark of human intelligence is Introspection-the ability to assess and reason about one's own cognitive processes. Introspection has emerged as a promising but contested capability in large language models (LLMs). However, current evaluations oft...
FactorSmith: Agentic Simulation Generation via Markov Decision Process Decomposition with Planner-Designer-Critic Refinement
arXiv:2603.20270v1 Announce Type: new
Abstract: Generating executable simulations from natural language specifications remains a challenging problem due to the limited reasoning capacity of large language models (LLMs) when confronted with large, interconnected codebases. This paper presents Factor...
Domain-Specialized Tree of Thought through Plug-and-Play Predictors
arXiv:2603.20267v1 Announce Type: new
Abstract: While Large Language Models (LLMs) have advanced complex reasoning, prominent methods like the Tree of Thoughts (ToT) framework face a critical trade-off between exploration depth and computational efficiency. Existing ToT implementations often rely o...
ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models
arXiv:2603.19515v1 Announce Type: new
Abstract: Large language models (LLMs) with advanced cognitive capabilities are emerging as agents for various reasoning and planning tasks. Traditional evaluations often focus on specific reasoning or planning questions within controlled environments. Recent s...
arXiv:2603.19461v1 Announce Type: new
Abstract: Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed, handcrafted meta-level mechanisms, fundamentally limi...
Learning to Disprove: Formal Counterexample Generation with Large Language Models
arXiv:2603.19514v1 Announce Type: new
Abstract: Mathematical reasoning demands two critical, complementary skills: constructing rigorous proofs for true statements and discovering counterexamples that disprove false ones. However, current AI efforts in mathematics focus almost exclusively on proof ...
When both Grounding and not Grounding are Bad -- A Partially Grounded Encoding of Planning into SAT (Extended Version)
arXiv:2603.19429v1 Announce Type: new
Abstract: Classical planning problems are typically defined using lifted first-order representations, which offer compactness and generality. While most planners ground these representations to simplify reasoning, this can cause an exponential blowup in size. R...
arXiv:2603.19500v1 Announce Type: new
Abstract: We develop a method for producing vector sketches one part at a time. To do this, we train a multi-modal language model-based agent using a novel multi-turn process-reward reinforcement learning following supervised fine-tuning. Our approach is enable...
TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly
arXiv:2603.19296v1 Announce Type: new
Abstract: To tackle the huge computational demand of large foundation models, activation-aware compression techniques without retraining have been introduced. However, since these methods highly rely on calibration data, domain shift issues may arise for unseen...
Speculating Experts Accelerates Inference for Mixture-of-Experts
arXiv:2603.19289v1 Announce Type: new
Abstract: Mixture-of-Experts (MoE) models have gained popularity as a means of scaling the capacity of large language models (LLMs) while maintaining sparse activations and reduced per-token compute. However, in memory-constrained inference settings, expert wei...
A Visualization for Comparative Analysis of Regression Models
arXiv:2603.19291v1 Announce Type: new
Abstract: As regression is a widely studied problem, many methods have been proposed to solve it, each of them often requiring setting different hyper-parameters. Therefore, selecting the proper method for a given application may be very difficult and relies on...
Maximizing mutual information between user-contexts and responses improve LLM personalization with no additional data
arXiv:2603.19294v1 Announce Type: new
Abstract: While post-training has successfully improved large language models (LLMs) across a variety of domains, these gains heavily rely on human-labeled data or external verifiers. Existing data has already been exploited, and new high-quality data is expens...
arXiv:2603.18073v1 Announce Type: new
Abstract: Modern language model-based AI systems are remarkably powerful, yet their capabilities remain fundamentally capped by their human creators in three key ways. First, although a model's weights can be updated via fine-tuning, acquiring new knowledge fro...
Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI
arXiv:2603.18104v1 Announce Type: new
Abstract: Prevailing AI training infrastructure assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer complexity, and structural degradation of geometric properties through trai...
Multi-Trait Subspace Steering to Reveal the Dark Side of Human-AI Interaction
arXiv:2603.18085v1 Announce Type: new
Abstract: Recent incidents have highlighted alarming cases where human-AI interactions led to negative psychological outcomes, including mental health crises and even user harm. As LLMs serve as sources of guidance, emotional support, and even informal therapy,...
Don't Vibe Code, Do Skele-Code: Interactive No-Code Notebooks for Subject Matter Experts to Build Lower-Cost Agentic Workflows
arXiv:2603.18122v1 Announce Type: new
Abstract: Skele-Code is a natural-language and graph-based interface for building workflows with AI agents, designed especially for less or non-technical users. It supports incremental, interactive notebook-style development, and each step is converted to code ...
DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models
arXiv:2603.18048v1 Announce Type: new
Abstract: Recent Audio Multimodal Large Language Models (Audio MLLMs) demonstrate impressive performance on speech benchmarks, yet it remains unclear whether these models genuinely process acoustic signals or rely on text-based semantic inference. To systematic...
Taming Epilepsy: Mean Field Control of Whole-Brain Dynamics
arXiv:2603.18035v1 Announce Type: new
Abstract: Controlling the high-dimensional neural dynamics during epileptic seizures remains a significant challenge due to the nonlinear characteristics and complex connectivity of the brain. In this paper, we propose a novel framework, namely Graph-Regularize...
Engineering Verifiable Modularity in Transformers via Per-Layer Supervision
arXiv:2603.18029v1 Announce Type: new
Abstract: Transformers resist surgical control. Ablating an attention head identified as critical for capitalization produces minimal behavioral change because distributed redundancy compensates for damage. This Hydra effect renders interpretability illusory: w...
InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
arXiv:2603.18031v1 Announce Type: new
Abstract: Balancing fine-grained local modeling with long-range dependency capture under computational constraints remains a central challenge in sequence modeling. While Transformers provide strong token mixing, they suffer from quadratic complexity, whereas M...
Frayed RoPE and Long Inputs: A Geometric Perspective
arXiv:2603.18017v1 Announce Type: new
Abstract: Rotary Positional Embedding (RoPE) is a widely adopted technique for encoding position in language models, which, while effective, causes performance breakdown when input length exceeds training length. Prior analyses assert (rightly) that long inputs...
How Clued up are LLMs? Evaluating Multi-Step Deductive Reasoning in a Text-Based Game Environment
arXiv:2603.17169v1 Announce Type: new
Abstract: Deducing whodunit proves challenging for LLM agents. In this paper, we implement a text-based multi-agent version of the classic board game Clue as a rule-based testbed for evaluating multi-step deductive reasoning, with six agents drawn from GPT-4o-m...