Fair outputs, Biased Internals: Causal Potency and Asymmetry of Latent Bias in LLMs for High-Stakes Decisions
arXiv:2605.15217v1 Announce Type: new
Abstract: Instruction-tuned language models exhibit behavioural fairness in high-stakes decisions while retaining biased associations in their internal representations. However, whether these suppressed representations can affect model outputs - and whether suc...
Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations
arXiv:2605.15205v1 Announce Type: new
Abstract: Improving the Theory of Mind (ToM) capability of Large Language Models (LLMs) is crucial for effective social interactions between these AI models and humans. However, the existing benchmarks often measure ToM capability improvement through story-read...
SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch
arXiv:2605.15204v1 Announce Type: new
Abstract: Multi-agent orchestration frameworks such as LangChain, LangGraph, and CrewAI route tasks through graph-based pipelines but do not enforce the stage constraints that govern real business processes. We present SDOF, a framework that treats multi-agent ...
DeepSlide: From Artifacts to Presentation Delivery
arXiv:2605.15202v1 Announce Type: new
Abstract: Presentations are a primary medium for scholarly communication, yet most AI slide generators optimize the artifact (a visually plausible deck) while under-optimizing the delivery process (pacing, narrative, and presentation preparation). We present De...
Mask-Morph Graph U-Net: A Generalisable Mesh-Based Surrogate for Crashworthiness Field Prediction under Large Geometric Variation
arXiv:2605.15231v1 Announce Type: new
Abstract: Nonlinear finite element crash simulations are accurate but computationally expensive, limiting their use in iterative design optimisation. Machine-learning surrogate models based on graph neural networks (GNNs) offer a faster alternative. Message-pas...
TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination
arXiv:2605.15207v1 Announce Type: new
Abstract: Multi-agent LLM systems have shown promise for complex reasoning, yet recent evaluations reveal they often underperform single-model baselines. We identify a structural failure mode in sequential fine-tuning of shared-context teams: updating one agent...
AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices
arXiv:2605.15206v1 Announce Type: new
Abstract: Autonomous agents powered by large language models (LLMs) are increasingly used to automate complex, multi-step tasks such as coding or web-based question answering. While remote, cloud-based agents offer scalability and ease of deployment, they raise...
MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion
arXiv:2605.15235v1 Announce Type: new
Abstract: Multimodal physiological data powers clinical AI systems from intensive care units to wearable devices, but sensors routinely fail in practice. Two failure modes are common: modality missing, where an entire channel is absent, and within-modality miss...
Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels
arXiv:2605.15208v1 Announce Type: new
Abstract: Large Language Models are routinely compressed via post-training quantization to reduce inference costs and memory footprint for cloud and edge deployment, yet the impact of this compression on model quality remains poorly understood. Existing studies...
A Two-Dimensional Framework for AI Agent Design Patterns: Cognitive Function and Execution Topology
arXiv:2605.13850v1 Announce Type: new
Abstract: Existing frameworks for LLM-based agent architectures describe systems from a single perspective: industry guides (Anthropic, Google, LangChain) focus on execution topology -- how data flows -- while cognitive science surveys focus on cognitive functi...
Mixed Integer Goal Programming for Personalized Meal Optimization with User-Defined Serving Granularity
arXiv:2605.13849v1 Announce Type: new
Abstract: Determining what to eat to satisfy nutritional requirements is one of the oldest optimization problems in operations research, yet existing formulations have two persistent limitations: continuous variables produce impractical fractional servings (1.7...
GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration
arXiv:2605.13848v1 Announce Type: new
Abstract: Agentic LLM frameworks that rely on prompted orchestration, where the model itself determines workflow transitions, often suffer from hallucinated routing, infinite loops, and non-reproducible execution. We introduce GraphBit, an engine-orchestrated f...
Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems
arXiv:2605.13851v1 Announce Type: new
Abstract: Multi-agent orchestration -- in which a hidden coordinator manages specialized worker agents -- is becoming the default architecture for enterprise AI deployment, yet the safety implications of orchestrator invisibility have never been empirically tes...
arXiv:2605.13880v1 Announce Type: new
Abstract: Agent memory is typically constructed either offline from curated demonstrations or online from post-deployment interactions. However, regardless of how it is built, an agent faces a cold-start gap when first introduced to a new environment without an...
Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models
arXiv:2605.13935v1 Announce Type: new
Abstract: Diffusion language models are a promising alternative to autoregressive models, yet post-training methods for them largely adapt reward-maximizing objectives. We identify a central failure mode in this setting we call trajectory locking: sampled rewar...
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
arXiv:2605.13930v1 Announce Type: new
Abstract: EEG foundation models achieve state-of-the-art clinical performance, yet the internal computations driving their predictions remain opaque: a barrier to clinical trust. We apply TopK Sparse Autoencoders (SAEs) across three architecturally distinct EEG...
Unsupervised learning of acquisition variability in structural connectomes via hybrid latent space modeling
arXiv:2605.13933v1 Announce Type: new
Abstract: Acquisition differences across sites, scanners, and protocols in dMRI introduce variability that complicates structural connectome analysis. This motivates deep learning models that can represent high-dimensional connectomes in a low-dimensional space...
Rethinking Molecular OOD Generalization via Target-Aware Source Selection
arXiv:2605.13932v1 Announce Type: new
Abstract: Robust prediction of molecular properties under extreme out-of-distribution (OOD) scenarios is a pivotal bottleneck in AI-driven drug discovery. Current scaffold-splitting protocols fail to obstruct microscopic semantic overlap, predisposing models to...
arXiv:2605.12674v1 Announce Type: new
Abstract: Vision-Language Models (VLMs) are increasingly used in safety-critical applications because of their broad reasoning capabilities and ability to generalize with minimal task-specific engineering. Despite these advantages, they can exhibit catastrophic...
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
arXiv:2605.12673v1 Announce Type: new
Abstract: Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in fro...
Learning Transferable Latent User Preferences for Human-Aligned Decision Making
arXiv:2605.12682v1 Announce Type: new
Abstract: Large language models (LLMs) are increasingly used as reasoning modules in many applications. While they are efficient in certain tasks, LLMs often struggle to produce human-aligned solutions. Human-aligned decision making requires accounting for both...
Macro-Action Based Multi-Agent Instruction Following through Value Cancellation
arXiv:2605.12655v1 Announce Type: new
Abstract: Multi-agent reinforcement learning (MARL) in real-world use cases may need to adapt to external natural language instructions that interrupt ongoing behavior and conflict with long-horizon objectives. However, conditioning rewards on instructions intr...
Learning to Decide with AI Assistance under Human-Alignment
arXiv:2605.12646v1 Announce Type: new
Abstract: It is widely agreed that when AI models assist decision-makers in high-stakes domains by predicting an outcome of interest, they should communicate the confidence of their predictions. However, empirical evidence suggests that decision-makers often st...
OceanCBM: A Concept Bottleneck Model for Mechanistic Interpretability in Ocean Forecasting
arXiv:2605.12639v1 Announce Type: new
Abstract: Extreme ocean phenomena are challenging not only to predict but to diagnose, as accurate forecasts alone do not reveal the underlying physical drivers. While recent machine learning approaches achieve strong predictive skill, they remain largely opaqu...