Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification
arXiv:2606.04037v1 Announce Type: new
Abstract: Pre-deployment verification of enterprise artificial intelligence (AI) agents remains a critical gap between large language model (LLM) capability benchmarking and production deployment. Post-deployment monitoring, human-in-the-loop controls, and prom...
Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection
arXiv:2606.04150v1 Announce Type: new
Abstract: Public discourse and emerging policy typically assume that AI emotional support is a deliberate act: a lonely user consciously seeking comfort from a dedicated companion chatbot. In this paper, we draw on emerging empirical evidence and argue that thi...
Position: Deployed Reinforcement Learning should be Continual
arXiv:2606.04029v1 Announce Type: new
Abstract: Reinforcement Learning (RL) has received increasing attention and adoption in real-world use cases. Most of these systems follow a train-then-fix paradigm, where trained agents do not learn while interacting with the world until performance degrades a...
Visual Graph Scaffolds for Structural Reasoning in Large Language Models
arXiv:2606.02673v1 Announce Type: new
Abstract: Graphs have been used to enhance large language models (LLMs) for structured reasoning, mostly as external knowledge sources provided to models at test time. In this paper, we take a different view: the value of graphs for LLMs lies not only in sup...
BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces
arXiv:2606.02798v1 Announce Type: new
Abstract: Many decision-support settings require systems that adapt to individual users, but evaluation data for this problem remain limited. Existing benchmarks for user understanding often rely on simulated users or model-generated behavior, even though recen...
AURA: Action-Gated Memory for Robot Policies at Constant VRAM
arXiv:2606.02775v1 Announce Type: new
Abstract: The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episo...
ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning
arXiv:2606.02802v1 Announce Type: new
Abstract: Large language models (LLMs) exhibit strong natural-language reasoning abilities for clinical decision support, but struggle to effectively model structured longitudinal electronic health records (EHRs). In contrast, EHR foundation models can learn pr...
Evaluating Transformer and LSTM Frameworks for Prediction in Ungauged Basins
arXiv:2606.02791v1 Announce Type: new
Abstract: Watershed networks exhibit convergent topologies in which multiple tributaries merge into downstream channels, integrating diverse upstream hydrological processes. In ungauged basins, the absence of direct observations increases uncertainty and limits ...
Human-in-the-Loop Contextual Bandits for Short-Term Rental Dynamic Pricing: Structural Equivalence of Historical Warm-Up and Approval-Gated Live Learning
arXiv:2606.02595v1 Announce Type: new
Abstract: Dynamic pricing in short-term rental (STR) markets presents a distinctive challenge for online learning algorithms: pricing decisions carry significant financial risk, operators require explainability, and market feedback is sparse (one booking outcom...
From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models
arXiv:2606.00083v1 Announce Type: new
Abstract: Reinforcement learning relies on accurate reward functions, which are often hand-crafted or even unavailable in real-world applications, such as robotics. Recent work has explored the zero-shot reasoning capabilities of pre-trained Vision-Language Mod...
Hoeffding Concept Bottleneck Models with Applications to Overhead Images
arXiv:2606.00082v1 Announce Type: new
Abstract: Explainability of deep learning algorithms is critical for computer-vision applications with high-stakes decisions. Concept bottleneck models (CBMs) have recently shown promising performance in providing explainable and accurate predictions for classifica...
A Shared Valence Axis Across Modern LLMs and Human EEG: The Saturation Regularity
arXiv:2606.00129v1 Announce Type: new
Abstract: Large language models (LLMs) have emerged as powerful representation learners whose internal features increasingly align with human cognition. We study whether modern LLMs can serve as a lens for understanding neural representations in the human brain...
DAStatFormer: A Hybrid Multibranch Transformer with Statistical Feature Integration for DAS-Based Pattern Recognitions
arXiv:2606.00081v1 Announce Type: new
Abstract: Distributed Acoustic Sensing (DAS) enables large-scale monitoring through optical fibers, but its high dimensionality and complex spatio-temporal patterns make event classification demanding. Existing deep learning approaches-CNNs, recurrent models, a...
Deliberative Curation: A Protocol for Multi-Agent Knowledge Bases
arXiv:2606.00007v1 Announce Type: new
Abstract: As AI agents transition from isolated tools to collaborative participants in shared knowledge ecosystems, governing collective knowledge curation becomes a critical challenge. Human platform governance mechanisms do not transfer directly: agent statel...
Agents on a Tree: Pathwise Coordination for Multi-Objective Molecular Optimization
arXiv:2606.00008v1 Announce Type: new
Abstract: Multi-objective molecular optimization requires searching vast chemical spaces under conflicting objectives, where early design decisions strongly constrain downstream outcomes. Existing methods typically rely on a single policy or fixed scalarization...
Emergent Collaborative Deliberation in Multi-Model AI Systems: A BFT-Derived Protocol for Epistemic Synthesis
arXiv:2606.00005v1 Announce Type: new
Abstract: We present the Consilium Protocol, a Byzantine Fault Tolerance-derived architecture for structured multi-model AI deliberation that treats inter-model disagreement as epistemic signal rather than error. The protocol assigns engineered cognitive person...
PhyDrawGen: Physically Grounded Diagram Generation from Natural Language
arXiv:2605.30512v1 Announce Type: new
Abstract: Generating physics diagrams from text requires strict adherence to physical laws. While current generative models produce visually plausible outputs, they systematically hallucinate force vectors, ignore conservation laws, and violate geometric constr...
Unicorn: Scaling High-Dimensional Time Series Forecasting via Universal Correlation Modeling
arXiv:2605.30376v1 Announce Type: new
Abstract: Modern time series architectures face a fundamental trade-off: channel-independent models scale well with increasing data volume but ignore critical inter-channel dependencies, while channel-dependent models are expressive but remain "dimension-bound...
Gait2Hip-60: A Unified Deep Learning Benchmark for Predicting Hip Muscle Forces and Joint Moments from Multi-Cadence Gait Kinematics
arXiv:2605.30374v1 Announce Type: new
Abstract: Estimating hip muscle forces and joint moments during gait typically relies on musculoskeletal simulation, which is informative but time-consuming and difficult to apply in clinical settings. This study developed a deep learning framework to predict t...
QASM-Eval: A Dataset to Train and Evaluate LLMs on OpenQASM-3 Beyond Quantum Circuits
arXiv:2605.30358v1 Announce Type: new
Abstract: Quantum computing remains in the Noisy Intermediate-Scale Quantum (NISQ) era, where performance is highly constrained by noise. Addressing this limitation often requires hardware-facing capabilities beyond gate-sequence circuit specification, inclu...
Physically Viable World Models: A Case for Query-Conditioned Embodied AI
arXiv:2605.30542v1 Announce Type: new
Abstract: World models for embodied AI must be physically viable: constructed to answer intervention queries by representing the physical structure governing action outcomes, rather than merely predicting future observations. Existing observation-predictive wor...
Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving
arXiv:2605.30576v1 Announce Type: new
Abstract: Exploration in reinforcement learning for autonomous driving is inherently unsafe: agents must experience novel behaviors to learn, yet exploration can lead to collisions or off-road driving. We propose an uncertainty-aware framework that leverages ex...
Molecular Lead Optimization via Agentic Tool Planning
arXiv:2605.28862v1 Announce Type: new
Abstract: Drug discovery is a lengthy and resource-intensive process composed of multiple stages. Among these stages, lead optimization plays a critical role in transforming early hit compounds into viable drug candidates. This stage requires improving ADMET-re...
Self-Play Reinforcement Learning under Imperfect Information in Big 2
arXiv:2605.28863v1 Announce Type: new
Abstract: Imperfect-information multiplayer games test whether agents can act under hidden information, sparse rewards, and non-stationary opponents. We study these challenges in Big 2, a four-player imperfect-information card game. We develop a self-play RL fr...