AutoSAM: an Agentic Framework for Automating Input File Generation for the SAM Code with Multi-Modal Retrieval-Augmented Generation
arXiv:2603.24736v1 Announce Type: new
Abstract: In the design and safety analysis of advanced reactor systems, constructing input files for system-level thermal-hydraulics codes such as the System Analysis Module (SAM) remains a labor-intensive task. Analysts must extract and reconcile design data ...
Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour
arXiv:2603.24742v1 Announce Type: new
Abstract: AI safety is an increasingly urgent concern as the capabilities and adoption of AI systems grow. Existing evolutionary models of AI governance have primarily examined incentives for safe development and effective regulation, typically representing use...
Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach
arXiv:2603.24747v1 Announce Type: new
Abstract: The emergence of large language model agents capable of invoking external tools has created urgent need for formal verification of agent protocols. Two paradigms dominate this space: Schema-Guided Dialogue (SGD), a research framework for zero-shot API...
arXiv:2603.23539v1 Announce Type: new
Abstract: We show that PLDR-LLMs pretrained at self-organized criticality exhibit reasoning at inference time. The characteristics of PLDR-LLM deductive outputs at criticality is similar to second-order phase transitions. At criticality, the correlation length ...
Evaluating a Multi-Agent Voice-Enabled Smart Speaker for Care Homes: A Safety-Focused Framework
arXiv:2603.23625v1 Announce Type: new
Abstract: Artificial intelligence (AI) is increasingly being explored in health and social care to reduce administrative workload and allow staff to spend more time on patient care. This paper evaluates a voice-enabled Care Home Smart Speaker designed to suppor...
arXiv:2603.23660v1 Announce Type: new
Abstract: We introduce GTO Wizard Benchmark, a public API and standardized evaluation framework for benchmarking algorithms in Heads-Up No-Limit Texas Hold'em (HUNL). The benchmark evaluates agents against GTO Wizard AI, a state-of-the-art superhuman poker agen...
Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments
arXiv:2603.23638v1 Announce Type: new
Abstract: Large language models (LLMs) have enabled agentic systems that can reason, plan, and act across complex tasks, but it remains unclear whether they can allocate resources effectively under uncertainty. Unlike short-horizon reactive decisions, allocatio...
Environment Maps: Structured Environmental Representations for Long-Horizon Agents
arXiv:2603.23610v2 Announce Type: new
Abstract: Although large language models (LLMs) have advanced rapidly, robust automation of complex software workflows remains an open problem. In long-horizon settings, agents frequently suffer from cascading errors and environmental stochasticity; a single mi...
arXiv:2603.23558v1 Announce Type: new
Abstract: Uncertainty quantification is a key aspect in many tasks such as model selection/regularization, or quantifying prediction uncertainties to perform active learning or OOD detection. Within credal approaches that consider modeling uncertainty as probab...
Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction
arXiv:2603.23550v1 Announce Type: new
Abstract: Multi-turn human-AI collaboration is fundamental to deploying interactive services such as adaptive tutoring, conversational recommendation, and professional consultation. However, optimizing these interactions via reinforcement learning is hindered b...
Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation
arXiv:2603.23517v1 Announce Type: new
Abstract: Accuracy-based evaluation cannot reliably distinguish genuine generalization from shortcuts like memorization, leakage, or brittle heuristics, especially in small-data regimes. In this position paper, we argue for mechanism-aware evaluation that combi...
arXiv:2603.23562v1 Announce Type: new
Abstract: Synthetic data augmentation helps language models learn new knowledge in data-constrained domains. However, naively scaling existing synthetic data methods by training on more synthetic tokens or using stronger generators yields diminishing returns be...
arXiv:2603.22350v1 Announce Type: new
Abstract: Deterministic pre-execution safety gates evaluate whether individual agent actions are compatible with their assigned roles. While effective at per-action authorization, these systems are structurally blind to distributed attacks that decompose harmfu...
Intelligence Inertia: Physical Principles and Applications
arXiv:2603.22347v1 Announce Type: new
Abstract: While Landauer's principle establishes the fundamental thermodynamic floor for information erasure and Fisher Information provides a metric for local curvature in parameter space, these classical frameworks function effectively only as approximations ...
The Efficiency Attenuation Phenomenon: A Computational Challenge to the Language of Thought Hypothesis
arXiv:2603.22312v1 Announce Type: new
Abstract: This paper computationally investigates whether thought requires a language-like format, as posited by the Language of Thought (LoT) hypothesis. We introduce the ``AI Private Language'' thought experiment: if two artificial agents develop an efficient...
Beyond Hard Constraints: Budget-Conditioned Reachability For Safe Offline Reinforcement Learning
arXiv:2603.22292v1 Announce Type: new
Abstract: Sequential decision making using Markov Decision Process underpins many realworld applications. Both model-based and model free methods have achieved strong results in these settings. However, real-world tasks must balance reward maximization with saf...
Latent Semantic Manifolds in Large Language Models
arXiv:2603.22301v1 Announce Type: new
Abstract: Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce discrete tokens -- a fundamental mismatch whose geometric consequences remain poorly understood. We develop a mathematical framework that interprets LLM...
Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores
arXiv:2603.22299v1 Announce Type: new
Abstract: Large language models (LLMs) are often confidently wrong, making reliable uncertainty estimation (UE) essential. Output-based heuristics are cheap but brittle, while probing internal representations is effective yet high-dimensional and hard to transf...
arXiv:2603.22300v1 Announce Type: new
Abstract: Scaling Transformers to ultra-long contexts is bottlenecked by the $O(n^2 d)$ cost of self-attention. Existing methods reduce this cost along the sequence axis through local windows, kernel approximations, or token-level sparsity, but these approaches...
Efficient Embedding-based Synthetic Data Generation for Complex Reasoning Tasks
arXiv:2603.22294v1 Announce Type: new
Abstract: Synthetic Data Generation (SDG), leveraging Large Language Models (LLMs), has recently been recognized and broadly adopted as an effective approach to improve the performance of smaller but more resource and compute efficient LLMs through fine-tuning....
JointFM-0.1: A Foundation Model for Multi-Target Joint Distributional Prediction
arXiv:2603.20266v1 Announce Type: new
Abstract: Despite the rapid advancements in Artificial Intelligence (AI), Stochastic Differential Equations (SDEs) remain the gold-standard formalism for modeling systems under uncertainty. However, applying SDEs in practice is fraught with challenges: modeling...
Rolling-Origin Validation Reverses Model Rankings in Multi-Step PM10 Forecasting: XGBoost, SARIMA, and Persistence
arXiv:2603.20315v1 Announce Type: new
Abstract: (a) Many air quality forecasting studies report gains from machine learning, but evaluations often use static chronological splits and omit persistence baselines, so the operational added value under routine updating is unclear.
(b) Using 2,350 dail...
MARLIN: Multi-Agent Reinforcement Learning for Incremental DAG Discovery
arXiv:2603.20295v1 Announce Type: new
Abstract: Uncovering causal structures from observational data is crucial for understanding complex systems and making informed decisions. While reinforcement learning (RL) has shown promise in identifying these structures in the form of a directed acyclic grap...
Collaborative Adaptive Curriculum for Progressive Knowledge Distillation
arXiv:2603.20296v1 Announce Type: new
Abstract: Recent advances in collaborative knowledge distillation have demonstrated cutting-edge performance for resource-constrained distributed multimedia learning scenarios. However, achieving such competitiveness requires addressing a fundamental mismatch: ...