AWARE-US: Benchmark for Preference-Aware Resolution in Tool-Calling Agents
arXiv:2601.02643v1 Announce Type: new
Abstract: Tool-calling conversational agents querying structured databases often face two linked failures: underspecification (missing constraints needed to run a precise query) and infeasibility (the fully specified query returns an empty set because no item s...
SimpleMem: Efficient Lifelong Memory for LLM Agents
arXiv:2601.02553v1 Announce Type: new
Abstract: To support reliable long-term interaction in complex environments, LLM agents require memory systems that efficiently manage historical experiences. Existing approaches either retain full interaction histories via passive context extension, leading to...
Textual Explanations and Their Evaluations for Reinforcement Learning Policy
arXiv:2601.02514v1 Announce Type: new
Abstract: Understanding a Reinforcement Learning (RL) policy is crucial for ensuring that autonomous agents behave according to human expectations. This goal can be achieved using Explainable Reinforcement Learning (XRL) techniques. Although textual explanation...
Intrinsic-Metric Physics-Informed Neural Networks (IM-PINN) for Reaction-Diffusion Dynamics on Complex Riemannian Manifolds
arXiv:2601.00834v1 Announce Type: new
Abstract: Simulating nonlinear reaction-diffusion dynamics on complex, non-Euclidean manifolds remains a fundamental challenge in computational morphogenesis, constrained by high-fidelity mesh generation costs and symplectic drift in discrete time-stepping sche...
ShrimpXNet: A Transfer Learning Framework for Shrimp Disease Classification with Augmented Regularization, Adversarial Training, and Explainable AI
arXiv:2601.00832v1 Announce Type: new
Abstract: Shrimp is one of the most widely consumed aquatic species globally, valued for both its nutritional content and economic importance. Shrimp farming represents a significant source of income in many regions; however, like other forms of aquaculture, it...
Agentic AI for Autonomous, Explainable, and Real-Time Credit Risk Decision-Making
arXiv:2601.00818v1 Announce Type: new
Abstract: Significant digitalization of financial services in a short period of time has led to an urgent demand for autonomous, transparent, and real-time credit risk decision-making systems. Traditional machine learning models are effective in pattern ...
CogCanvas: Compression-Resistant Cognitive Artifacts for Long LLM Conversations
arXiv:2601.00821v1 Announce Type: new
Abstract: Large language models face a fundamental tension between context window limits and information fidelity in long conversations. Existing approaches (truncation and summarization) either discard early information or lose nuanced details. We introduce Co...
arXiv:2601.00823v1 Announce Type: new
Abstract: Large reasoning models (LRMs) have heterogeneous inference energy costs based on which model is used and how much it reasons. To reduce energy, it is important to choose the right LRM and operate it in the right way. As a result, the performance of sy...
arXiv:2601.00084v1 Announce Type: new
Abstract: In fixed-confidence best arm identification (BAI), the objective is to quickly identify the optimal option while controlling the probability of error below a desired threshold. Despite the plethora of BAI algorithms, existing methods typically fall sh...
The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition
arXiv:2601.00065v1 Announce Type: new
Abstract: The open-weight LLM ecosystem is increasingly defined by model composition techniques (such as weight merging, speculative decoding, and vocabulary expansion) that remix capabilities from diverse sources. A critical prerequisite for applying these met...
Quantitative Rule-Based Strategy modeling in Classic Indian Rummy: A Metric Optimization Approach
arXiv:2601.00024v1 Announce Type: new
Abstract: The 13-card variant of Classic Indian Rummy is a sequential game of incomplete information that requires probabilistic reasoning and combinatorial decision-making. This paper proposes a rule-based framework for strategic play, driven by a new hand-eva...
arXiv:2601.00021v1 Announce Type: new
Abstract: We present a physical theory of intelligence grounded in irreversible information processing in systems constrained by conservation laws. An intelligent system is modelled as a coupled agent-environment process whose evolution transforms information i...
Reasoning in Action: MCTS-Driven Knowledge Retrieval for Large Language Models
arXiv:2601.00003v1 Announce Type: new
Abstract: Large language models (LLMs) typically enhance their performance through either the retrieval of semantically similar information or the improvement of their reasoning capabilities. However, a significant challenge remains in effectively integrating b...
A multi-algorithm approach for operational human resources workload balancing in a last mile urban delivery system
arXiv:2601.00023v1 Announce Type: new
Abstract: Efficient workload assignment to the workforce is critical in last-mile package delivery systems. In this context, traditional methods of assigning package deliveries to workers based on geographical proximity can be inefficient and almost certainly lead to an...
A Proof-of-Concept for Explainable Disease Diagnosis Using Large Language Models and Answer Set Programming
arXiv:2512.23932v1 Announce Type: new
Abstract: Accurate disease prediction is vital for timely intervention, effective treatment, and reducing medical complications. While symbolic AI has been applied in healthcare, its adoption remains limited due to the effort required for constructing high-qual...
SPARK: Search Personalization via Agent-Driven Retrieval and Knowledge-sharing
arXiv:2512.24008v1 Announce Type: new
Abstract: Personalized search demands the ability to model users' evolving, multi-dimensional information needs, a challenge for systems constrained by static profiles or monolithic retrieval pipelines. We present SPARK (Search Personalization via Agent-Driven ...
ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment
arXiv:2512.24040v1 Announce Type: new
Abstract: Automatic Prompt Optimization (APO) has emerged as a critical technique for enhancing Large Language Model (LLM) performance, yet current state-of-the-art methods typically rely on large, labeled gold-standard development sets to compute fitness score...
The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models
arXiv:2512.23850v1 Announce Type: new
Abstract: Current language model evaluations measure what models know under ideal conditions but not how robustly they know it under realistic stress. Static benchmarks like MMLU and TruthfulQA cannot distinguish a model that lacks knowledge from one whose veri...
CASCADE: Cumulative Agentic Skill Creation through Autonomous Development and Evolution
arXiv:2512.23880v1 Announce Type: new
Abstract: Large language model (LLM) agents currently depend on predefined tools or brittle tool generation, constraining their capability and adaptability to complex scientific tasks. We introduce CASCADE, a self-evolving agentic framework representing an earl...
arXiv:2512.23752v1 Announce Type: new
Abstract: Recent work has shown that small transformers trained in controlled "wind-tunnel" settings can implement exact Bayesian inference, and that their training dynamics produce a geometric substrate: low-dimensional value manifolds and progressively ort...
Coordinate Matrix Machine: A Human-level Concept Learning to Classify Very Similar Documents
arXiv:2512.23749v1 Announce Type: new
Abstract: Human-level concept learning argues that humans typically learn new concepts from a single example, whereas machine learning algorithms typically require hundreds of samples to learn a single concept. Our brain subconsciously identifies important feat...
Pruning Graphs by Adversarial Robustness Evaluation to Strengthen GNN Defenses
arXiv:2512.22128v1 Announce Type: new
Abstract: Graph Neural Networks (GNNs) have emerged as a dominant paradigm for learning on graph-structured data, thanks to their ability to jointly exploit node features and relational information encoded in the graph topology. This joint modeling, however, al...
Wireless Traffic Prediction with Large Language Model
arXiv:2512.22178v1 Announce Type: new
Abstract: The growing demand for intelligent, adaptive resource management in next-generation wireless networks has underscored the importance of accurate and scalable wireless traffic prediction. While recent advancements in deep learning and foundation models...
Latent Sculpting for Zero-Shot Generalization: A Manifold Learning Approach to Out-of-Distribution Anomaly Detection
arXiv:2512.22179v1 Announce Type: new
Abstract: A fundamental limitation of supervised deep learning in high-dimensional tabular domains is "Generalization Collapse": models learn precise decision boundaries for known distributions but fail catastrophically when facing Out-of-Distribution (OOD) dat...