Digital Red Queen: Adversarial Program Evolution in Core War with LLMs
arXiv:2601.03335v1 Announce Type: new
Abstract: Large language models (LLMs) are increasingly being used to evolve solutions to problems in many domains, in a process inspired by biological evolution. However, unlike biological evolution, most LLM-evolution frameworks are formulated as static optim...
Exploration Through Introspection: A Self-Aware Reward Model
arXiv:2601.03389v1 Announce Type: new
Abstract: Understanding how artificial agents model internal mental states is central to advancing Theory of Mind in AI. Evidence points to a unified system for self- and other-awareness. We explore this self-awareness by having reinforcement learning agents in...
Enhancing LLM Instruction Following: An Evaluation-Driven Multi-Agentic Workflow for Prompt Instructions Optimization
arXiv:2601.03359v1 Announce Type: new
Abstract: Large Language Models (LLMs) often generate substantively relevant content but fail to adhere to formal constraints, leading to outputs that are conceptually correct but procedurally flawed. Traditional prompt refinement approaches focus on rephrasing...
Polynomial Convergence of Riemannian Diffusion Models
arXiv:2601.02499v1 Announce Type: new
Abstract: Diffusion models have demonstrated remarkable empirical success in recent years and are considered one of the state-of-the-art generative models in modern AI. These models consist of a forward process, which gradually diffuses the data distributio...
arXiv:2601.02433v1 Announce Type: new
Abstract: Digital AI systems span large language models, vision models, and generative architectures that operate primarily in symbolic, linguistic, or pixel domains. They have achieved striking progress, but almost all of this progress lives in virtual spa...
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
arXiv:2601.02439v1 Announce Type: new
Abstract: We present WebGym, the largest-to-date open-source environment for training realistic visual web agents. Real websites are non-stationary and diverse, making artificial or small-scale task sets insufficient for robust policy learning. WebGym contains ...
GEM-Style Constraints for PEFT with Dual Gradient Projection in LoRA
arXiv:2601.02500v1 Announce Type: new
Abstract: Full fine-tuning of Large Language Models (LLMs) is computationally costly, motivating Continual Learning (CL) approaches that utilize parameter-efficient adapters. We revisit Gradient Episodic Memory (GEM) within the Low-Rank Adapter (LoRA) subspace ...
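As a rough illustration of the kind of constraint GEM imposes (not the paper's dual gradient projection, which the truncated abstract does not describe), the sketch below shows the classic single-memory case: if the current gradient g conflicts with a reference gradient g_ref computed on stored examples (negative inner product), g is projected onto the half-space where the inner product is non-negative. With a single constraint this closed form solves GEM's quadratic program exactly (it coincides with the A-GEM update). Representing the flattened LoRA adapter gradients as plain Python lists, and the names `dot` and `gem_project`, are assumptions for illustration.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gem_project(g: List[float], g_ref: List[float]) -> List[float]:
    """Single-constraint GEM-style projection (hypothetical helper).

    g:     gradient of the current task (e.g. flattened LoRA adapter grads, assumed)
    g_ref: gradient of the loss on the episodic memory of earlier tasks
    Returns the vector closest to g (in L2) satisfying <g', g_ref> >= 0.
    """
    d = dot(g, g_ref)
    if d >= 0:
        # No interference with the memory: keep the gradient unchanged.
        # (This branch also covers an all-zero g_ref, avoiding division by zero.)
        return list(g)
    # Remove the component of g that points against g_ref.
    scale = d / dot(g_ref, g_ref)
    return [x - scale * r for x, r in zip(g, g_ref)]
```

For example, `gem_project([1.0, -1.0], [0.0, 1.0])` removes the conflicting second component and returns `[1.0, 0.0]`, whose inner product with `g_ref` is exactly zero.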
Orchestral AI: A Framework for Agent Orchestration
arXiv:2601.02577v1 Announce Type: new
Abstract: The rapid proliferation of LLM agent frameworks has forced developers to choose between vendor lock-in through provider-specific SDKs and complex multi-package ecosystems that obscure control flow and hinder reproducibility. Integrating tool calling a...
AWARE-US: Benchmark for Preference-Aware Resolution in Tool-Calling Agents
arXiv:2601.02643v1 Announce Type: new
Abstract: Tool-calling conversational agents querying structured databases often face two linked failures: underspecification (missing constraints needed to run a precise query) and infeasibility (the fully specified query returns an empty set because no item s...
SimpleMem: Efficient Lifelong Memory for LLM Agents
arXiv:2601.02553v1 Announce Type: new
Abstract: To support reliable long-term interaction in complex environments, LLM agents require memory systems that efficiently manage historical experiences. Existing approaches either retain full interaction histories via passive context extension, leading to...
Textual Explanations and Their Evaluations for Reinforcement Learning Policy
arXiv:2601.02514v1 Announce Type: new
Abstract: Understanding a Reinforcement Learning (RL) policy is crucial for ensuring that autonomous agents behave according to human expectations. This goal can be achieved using Explainable Reinforcement Learning (XRL) techniques. Although textual explanation...
Intrinsic-Metric Physics-Informed Neural Networks (IM-PINN) for Reaction-Diffusion Dynamics on Complex Riemannian Manifolds
arXiv:2601.00834v1 Announce Type: new
Abstract: Simulating nonlinear reaction-diffusion dynamics on complex, non-Euclidean manifolds remains a fundamental challenge in computational morphogenesis, constrained by high-fidelity mesh generation costs and symplectic drift in discrete time-stepping sche...
ShrimpXNet: A Transfer Learning Framework for Shrimp Disease Classification with Augmented Regularization, Adversarial Training, and Explainable AI
arXiv:2601.00832v1 Announce Type: new
Abstract: Shrimp is one of the most widely consumed aquatic species globally, valued for both its nutritional content and economic importance. Shrimp farming represents a significant source of income in many regions; however, like other forms of aquaculture, it...
Agentic AI for Autonomous, Explainable, and Real-Time Credit Risk Decision-Making
arXiv:2601.00818v1 Announce Type: new
Abstract: Significant digitalization of financial services in a short period of time has led to an urgent demand for autonomous, transparent, and real-time credit risk decision-making systems. Traditional machine learning models are effective in pattern ...
CogCanvas: Compression-Resistant Cognitive Artifacts for Long LLM Conversations
arXiv:2601.00821v1 Announce Type: new
Abstract: Large language models face a fundamental tension between context window limits and information fidelity in long conversations. Existing approaches--truncation and summarization--either discard early information or lose nuanced details. We introduce Co...
arXiv:2601.00823v1 Announce Type: new
Abstract: Large reasoning models (LRMs) have heterogeneous inference energy costs based on which model is used and how much it reasons. To reduce energy, it is important to choose the right LRM and operate it in the right way. As a result, the performance of sy...
arXiv:2601.00084v1 Announce Type: new
Abstract: In fixed-confidence best arm identification (BAI), the objective is to quickly identify the optimal option while controlling the probability of error below a desired threshold. Despite the plethora of BAI algorithms, existing methods typically fall sh...
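To make the fixed-confidence setting concrete, here is a minimal sketch of one classic baseline, successive elimination (not the paper's method, which the truncated abstract does not name). It assumes rewards in [0, 1] and uses an anytime Hoeffding confidence radius with a union bound over arms and rounds, so that on the event where every interval holds (probability at least 1 - delta) the best arm is never eliminated. The function name, the `max_rounds` safety cap, and the Bernoulli arms in the usage note are hypothetical.

```python
import math
from typing import Callable, List


def successive_elimination(
    arms: List[Callable[[], float]], delta: float, max_rounds: int = 100_000
) -> int:
    """Fixed-confidence BAI by successive elimination (illustrative sketch).

    arms:  callables returning one stochastic reward in [0, 1] per call
    delta: target error probability
    Returns the index of the arm identified as best.
    """
    k = len(arms)
    active = list(range(k))
    sums = [0.0] * k
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        # Pull every surviving arm once, so all active arms have t samples.
        for i in active:
            sums[i] += arms[i]()
        # Hoeffding radius; log(4 k t^2 / delta) makes the union bound over
        # k arms and all rounds t sum to less than delta.
        radius = math.sqrt(math.log(4 * k * t * t / delta) / (2 * t))
        means = {i: sums[i] / t for i in active}
        best_mean = max(means.values())
        # Drop arms whose upper bound falls below the leader's lower bound.
        active = [i for i in active if means[i] + radius >= best_mean - radius]
    # One survivor normally; if max_rounds was hit, fall back to empirical best.
    return max(active, key=lambda i: sums[i])
```

A usage example with seeded Bernoulli arms: `rng = random.Random(0)`, `arms = [(lambda p=p: 1.0 if rng.random() < p else 0.0) for p in [0.1, 0.5, 0.9]]`, then `successive_elimination(arms, delta=0.05)`. The sample cost grows as the gap between the best and second-best means shrinks, which is the inefficiency that newer fixed-confidence algorithms typically try to reduce.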
The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition
arXiv:2601.00065v1 Announce Type: new
Abstract: The open-weight LLM ecosystem is increasingly defined by model composition techniques (such as weight merging, speculative decoding, and vocabulary expansion) that remix capabilities from diverse sources. A critical prerequisite for applying these met...
Quantitative Rule-Based Strategy modeling in Classic Indian Rummy: A Metric Optimization Approach
arXiv:2601.00024v1 Announce Type: new
Abstract: The 13-card variant of Classic Indian Rummy is a sequential game of incomplete information that requires probabilistic reasoning and combinatorial decision-making. This paper proposes a rule-based framework for strategic play, driven by a new hand-eva...
arXiv:2601.00021v1 Announce Type: new
Abstract: We present a physical theory of intelligence grounded in irreversible information processing in systems constrained by conservation laws. An intelligent system is modelled as a coupled agent-environment process whose evolution transforms information i...
Reasoning in Action: MCTS-Driven Knowledge Retrieval for Large Language Models
arXiv:2601.00003v1 Announce Type: new
Abstract: Large language models (LLMs) typically enhance their performance through either the retrieval of semantically similar information or the improvement of their reasoning capabilities. However, a significant challenge remains in effectively integrating b...
A multi-algorithm approach for operational human resources workload balancing in a last mile urban delivery system
arXiv:2601.00023v1 Announce Type: new
Abstract: Efficient workload assignment to the workforce is critical in last-mile package delivery systems. In this context, traditional methods of assigning package deliveries to workers based on geographical proximity can be inefficient and are likely to lead to an...
A Proof-of-Concept for Explainable Disease Diagnosis Using Large Language Models and Answer Set Programming
arXiv:2512.23932v1 Announce Type: new
Abstract: Accurate disease prediction is vital for timely intervention, effective treatment, and reducing medical complications. While symbolic AI has been applied in healthcare, its adoption remains limited due to the effort required for constructing high-qual...
SPARK: Search Personalization via Agent-Driven Retrieval and Knowledge-sharing
arXiv:2512.24008v1 Announce Type: new
Abstract: Personalized search demands the ability to model users' evolving, multi-dimensional information needs, a challenge for systems constrained by static profiles or monolithic retrieval pipelines. We present SPARK (Search Personalization via Agent-Driven ...