Implicit Intelligence -- Evaluating Agents on What Users Don't Say
arXiv:2602.20424v1 Announce Type: new
Abstract: Real-world requests to AI agents are fundamentally underspecified. Natural human communication relies on shared context and unstated constraints that speakers expect listeners to infer. Current agentic benchmarks test explicit instruction-following bu...
Diffusion Modulation via Environment Mechanism Modeling for Planning
arXiv:2602.20422v1 Announce Type: new
Abstract: Diffusion models have shown promising capabilities in trajectory generation for planning in offline reinforcement learning (RL). However, conventional diffusion-based planning methods often fail to account for the fact that generating trajectories in ...
DMCD: Semantic-Statistical Framework for Causal Discovery
arXiv:2602.20333v1 Announce Type: new
Abstract: We present DMCD (DataMap Causal Discovery), a two-phase causal discovery framework that integrates LLM-based semantic drafting from variable metadata with statistical validation on observational data. In Phase I, a large language model proposes a spar...
An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
arXiv:2602.20324v1 Announce Type: new
Abstract: Phenotyping is fundamental to rare disease diagnosis, but manual curation of structured phenotypes from clinical notes is labor-intensive and difficult to scale. Existing artificial intelligence approaches typically optimize individual components of p...
IMOVNO+: A Regional Partitioning and Meta-Heuristic Ensemble Framework for Imbalanced Multi-Class Learning
arXiv:2602.20199v1 Announce Type: new
Abstract: Class imbalance, overlap, and noise degrade data quality, reduce model reliability, and limit generalization. Although widely studied in binary classification, these issues remain underexplored in multi-class settings, where complex inter-class relati...
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
arXiv:2602.20197v1 Announce Type: new
Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as a primary learning paradigm for enhancing the reasoning capabilities of multi-modal large language models (MLLMs). However, during RL training, the enormous state space of MLLM and s...
FedAvg-Based CTMC Hazard Model for Federated Bridge Deterioration Assessment
arXiv:2602.20194v1 Announce Type: new
Abstract: Periodic bridge inspection records contain sensitive information about public infrastructure, making cross-organizational data sharing impractical under existing data governance constraints. We propose a federated framework for estimating a Continuous...
MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs
arXiv:2602.20191v1 Announce Type: new
Abstract: Changing runtime complexity on cloud and edge devices necessitates elastic large language model (LLM) deployment, where an LLM can run inference at various quantization precisions based on available computational resources. However, it has been observ...
Tensor Network Generator-Enhanced Optimization for Traveling Salesman Problem
arXiv:2602.20175v1 Announce Type: new
Abstract: We present an application of the tensor network generator-enhanced optimization (TN-GEO) framework to address the traveling salesman problem (TSP), a fundamental combinatorial optimization challenge. Our approach employs a tensor network Born machine ...
Reusing Pre-Training Data at Test Time is a Compute Multiplier
Large language models learn from their vast pre-training corpora, gaining the ability to solve an ever-increasing variety of tasks; yet although researchers work to improve these datasets, there is little effort to understand how efficient the pre-training apparatus is at extracting ideas and knowle...
Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates
Prior studies investigating the internal workings of LLMs have uncovered sparse subnetworks, often referred to as circuits, that are responsible for performing specific tasks. Additionally, it has been shown that model performance improvement through fine-tuning often results from the strengthening ...
A.R.I.S.: Automated Recycling Identification System for E-Waste Classification Using Deep Learning
Traditional electronic recycling processes suffer from significant resource loss due to inadequate material separation and identification capabilities, limiting material recovery. We present A.R.I.S. (Automated Recycling Identification System), a low-cost, portable sorter for shredded e-waste that a...
Closing the Gap Between Text and Speech Understanding in LLMs
Large Language Models (LLMs) can be adapted to extend their text capabilities to speech inputs. However, these speech-adapted LLMs consistently underperform their text-based counterparts—and even cascaded pipelines—on language understanding tasks. We term this shortfall the text-speech understanding...
arXiv:2602.18671v1 Announce Type: new
Abstract: We reinterpret the final Large Language Model (LLM) softmax classifier as an Energy-Based Model (EBM), decomposing the sequence-to-sequence probability chain into multiple interacting EBMs at inference. This principled approach allows us to track "ene...
Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System
arXiv:2602.18640v1 Announce Type: new
Abstract: Modern large-scale ranking systems operate within a sophisticated landscape of competing objectives, operational constraints, and evolving product requirements. Progress in this domain is increasingly bottlenecked by the engineering context constraint...
Feedback-based Automated Verification in Vibe Coding of CAS Adaptation Built on Constraint Logic
arXiv:2602.18607v1 Announce Type: new
Abstract: In CAS adaptation, a challenge is to define the dynamic architecture of the system and changes in its behavior. Implementation-wise, this is projected into an adaptation mechanism, typically realized as an Adaptation Manager (AM). With the advances of...
Hierarchical Reward Design from Language: Enhancing Alignment of Agent Behavior with Human Specifications
arXiv:2602.18582v1 Announce Type: new
Abstract: When training artificial intelligence (AI) to perform tasks, humans often care not only about whether a task is completed but also how it is performed. As AI agents tackle increasingly complex tasks, aligning their behavior with human-provided specifi...
Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning
arXiv:2602.18493v1 Announce Type: new
Abstract: Long-context LLMs and Retrieval-Augmented Generation (RAG) systems process information passively, deferring state tracking, contradiction resolution, and evidence aggregation to query time, which becomes brittle under ultra-long streams with frequent ...
Physiologically Informed Deep Learning: A Multi-Scale Framework for Next-Generation PBPK Modeling
arXiv:2602.18472v1 Announce Type: new
Abstract: Physiologically Based Pharmacokinetic (PBPK) modeling is a cornerstone of model-informed drug development (MIDD), providing a mechanistic framework to predict drug absorption, distribution, metabolism, and excretion (ADME). Despite its utility, adopti...
AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
Recent multimodal large language models (MLLMs) such as GPT-4o and Qwen3-Omni show strong perception but struggle in multi-speaker, dialogue-centric settings that demand agentic reasoning: tracking who speaks, maintaining roles, and grounding events across time. These scenarios are central to multimo...
Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
One of the first pre-processing steps for constructing web-scale LLM pretraining datasets involves extracting text from HTML. Despite the immense diversity of web content, existing open-source datasets predominantly apply a single fixed extractor to all webpages. In this work, we investigate whether...
The Potential of CoT for Reasoning: A Closer Look at Trace Dynamics
Chain-of-thought (CoT) prompting is a de facto standard technique to elicit reasoning-like responses from large language models (LLMs), allowing them to spell out individual steps before giving a final answer. While the resemblance to human-like reasoning is undeniable, the driving forces underpinni...
Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems
arXiv:2602.17910v1 Announce Type: new
Abstract: Traditional AI alignment primarily focuses on individual model outputs; however, autonomous agents in long-horizon workflows require sustained reliability across entire interaction trajectories. We introduce APEMO (Affect-aware Peak-End Modulation for...