Multivariate Conformal Prediction using Optimal Transport
Conformal prediction (CP) quantifies the uncertainty of machine learning models by constructing sets of plausible outputs. These sets are constructed by leveraging a so-called conformity score, a quantity computed using the input point of interest, a prediction model, and past observations. CP sets ...
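To make the role of the conformity score concrete, here is a minimal sketch of standard split conformal prediction for a scalar output. It assumes the absolute residual |y - f(x)| as the conformity score and a held-out calibration set. It is the classical univariate construction, not the optimal-transport method for multivariate outputs that this paper proposes. The function name `split_conformal_interval` is hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

def split_conformal_interval(
    predict: Callable[[float], float],
    calib_x: Sequence[float],
    calib_y: Sequence[float],
    x_new: float,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Split conformal interval using the absolute-residual conformity score."""
    n = len(calib_x)
    if n == 0 or n != len(calib_y):
        raise ValueError("need matching, non-empty calibration data")
    # Conformity scores on held-out calibration data: |y - f(x)|
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha))
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the set is unbounded
        return (-math.inf, math.inf)
    q = scores[k - 1]
    center = predict(x_new)
    # Every output whose score would be <= q is kept in the set
    return (center - q, center + q)
```

Under exchangeability of calibration and test points, this interval covers the true output with probability at least 1 - alpha. Multivariate outputs have no natural ordering of vector-valued scores, which is the gap the paper targets.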
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
Multimodal Large Language Models (MLLMs) in real-world applications require access to external knowledge sources and must remain responsive to the dynamic and ever-changing real-world information in order to address information-seeking and knowledge-intensive user queries. Existing approaches, such ...
Over-Searching in Search-Augmented Large Language Models
Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval. However, they often over-search, unnecessarily invoking the search tool even when doing so does not improve response quality, which leads to computational inefficiency and hallucinations by inc...
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Unified multimodal Large Language Models (LLMs) that can both understand and generate visual content hold immense potential. However, existing open-source models often suffer from a performance trade-off between these capabilities. We present Manzano, a simple and scalable unified framework that sub...
The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs
arXiv:2601.04199v1 Announce Type: new
Abstract: Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks; however, research into their safety has lagged, posing potential risks for real-world deployment. In this paper, we first establis...
Green MLOps: Closed-Loop, Energy-Aware Inference with NVIDIA Triton, FastAPI, and Bio-Inspired Thresholding
arXiv:2601.04250v1 Announce Type: new
Abstract: Energy efficiency is a first-order concern in AI deployment, as long-running inference can exceed training in cumulative carbon impact. We propose a bio-inspired framework that maps protein-folding energy basins to inference cost landscapes and contro...
Safety-Utility Conflicts Are Not Global: Surgical Alignment via Head-Level Diagnosis
arXiv:2601.04262v1 Announce Type: new
Abstract: Safety alignment in Large Language Models (LLMs) inherently presents a multi-objective optimization conflict, often accompanied by an unintended degradation of general capabilities. Existing mitigation strategies typically rely on global gradient geom...
Learning to Reason: Temporal Saliency Distillation for Interpretable Knowledge Transfer
arXiv:2601.04263v1 Announce Type: new
Abstract: Knowledge distillation has proven effective for model compression by transferring knowledge from a larger network called the teacher to a smaller network called the student. Current knowledge distillation in time series is predominantly based on logit...
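The logit-based distillation that the abstract describes as the current norm is usually a KL divergence between temperature-softened teacher and student distributions. The following is a minimal sketch of that generic baseline, assuming a single example's class logits as plain Python lists. It is not the temporal saliency method this paper proposes, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, computed stably via the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]

def logit_distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 4.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must have the same number of classes")
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    # Working in log space avoids log(0) when a probability underflows
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Scaling by T^2 is a common convention that keeps gradient magnitudes comparable across temperatures
    return temperature ** 2 * kl
```

In practice, this term is typically mixed with the ordinary cross-entropy on ground-truth labels. A higher temperature exposes more of the teacher's relative preferences among non-target classes.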
MemKD: Memory-Discrepancy Knowledge Distillation for Efficient Time Series Classification
arXiv:2601.04264v1 Announce Type: new
Abstract: Deep learning models, particularly recurrent neural networks and their variants, such as long short-term memory, have significantly advanced time series data analysis. These models capture complex, sequential patterns in time series, enabling real-tim...
Formal Analysis of AGI Decision-Theoretic Models and the Confrontation Question
arXiv:2601.04234v1 Announce Type: new
Abstract: Artificial General Intelligence (AGI) may face a confrontation question: under what conditions would a rationally self-interested AGI choose to seize power or eliminate human control (a confrontation) rather than remain cooperative? We formalize this ...
Actively Obtaining Environmental Feedback for Autonomous Action Evaluation Without Predefined Measurements
arXiv:2601.04235v1 Announce Type: new
Abstract: Obtaining reliable feedback from the environment is a fundamental capability for intelligent agents to evaluate the correctness of their actions and to accumulate reusable knowledge. However, most existing approaches rely on predefined measurements or...
SAGE-32B: Agentic Reasoning via Iterative Distillation
arXiv:2601.04237v1 Announce Type: new
Abstract: We demonstrate SAGE-32B, a 32-billion-parameter language model that focuses on agentic reasoning and long-range planning tasks. Unlike chat models that aim for general conversation fluency, SAGE-32B is designed to operate in an agentic loop, emphasizi...
arXiv:2601.04239v1 Announce Type: new
Abstract: The Cyclic Antibandwidth Problem (CABP), a variant of the Antibandwidth Problem, is an NP-hard graph labeling problem with numerous applications. Despite significant research efforts, existing state-of-the-art approaches for CABP are exclusively heuri...
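The following sketch shows what the Cyclic Antibandwidth Problem optimizes. It uses the standard definition: a bijective labeling assigns each of the n vertices a distinct label, each edge is scored by the distance between its endpoint labels measured around a cycle of length n, and the goal is to maximize the smallest such distance. Labels here run from 0 to n-1, which gives the same distances as 1 to n. The exhaustive search is only an illustration for tiny graphs. It is not the heuristics or the approach discussed in this paper, and the function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple

Edge = Tuple[int, int]

def cyclic_antibandwidth(labels: Dict[int, int], edges: Sequence[Edge], n: int) -> int:
    """Smallest cyclic label distance over all edges for a labeling onto {0, ..., n-1}."""
    if not edges:
        raise ValueError("graph must have at least one edge")
    value = n
    for u, v in edges:
        d = abs(labels[u] - labels[v])
        value = min(value, d, n - d)  # distance measured around a cycle of length n
    return value

def max_cyclic_antibandwidth(n: int, edges: Sequence[Edge]) -> Tuple[int, Dict[int, int]]:
    """Exhaustive search over all n! labelings; only feasible for tiny graphs."""
    best_value = -1
    best_labels: Dict[int, int] = {}
    for perm in permutations(range(n)):
        labels = dict(enumerate(perm))  # vertex i receives label perm[i]
        value = cyclic_antibandwidth(labels, edges, n)
        if value > best_value:
            best_value, best_labels = value, labels
    return best_value, best_labels
```

The n! search space explains why practical solvers for larger graphs fall back on heuristics or stronger exact techniques.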
Pretraining with Hierarchical Memories: Separating Long-Tail and Common Knowledge
The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge into parameters is unnecessary, as only a fraction is used per prompt, and impractical for edge devices wit...
Which Evaluation for Which Model? A Taxonomy for Speech Model Assessment
Speech foundation models have recently achieved remarkable capabilities across a wide range of tasks. However, their evaluation remains disjointed across tasks and model types. Different models excel at distinct aspects of speech processing and thus require different evaluation protocols. This paper...
AgentBuilder: Exploring Scaffolds for Prototyping User Experiences of Interface Agents
Interface agents powered by generative AI models (referred to as “agents”) can automate actions based on user commands. An important aspect of developing agents is their user experience (i.e., agent experience). There is a growing need to provide scaffolds for a broader set of individuals beyond AI ...
Lightweight Transformer Architectures for Edge Devices in Real-Time Applications
arXiv:2601.03290v1 Announce Type: new
Abstract: The deployment of transformer-based models on resource-constrained edge devices represents a critical challenge in enabling real-time artificial intelligence applications. This comprehensive survey examines lightweight transformer architectures specif...
Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning
arXiv:2601.03320v1 Announce Type: new
Abstract: On-policy reinforcement learning (RL), particularly Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), has become the dominant paradigm for fine-tuning large language models (LLMs). While policy ratio clipping stabilizes...
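For reference, here is a minimal sketch of the policy-ratio clipping the abstract mentions, in the form of the standard PPO clipped surrogate objective. It also includes one common GRPO-style group-relative advantage normalization, which uses the population standard deviation plus a small epsilon; implementations differ on these details. Per-token log-probabilities are assumed to be given as plain lists. This sketch does not implement the ratio-variance regularization the paper proposes, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

def clipped_surrogate(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized)."""
    n = len(advantages)
    if n == 0 or not (len(logp_new) == len(logp_old) == n):
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for ln, lo, a in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms
        total += min(ratio * a, clipped * a)
    return total / n

def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one prompt's group of sampled responses (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]
```

Clipping removes the incentive to push the ratio beyond [1 - eps, 1 + eps]. It does not directly bound how widely the ratios spread across samples, which is the kind of quantity a ratio-variance regularizer would target.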
Mastering the Game of Go with Self-play Experience Replay
arXiv:2601.03306v1 Announce Type: new
Abstract: The game of Go has long served as a benchmark for artificial intelligence, demanding sophisticated strategic reasoning and long-term planning. Previous approaches, such as AlphaGo and its successors, have predominantly relied on model-based Monte-Carlo...
Digital Red Queen: Adversarial Program Evolution in Core War with LLMs
arXiv:2601.03335v1 Announce Type: new
Abstract: Large language models (LLMs) are increasingly being used to evolve solutions to problems in many domains, in a process inspired by biological evolution. However, unlike biological evolution, most LLM-evolution frameworks are formulated as static optim...
Enhancing LLM Instruction Following: An Evaluation-Driven Multi-Agentic Workflow for Prompt Instructions Optimization
arXiv:2601.03359v1 Announce Type: new
Abstract: Large Language Models (LLMs) often generate substantively relevant content but fail to adhere to formal constraints, leading to outputs that are conceptually correct but procedurally flawed. Traditional prompt refinement approaches focus on rephrasing...
Exploration Through Introspection: A Self-Aware Reward Model
arXiv:2601.03389v1 Announce Type: new
Abstract: Understanding how artificial agents model internal mental states is central to advancing Theory of Mind in AI. Evidence points to a unified system for self- and other-awareness. We explore this self-awareness by having reinforcement learning agents in...
Toward Maturity-Based Certification of Embodied AI: Quantifying Trustworthiness Through Measurement Mechanisms
arXiv:2601.03470v2 Announce Type: new
Abstract: We propose a maturity-based framework for certifying embodied AI systems through explicit measurement mechanisms. We argue that certifiable embodied AI requires structured assessment frameworks, quantitative scoring mechanisms, and methods for navigat...
Less than a trillionth of a second: Ultrafast UV light could transform communications and imaging
Researchers have built a new platform that produces ultrashort UV-C laser pulses and detects them at room temperature using atom-thin materials. The light flashes last just femtoseconds and can be used to send encoded messages through open space. The system relies on efficient laser generation and h...