Directional Concentration Uncertainty: A representational approach to uncertainty quantification for generative models
arXiv:2602.13264v1 Announce Type: new
Abstract: In the critical task of making generative models trustworthy and robust, methods for Uncertainty Quantification (UQ) have begun to show encouraging potential. However, many of these methods rely on rigid heuristics that fail to generalize across tasks...
Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
Developing autonomous agents that effectively interact with Graphic User Interfaces (GUIs) remains a challenging open problem, especially for small on-device models. In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, we...
How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured on average over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train Self-Proving m...
Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation
arXiv:2602.12544v1 Announce Type: new
Abstract: We present a scalable pipeline for automatically generating high-quality training data for web agents. In particular, a major challenge in identifying high-quality training instances is trajectory evaluation - quantifying how much progress was made to...
Intent-Driven Smart Manufacturing Integrating Knowledge Graphs and Large Language Models
arXiv:2602.12419v1 Announce Type: new
Abstract: The increasing complexity of smart manufacturing environments demands interfaces that can translate high-level human intents into machine-executable actions. This paper presents a unified framework that integrates instruction-tuned Large Language Mode...
Evolving Beyond Snapshots: Harmonizing Structure and Sequence via Entity State Tuning for Temporal Knowledge Graph Forecasting
arXiv:2602.12389v1 Announce Type: new
Abstract: Temporal knowledge graph (TKG) forecasting requires predicting future facts by jointly modeling structural dependencies within each snapshot and temporal evolution across snapshots. However, most existing methods are stateless: they recompute entity r...
A Theoretical Framework for Adaptive Utility-Weighted Benchmarking
arXiv:2602.12356v1 Announce Type: new
Abstract: Benchmarking has long served as a foundational practice in machine learning and, increasingly, in modern AI systems such as large language models, where shared tasks, metrics, and leaderboards offer a common basis for measuring progress and comparing ...
GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory
arXiv:2602.12316v1 Announce Type: new
Abstract: Frontier AI systems are increasingly capable and deployed in high-stakes multi-agent environments. However, existing AI safety benchmarks largely evaluate single agents, leaving multi-agent risks such as coordination failure and conflict poorly unders...
Intrinsic Credit Assignment for Long Horizon Interaction
arXiv:2602.12342v1 Announce Type: new
Abstract: How can we train agents to navigate uncertainty over long horizons? In this work, we propose {\Delta}Belief-RL, which leverages a language model's own intrinsic beliefs to reward intermediate progress. Our method utilizes the change in the probability...
The Appeal and Reality of Recycling LoRAs with Adaptive Merging
arXiv:2602.12323v1 Announce Type: new
Abstract: The widespread availability of fine-tuned LoRA modules for open pre-trained models has led to an interest in methods that can adaptively merge LoRAs to improve performance. These methods typically include some way of selecting LoRAs from a pool and tu...
Abstractive Red-Teaming of Language Model Character
arXiv:2602.12318v1 Announce Type: new
Abstract: We want language model assistants to conform to a character specification, which asserts how the model should act across diverse user interactions. While models typically follow these character specifications, they can occasionally violate them in lar...
OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization
arXiv:2602.12305v1 Announce Type: new
Abstract: Generating high-performance CUDA kernels remains challenging due to the need to navigate a combinatorial space of low-level transformations under noisy and expensive hardware feedback. Although large language models can synthesize functionally correct...
Asynchronous Verified Semantic Caching for Tiered LLM Architectures
Large language models (LLMs) now sit in the critical path of search, assistance, and agentic workflows, making semantic caching essential for reducing inference cost and latency. Production deployments typically use a tiered static-dynamic design: a static cache of curated, offline vetted responses ...
Brain inspired machines are better at math than expected
Neuromorphic computers modeled after the human brain can now solve the complex equations behind physics simulations — something once thought possible only with energy-hungry supercomputers. The breakthrough could lead to powerful, low-energy supercomputers while revealing new secrets about how our b...
Turning messy retail data into actionable insights. See how we use large language models to uncover what drives customer decisions and optimize every interaction.
The PBSAI Governance Ecosystem: A Multi-Agent AI Reference Architecture for Securing Enterprise AI Estates
arXiv:2602.11301v1 Announce Type: new
Abstract: Enterprises are rapidly deploying large language models, retrieval augmented generation pipelines, and tool using agents into production, often on shared high performance computing clusters and cloud accelerator platforms that also support defensive a...
arXiv:2602.11298v1 Announce Type: new
Abstract: We introduce Voxtral Realtime, a natively streaming automatic speech recognition model that matches offline transcription quality at sub-second latency. Unlike approaches that adapt offline models through chunking or sliding windows, Voxtral Realtime ...
Latent Generative Solvers for Generalizable Long-Term Physics Simulation
arXiv:2602.11229v1 Announce Type: new
Abstract: We study long-horizon surrogate simulation across heterogeneous PDE systems. We introduce Latent Generative Solvers (LGS), a two-stage framework that (i) maps diverse PDE states into a shared latent physics space with a pretrained VAE, and (ii) learns...
TDPNavigator-Placer: Thermal- and Wirelength-Aware Chiplet Placement in 2.5D Systems Through Multi-Agent Reinforcement Learning
arXiv:2602.11187v1 Announce Type: new
Abstract: The rapid growth of electronics has accelerated the adoption of 2.5D integrated circuits, where effective automated chiplet placement is essential as systems scale to larger and more heterogeneous chiplet assemblies. Existing placement methods typical...
GAC-KAN: An Ultra-Lightweight GNSS Interference Classifier for GenAI-Powered Consumer Edge Devices
arXiv:2602.11186v1 Announce Type: new
Abstract: The integration of Generative AI (GenAI) into Consumer Electronics (CE)--from AI-powered assistants in wearables to generative planning in autonomous Uncrewed Aerial Vehicles (UAVs)--has revolutionized user experiences. However, these GenAI applicatio...
Spectra: Rethinking Optimizers for LLMs Under Spectral Anisotropy
arXiv:2602.11185v1 Announce Type: new
Abstract: Gradient signals in LLM training are highly anisotropic: recurrent linguistic structure concentrates energy into a small set of dominant spectral directions, while context specific information resides in a long tail. We show that this spike tail separ...
Automated Optimization Modeling via a Localizable Error-Driven Perspective
arXiv:2602.11164v1 Announce Type: new
Abstract: Automated optimization modeling via Large Language Models (LLMs) has emerged as a promising approach to assist complex human decision-making. While post-training has become a pivotal technique to enhance LLMs' capabilities in this domain, its effectiv...
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
Hyperparameter tuning can dramatically impact training stability and final performance of large-scale models. Recent works on neural network parameterisations, such as μP, have enabled transfer of optimal global hyperparameters across model sizes. These works propose an empirical practice of search ...