Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment
Query Auto-Completion (QAC) is a critical feature of modern search systems that improves search efficiency by suggesting completions as users type. However, existing approaches face fundamental challenges: traditional retrieve-and-rank pipelines have poor long-tail coverage and require extensive fea...
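The traditional retrieve-and-rank baseline that the QAC abstract contrasts against fits in a few lines. This is an illustrative sketch, not the paper's system. It assumes a plain query log, prefix matching over a sorted list, and ranking by raw frequency, whereas production rankers use learned models over many features. The function names build_index and complete are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

def build_index(query_log):
    """Hypothetical helper: count query frequencies and keep a sorted list of unique queries."""
    counts = Counter(query_log)
    return sorted(counts), counts

def complete(prefix, index, k=3):
    """Hypothetical helper: retrieve logged queries starting with `prefix`, then rank by frequency."""
    sorted_queries, counts = index
    start = bisect_left(sorted_queries, prefix)  # first query >= prefix in sorted order
    candidates = []
    for q in sorted_queries[start:]:
        if not q.startswith(prefix):
            break  # sorted order: no later query can match this prefix
        candidates.append(q)
    # Rank stage: higher frequency first, ties broken alphabetically.
    candidates.sort(key=lambda q: (-counts[q], q))
    return candidates[:k]

# Invented toy query log for illustration.
log = ["weather today", "weather tomorrow", "weather today",
       "web mail", "weather radar", "weather today"]
idx = build_index(log)
print(complete("wea", idx))  # ['weather today', 'weather radar', 'weather tomorrow']
```

A prefix that never appears in the log, such as complete("weathr", idx), returns an empty list. This is the long-tail coverage gap the abstract describes.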
Silicon Valley has always had its headline makers. Startups launch, scale, and sometimes vanish overnight. But behind the scenes, there is a different kind of company quietly powering the entire ecosystem...
Directional Concentration Uncertainty: A representational approach to uncertainty quantification for generative models
arXiv:2602.13264v1 Announce Type: new
Abstract: In the critical task of making generative models trustworthy and robust, methods for Uncertainty Quantification (UQ) have begun to show encouraging potential. However, many of these methods rely on rigid heuristics that fail to generalize across tasks...
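The abstract is truncated before it defines the method, so the following is only a guess at what a "directional concentration" score could look like. The idea: embed several sampled generations, project the embeddings onto the unit sphere, and measure how tightly they cluster with the mean resultant length, the same statistic used in von Mises-Fisher concentration estimates. The function name and the choice of 1 minus the mean resultant length as the uncertainty score are assumptions, not the authors' definition.

```python
import math

def directional_uncertainty(embeddings):
    """Hypothetical score: 1 - mean resultant length of L2-normalised embeddings.

    embeddings: list of equal-length vectors, one per sampled generation (assumed non-zero).
    Returns 0.0 when all samples point the same way and approaches 1.0 when they cancel out.
    """
    unit = []
    for v in embeddings:
        norm = math.sqrt(sum(c * c for c in v))
        unit.append([c / norm for c in v])  # project onto the unit sphere
    dim = len(unit[0])
    mean = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    mean_resultant_length = math.sqrt(sum(m * m for m in mean))
    return 1.0 - mean_resultant_length

agree = [[1.0, 0.0], [2.0, 0.0], [0.5, 0.0]]                   # same direction, different lengths
disagree = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # directions cancel out
print(directional_uncertainty(agree))     # 0.0
print(directional_uncertainty(disagree))  # 1.0
```

Normalising first means only the direction of each embedding matters, so samples that differ in length but not in meaning still count as agreeing.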
BLUEPRINT Rebuilding a Legacy: Multimodal Retrieval for Complex Engineering Drawings and Documents
arXiv:2602.13345v1 Announce Type: new
Abstract: Decades of engineering drawings and technical records remain locked in legacy archives with inconsistent or missing metadata, making retrieval difficult and often manual. We present Blueprint, a layout-aware multimodal retrieval system designed for la...
Exploring the Performance of ML/DL Architectures on the MNIST-1D Dataset
arXiv:2602.13348v1 Announce Type: new
Abstract: Small datasets like MNIST have historically been instrumental in advancing machine learning research by providing a controlled environment for rapid experimentation and model evaluation. However, their simplicity often limits their utility for disting...
The Speed-up Factor: A Quantitative Multi-Iteration Active Learning Performance Metric
arXiv:2602.13359v1 Announce Type: new
Abstract: Machine learning models excel with abundant annotated data, but annotation is often costly and time-intensive. Active learning (AL) aims to improve the performance-to-annotation ratio by using query methods (QMs) to iteratively select the most informa...
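The abstract describes query methods (QMs) that iteratively pick the most informative unlabelled samples. Least-confidence sampling is one common QM. The sketch below shows a single selection step and assumes per-class probabilities from the current model are already available. It illustrates the QM concept only and does not implement the paper's speed-up factor, whose definition is cut off above.

```python
def least_confidence_query(probabilities, batch_size):
    """Pick the indices of the `batch_size` pool samples the model is least sure about.

    probabilities: list of per-class probability lists, one per unlabelled sample.
    """
    scores = [1.0 - max(p) for p in probabilities]  # higher score = less confident
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]

# Invented model outputs for a four-sample unlabelled pool.
pool_probs = [
    [0.95, 0.05],  # confident
    [0.55, 0.45],  # uncertain
    [0.70, 0.30],
    [0.51, 0.49],  # most uncertain
]
print(least_confidence_query(pool_probs, batch_size=2))  # [3, 1]
```

In a full AL loop, the selected indices would be sent for annotation, added to the training set, and the model retrained before the next query round.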
Accelerated Discovery of Cryoprotectant Cocktails via Multi-Objective Bayesian Optimization
arXiv:2602.13398v1 Announce Type: new
Abstract: Designing cryoprotectant agent (CPA) cocktails for vitrification is challenging because formulations must be concentrated enough to suppress ice formation yet non-toxic enough to preserve cell viability. This tradeoff creates a large, multi-objective ...
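Multi-objective Bayesian optimization searches for the Pareto front: formulations that cannot be improved on one objective without losing on another. The sketch below computes that front for a handful of candidates scored on two objectives that mirror the tradeoff in the abstract. The scores are invented for illustration, and both objectives are assumed to be maximized. The surrogate model and acquisition function that make up the Bayesian part are deliberately left out.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` on every objective and strictly better on one (maximisation)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Return the names of candidates not dominated by any other candidate."""
    return [name for name, p in points.items()
            if not any(dominates(q, p) for other, q in points.items() if other != name)]

# Hypothetical (ice_suppression, cell_viability) scores in [0, 1]; higher is better for both.
candidates = {
    "A": (0.90, 0.40),
    "B": (0.70, 0.70),
    "C": (0.40, 0.90),
    "D": (0.60, 0.60),  # dominated by B
    "E": (0.85, 0.35),  # dominated by A
}
print(pareto_front(candidates))  # ['A', 'B', 'C']
```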
Agentic AI for Commercial Insurance Underwriting with Adversarial Self-Critique
arXiv:2602.13213v1 Announce Type: new
Abstract: Commercial insurance underwriting is a labor-intensive process that requires manual review of extensive documentation to assess risk and determine policy pricing. While AI offers substantial efficiency improvements, existing solutions lack comprehensi...
BotzoneBench: Scalable LLM Evaluation via Graded AI Anchors
arXiv:2602.13214v1 Announce Type: new
Abstract: Large Language Models (LLMs) are increasingly deployed in interactive environments requiring strategic decision-making, yet systematic evaluation of these capabilities remains challenging. Existing benchmarks for LLMs primarily assess static reasoning...
When to Think Fast and Slow? AMOR: Entropy-Based Metacognitive Gate for Dynamic SSM-Attention Switching
arXiv:2602.13215v1 Announce Type: new
Abstract: Transformers allocate uniform computation to every position, regardless of difficulty. State Space Models (SSMs) offer efficient alternatives but struggle with precise information retrieval over a long horizon. Inspired by dual-process theories of cog...
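The title describes an entropy-based gate that decides when to switch from the SSM path to attention. The abstract is truncated before the actual mechanism, so the sketch below simply assumes the gate thresholds the Shannon entropy of the SSM path's predictive distribution at each position. The function route and the threshold value are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def route(ssm_probs, threshold):
    """Hypothetical gate: fall back to attention when the SSM's prediction is uncertain."""
    return "attention" if entropy(ssm_probs) > threshold else "ssm"

confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
threshold = 0.5 * math.log(4)  # hypothetical: half of the maximum entropy for 4 classes
print(route(confident, threshold), route(uncertain, threshold))  # ssm attention
```

Under this assumption, the expensive attention path runs only at positions where the cheap path is unsure, which is one way to allocate computation non-uniformly.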
VeRA: Verified Reasoning Data Augmentation at Scale
arXiv:2602.13217v1 Announce Type: new
Abstract: The main issue with most evaluation schemes today is their "static" nature: the same problems are reused repeatedly, allowing for memorization, format exploitation, and eventual saturation. To measure genuine AI progress, we need evaluation that is ro...
Scaling the Scaling Logic: Agentic Meta-Synthesis of Logic Reasoning
arXiv:2602.13218v1 Announce Type: new
Abstract: Scaling verifiable training signals remains a key bottleneck for Reinforcement Learning from Verifiable Rewards (RLVR). Logical reasoning is a natural substrate: constraints are formal and answers are programmatically checkable. However, prior synthes...
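Because answers to logic problems are programmatically checkable, a verifiable reward can be as simple as running every constraint against the model's proposed answer. The puzzle, the encoding of constraints as Python predicates, and the binary 0/1 reward below are illustrative assumptions, not the paper's synthesis pipeline.

```python
def verify(assignment, constraints):
    """Binary verifiable reward: 1.0 if every constraint holds, else 0.0."""
    try:
        return 1.0 if all(c(assignment) for c in constraints) else 0.0
    except KeyError:
        return 0.0  # a malformed answer that is missing a variable earns no reward

# Hypothetical puzzle: three people A, B, C sit in seats 1-3.
constraints = [
    lambda s: sorted(s.values()) == [1, 2, 3],  # each seat used exactly once
    lambda s: s["A"] < s["B"],                   # A sits left of B
    lambda s: s["C"] != 2,                       # C is not in the middle
]
print(verify({"A": 1, "B": 2, "C": 3}, constraints))  # 1.0
print(verify({"A": 2, "B": 1, "C": 3}, constraints))  # 0.0
print(verify({"A": 1, "B": 2, "D": 3}, constraints))  # 0.0 (no seat given for C)
```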
How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured on average over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train Self-Proving m...
Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
Developing autonomous agents that effectively interact with Graphical User Interfaces (GUIs) remains a challenging open problem, especially for small on-device models. In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, we...
OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization
arXiv:2602.12305v1 Announce Type: new
Abstract: Generating high-performance CUDA kernels remains challenging due to the need to navigate a combinatorial space of low-level transformations under noisy and expensive hardware feedback. Although large language models can synthesize functionally correct...
Abstractive Red-Teaming of Language Model Character
arXiv:2602.12318v1 Announce Type: new
Abstract: We want language model assistants to conform to a character specification, which asserts how the model should act across diverse user interactions. While models typically follow these character specifications, they can occasionally violate them in lar...
The Appeal and Reality of Recycling LoRAs with Adaptive Merging
arXiv:2602.12323v1 Announce Type: new
Abstract: The widespread availability of fine-tuned LoRA modules for open pre-trained models has led to an interest in methods that can adaptively merge LoRAs to improve performance. These methods typically include some way of selecting LoRAs from a pool and tu...
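Adaptive merging methods differ in how they select modules and set coefficients. A common way to apply the result, however, is a weighted sum of low-rank updates, W' = W + sum_i c_i B_i A_i. The sketch below shows only that linear combination step with hand-picked coefficients. It assumes the standard LoRA factorization, omits the usual alpha/r scaling, and does not reproduce any specific selection or tuning procedure from the paper.

```python
def matmul(x, y):
    """Plain-Python matrix product for small lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def merge_loras(base, loras, coeffs):
    """Return base + sum_i coeffs[i] * (B_i @ A_i) without mutating `base`.

    base: d_out x d_in weight matrix; loras: list of (B, A) with B d_out x r and A r x d_in.
    """
    merged = [row[:] for row in base]
    for (b, a), c in zip(loras, coeffs):
        delta = matmul(b, a)  # low-rank update, d_out x d_in
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                merged[i][j] += c * v
    return merged

base = [[1.0, 0.0], [0.0, 1.0]]
lora_1 = ([[1.0], [0.0]], [[0.0, 2.0]])  # rank-1 delta with 2.0 at position (0, 1)
lora_2 = ([[0.0], [1.0]], [[4.0, 0.0]])  # rank-1 delta with 4.0 at position (1, 0)
print(merge_loras(base, [lora_1, lora_2], coeffs=[0.5, 0.25]))  # [[1.0, 1.0], [1.0, 1.0]]
```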
Intrinsic Credit Assignment for Long Horizon Interaction
arXiv:2602.12342v1 Announce Type: new
Abstract: How can we train agents to navigate uncertainty over long horizons? In this work, we propose ΔBelief-RL, which leverages a language model's own intrinsic beliefs to reward intermediate progress. Our method utilizes the change in the probability...
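The abstract says ΔBelief-RL rewards intermediate progress using a change in the model's own probabilities, but it is cut off before stating which probability. The sketch below assumes it is the probability the agent assigns to some target after each turn and uses the raw difference p_t - p_{t-1} as the per-step reward. Both choices are assumptions. One property of this shaping is that the per-step rewards telescope, so their sum over an episode equals the overall change in belief, p_T - p_0.

```python
def belief_change_rewards(beliefs):
    """Per-step intrinsic rewards r_t = p_t - p_{t-1}.

    beliefs: probabilities p_0..p_T the agent assigns to a (hypothetical) target after each turn.
    """
    return [curr - prev for prev, curr in zip(beliefs, beliefs[1:])]

beliefs = [0.25, 0.5, 0.375, 0.875]  # invented belief trajectory
rewards = belief_change_rewards(beliefs)
print(rewards)       # [0.25, -0.125, 0.5]
print(sum(rewards))  # 0.625, i.e. p_T - p_0
```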
GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory
arXiv:2602.12316v1 Announce Type: new
Abstract: Frontier AI systems are increasingly capable and deployed in high-stakes multi-agent environments. However, existing AI safety benchmarks largely evaluate single agents, leaving multi-agent risks such as coordination failure and conflict poorly unders...
A Theoretical Framework for Adaptive Utility-Weighted Benchmarking
arXiv:2602.12356v1 Announce Type: new
Abstract: Benchmarking has long served as a foundational practice in machine learning and, increasingly, in modern AI systems such as large language models, where shared tasks, metrics, and leaderboards offer a common basis for measuring progress and comparing ...
Evolving Beyond Snapshots: Harmonizing Structure and Sequence via Entity State Tuning for Temporal Knowledge Graph Forecasting
arXiv:2602.12389v1 Announce Type: new
Abstract: Temporal knowledge graph (TKG) forecasting requires predicting future facts by jointly modeling structural dependencies within each snapshot and temporal evolution across snapshots. However, most existing methods are stateless: they recompute entity r...
Intent-Driven Smart Manufacturing Integrating Knowledge Graphs and Large Language Models
arXiv:2602.12419v1 Announce Type: new
Abstract: The increasing complexity of smart manufacturing environments demands interfaces that can translate high-level human intents into machine-executable actions. This paper presents a unified framework that integrates instruction-tuned Large Language Mode...
Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation
arXiv:2602.12544v1 Announce Type: new
Abstract: We present a scalable pipeline for automatically generating high-quality training data for web agents. In particular, a major challenge in identifying high-quality training instances is trajectory evaluation: quantifying how much progress was made to...