Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments
Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result’s ...
The Way We Notice, That's What Really Matters: Instantiating UI Components with Distinguishing Variations
Front-end developers author UI components to be broadly reusable by parameterizing visual and behavioral properties. While flexible, this makes instantiation harder, as developers must reason about numerous property values and interactions. In practice, they must explore the component’s large design...
By mid-morning, a typical knowledge worker is already juggling a client report, a budget spreadsheet, a slide deck, and an email backlog, all interdependent and all demanding attention at once. For AI agents to be genuinely useful in that environment, they will need to operate the same way, but toda...
Latent Context Compilation: Distilling Long Context into Compact Portable Memory
arXiv:2602.21221v1 Announce Type: new
Abstract: Efficient long-context LLM deployment is stalled by a dichotomy between amortized compression, which struggles with out-of-distribution generalization, and Test-Time Training, which incurs prohibitive synthetic data costs and requires modifying model ...
ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces
arXiv:2602.21231v1 Announce Type: new
Abstract: We present ACAR (Adaptive Complexity and Attribution Routing), a measurement framework for studying multi-model orchestration under auditable conditions. ACAR uses self-consistency variance (sigma) computed from N=3 probe samples to route tasks across...
AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression
arXiv:2602.21233v1 Announce Type: new
Abstract: This technical report introduces AngelSlim, a comprehensive and versatile toolkit for large model compression developed by the Tencent Hunyuan team. By consolidating cutting-edge algorithms, including quantization, speculative decoding, token pruning,...
Group Orthogonalized Policy Optimization:Group Policy Optimization as Orthogonal Projection in Hilbert Space
arXiv:2602.21269v1 Announce Type: new
Abstract: We present Group Orthogonalized Policy Optimization (GOPO), a new alignment algorithm for large language models derived from the geometry of Hilbert function spaces. Instead of optimizing on the probability simplex and inheriting the exponential curva...
A Dynamic Survey of Soft Set Theory and Its Extensions
arXiv:2602.21268v1 Announce Type: new
Abstract: Soft set theory provides a direct framework for parameterized decision modeling by assigning to each attribute (parameter) a subset of a given universe, thereby representing uncertainty in a structured way [1, 2]. Over the past decades, the theory has...
A Hierarchical Multi-Agent System for Autonomous Discovery in Geoscientific Data Archives
arXiv:2602.21351v1 Announce Type: new
Abstract: The rapid accumulation of Earth science data has created a significant scalability challenge; while repositories like PANGAEA host vast collections of datasets, citation metrics indicate that a substantial portion remains underutilized, limiting data ...
Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information
arXiv:2602.21496v1 Announce Type: new
Abstract: While defenses for structured PII are mature, Large Language Models (LLMs) pose a new threat: Semantic Sensitive Information (SemSI), where models infer sensitive identity attributes, generate reputation-harmful content, or hallucinate potentially wro...
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
arXiv:2602.21534v1 Announce Type: new
Abstract: Agentic reinforcement learning (ARL) has rapidly gained attention as a promising paradigm for training agents to solve complex, multi-step interactive tasks. Despite encouraging early results, ARL remains highly unstable, often leading to training col...
Power and Limitations of Aggregation in Compound AI Systems
arXiv:2602.21556v1 Announce Type: new
Abstract: When designing compound AI systems, a common approach is to query multiple copies of the same model and aggregate the responses to produce a synthesized output. Given the homogeneity of these models, this raises the question of whether aggregation unl...
Tensor Network Generator-Enhanced Optimization for Traveling Salesman Problem
arXiv:2602.20175v1 Announce Type: new
Abstract: We present an application of the tensor network generator-enhanced optimization (TN-GEO) framework to address the traveling salesman problem (TSP), a fundamental combinatorial optimization challenge. Our approach employs a tensor network Born machine ...
MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs
arXiv:2602.20191v1 Announce Type: new
Abstract: Changing runtime complexity on cloud and edge devices necessitates elastic large language model (LLM) deployment, where an LLM can be inferred with various quantization precisions based on available computational resources. However, it has been observ...
FedAvg-Based CTMC Hazard Model for Federated Bridge Deterioration Assessment
arXiv:2602.20194v1 Announce Type: new
Abstract: Bridge periodic inspection records contain sensitive information about public infrastructure, making cross-organizational data sharing impractical under existing data governance constraints. We propose a federated framework for estimating a Continuous...
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
arXiv:2602.20197v1 Announce Type: new
Abstract: Reinforcement Learning with verifiable rewards (RLVR) has emerged as a primary learning paradigm for enhancing the reasoning capabilities of multi-modal large language models (MLLMs). However, during RL training, the enormous state space of MLLM and s...
IMOVNO+: A Regional Partitioning and Meta-Heuristic Ensemble Framework for Imbalanced Multi-Class Learning
arXiv:2602.20199v1 Announce Type: new
Abstract: Class imbalance, overlap, and noise degrade data quality, reduce model reliability, and limit generalization. Although widely studied in binary classification, these issues remain underexplored in multi-class settings, where complex inter-class relati...
An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
arXiv:2602.20324v1 Announce Type: new
Abstract: Phenotyping is fundamental to rare disease diagnosis, but manual curation of structured phenotypes from clinical notes is labor-intensive and difficult to scale. Existing artificial intelligence approaches typically optimize individual components of p...
DMCD: Semantic-Statistical Framework for Causal Discovery
arXiv:2602.20333v1 Announce Type: new
Abstract: We present DMCD (DataMap Causal Discovery), a two-phase causal discovery framework that integrates LLM-based semantic drafting from variable metadata with statistical validation on observational data. In Phase I, a large language model proposes a spar...
Diffusion Modulation via Environment Mechanism Modeling for Planning
arXiv:2602.20422v1 Announce Type: new
Abstract: Diffusion models have shown promising capabilities in trajectory generation for planning in offline reinforcement learning (RL). However, conventional diffusion-based planning methods often fail to account for the fact that generating trajectories in ...
Implicit Intelligence -- Evaluating Agents on What Users Don't Say
arXiv:2602.20424v1 Announce Type: new
Abstract: Real-world requests to AI agents are fundamentally underspecified. Natural human communication relies on shared context and unstated constraints that speakers expect listeners to infer. Current agentic benchmarks test explicit instruction-following bu...
Closing the Gap Between Text and Speech Understanding in LLMs
Large Language Models (LLMs) can be adapted to extend their text capabilities to speech inputs. However, these speech-adapted LLMs consistently underperform their text-based counterparts—and even cascaded pipelines—on language understanding tasks. We term this shortfall the text-speech understanding...
A.R.I.S.: Automated Recycling Identification System for E-Waste Classification Using Deep Learning
Traditional electronic recycling processes suffer from significant resource loss due to inadequate material separation and identification capabilities, limiting material recovery. We present A.R.I.S. (Automated Recycling Identification System), a low-cost, portable sorter for shredded e-waste that a...
Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates
Prior studies investigating the internal workings of LLMs have uncovered sparse subnetworks, often referred to as circuits, that are responsible for performing specific tasks. Additionally, it has been shown that model performance improvement through fine-tuning often results from the strengthening ...