Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation
arXiv:2602.12544v1 Announce Type: new
Abstract: We present a scalable pipeline for automatically generating high-quality training data for web agents. In particular, a major challenge in identifying high-quality training instances is trajectory evaluation - quantifying how much progress was made to...
Intent-Driven Smart Manufacturing Integrating Knowledge Graphs and Large Language Models
arXiv:2602.12419v1 Announce Type: new
Abstract: The increasing complexity of smart manufacturing environments demands interfaces that can translate high-level human intents into machine-executable actions. This paper presents a unified framework that integrates instruction-tuned Large Language Mode...
A Theoretical Framework for Adaptive Utility-Weighted Benchmarking
arXiv:2602.12356v1 Announce Type: new
Abstract: Benchmarking has long served as a foundational practice in machine learning and, increasingly, in modern AI systems such as large language models, where shared tasks, metrics, and leaderboards offer a common basis for measuring progress and comparing ...
GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory
arXiv:2602.12316v1 Announce Type: new
Abstract: Frontier AI systems are increasingly capable and deployed in high-stakes multi-agent environments. However, existing AI safety benchmarks largely evaluate single agents, leaving multi-agent risks such as coordination failure and conflict poorly unders...
Evolving Beyond Snapshots: Harmonizing Structure and Sequence via Entity State Tuning for Temporal Knowledge Graph Forecasting
arXiv:2602.12389v1 Announce Type: new
Abstract: Temporal knowledge graph (TKG) forecasting requires predicting future facts by jointly modeling structural dependencies within each snapshot and temporal evolution across snapshots. However, most existing methods are stateless: they recompute entity r...
GAC-KAN: An Ultra-Lightweight GNSS Interference Classifier for GenAI-Powered Consumer Edge Devices
arXiv:2602.11186v1 Announce Type: new
Abstract: The integration of Generative AI (GenAI) into Consumer Electronics (CE)--from AI-powered assistants in wearables to generative planning in autonomous Uncrewed Aerial Vehicles (UAVs)--has revolutionized user experiences. However, these GenAI applicatio...
Spectra: Rethinking Optimizers for LLMs Under Spectral Anisotropy
arXiv:2602.11185v1 Announce Type: new
Abstract: Gradient signals in LLM training are highly anisotropic: recurrent linguistic structure concentrates energy into a small set of dominant spectral directions, while context specific information resides in a long tail. We show that this spike tail separ...
Automated Optimization Modeling via a Localizable Error-Driven Perspective
arXiv:2602.11164v1 Announce Type: new
Abstract: Automated optimization modeling via Large Language Models (LLMs) has emerged as a promising approach to assist complex human decision-making. While post-training has become a pivotal technique to enhance LLMs' capabilities in this domain, its effectiv...
TDPNavigator-Placer: Thermal- and Wirelength-Aware Chiplet Placement in 2.5D Systems Through Multi-Agent Reinforcement Learning
arXiv:2602.11187v1 Announce Type: new
Abstract: The rapid growth of electronics has accelerated the adoption of 2.5D integrated circuits, where effective automated chiplet placement is essential as systems scale to larger and more heterogeneous chiplet assemblies. Existing placement methods typical...
arXiv:2602.11298v1 Announce Type: new
Abstract: We introduce Voxtral Realtime, a natively streaming automatic speech recognition model that matches offline transcription quality at sub-second latency. Unlike approaches that adapt offline models through chunking or sliding windows, Voxtral Realtime ...
Latent Generative Solvers for Generalizable Long-Term Physics Simulation
arXiv:2602.11229v1 Announce Type: new
Abstract: We study long-horizon surrogate simulation across heterogeneous PDE systems. We introduce Latent Generative Solvers (LGS), a two-stage framework that (i) maps diverse PDE states into a shared latent physics space with a pretrained VAE, and (ii) learns...
The PBSAI Governance Ecosystem: A Multi-Agent AI Reference Architecture for Securing Enterprise AI Estates
arXiv:2602.11301v1 Announce Type: new
Abstract: Enterprises are rapidly deploying large language models, retrieval augmented generation pipelines, and tool using agents into production, often on shared high performance computing clusters and cloud accelerator platforms that also support defensive a...
arXiv:2602.10177v1 Announce Type: new
Abstract: Recent advances in foundational models have yielded reasoning systems capable of achieving a gold-medal standard at the International Mathematical Olympiad. The transition from competition-level problem-solving to professional research, however, requi...
Signature-Kernel Based Evaluation Metrics for Robust Probabilistic and Tail-Event Forecasting
arXiv:2602.10182v1 Announce Type: new
Abstract: Probabilistic forecasting is increasingly critical across high-stakes domains, from finance and epidemiology to climate science. However, current evaluation frameworks lack a consensus metric and suffer from two critical flaws: they often assume indep...
Large Language Models Predict Functional Outcomes after Acute Ischemic Stroke
arXiv:2602.10119v1 Announce Type: new
Abstract: Accurate prediction of functional outcomes after acute ischemic stroke can inform clinical decision-making and resource allocation. Prior work on modified Rankin Scale (mRS) prediction has relied primarily on structured variables (e.g., age, NIHSS) an...
Found-RL: foundation model-enhanced reinforcement learning for autonomous driving
arXiv:2602.10458v1 Announce Type: new
Abstract: Reinforcement Learning (RL) has emerged as a dominant paradigm for end-to-end autonomous driving (AD). However, RL suffers from sample inefficiency and a lack of semantic interpretability in complex scenarios. Foundation Models, particularly Vision-La...
Abstraction Generation for Generalized Planning with Pretrained Large Language Models
arXiv:2602.10485v1 Announce Type: new
Abstract: Qualitative Numerical Planning (QNP) serves as an important abstraction model for generalized planning (GP), which aims to compute general plans that solve multiple instances at once. Recent works show that large language models (LLMs) can function as...
Discovering Differences in Strategic Behavior Between Humans and LLMs
arXiv:2602.10324v1 Announce Type: new
Abstract: As Large Language Models (LLMs) are increasingly deployed in social and strategic scenarios, it becomes critical to understand where and why their behavior diverges from that of humans. While behavioral game theory (BGT) provides a framework for analy...
LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation
arXiv:2602.10367v1 Announce Type: new
Abstract: The deployment of Large Language Models (LLMs) in high-stakes clinical settings demands rigorous and reliable evaluation. However, existing medical benchmarks remain static, suffering from two critical limitations: (1) data contamination, where test s...
MERIT Feedback Elicits Better Bargaining in LLM Negotiators
arXiv:2602.10467v1 Announce Type: new
Abstract: Bargaining is often regarded as a logical arena rather than an art or a matter of intuition, yet Large Language Models (LLMs) still struggle to navigate it due to limited strategic depth and difficulty adapting to complex human factors. Current benchm...
Enhanced Graph Transformer with Serialized Graph Tokens
arXiv:2602.09065v1 Announce Type: new
Abstract: Transformers have demonstrated success in graph learning, particularly for node-level tasks. However, existing methods encounter an information bottleneck when generating graph-level representations. The prevalent single token paradigm fails to fully ...
Learning to Remember, Learn, and Forget in Attention-Based Models
arXiv:2602.09075v1 Announce Type: new
Abstract: In-Context Learning (ICL) in transformers acts as an online associative memory and is believed to underpin their high performance on complex sequence processing tasks. However, in gated linear attention models, this memory has a fixed capacity and is ...
Spectral Disentanglement and Enhancement: A Dual-domain Contrastive Framework for Representation Learning
arXiv:2602.09066v1 Announce Type: new
Abstract: Large-scale multimodal contrastive learning has recently achieved impressive success in learning rich and transferable representations, yet it remains fundamentally limited by the uniform treatment of feature dimensions and the neglect of the intrinsi...
Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models
arXiv:2602.09080v1 Announce Type: new
Abstract: Large Multimodal Models (LMMs) have achieved remarkable success in vision-language tasks, yet their vast parameter counts are often underutilized during both training and inference. In this work, we embrace the idea of looping back to move forward: re...