TrajTok: Learning Trajectory Tokens enables better Video Understanding
Tokenization in video models, typically through patchification, generates an excessive and redundant number of tokens. This severely limits video efficiency and scalability. While recent trajectory-based tokenizers offer a promising solution by decoupling video duration from token count, they rely o...
From Garbage to Gold: A Data-Architectural Theory of Predictive Robustness
arXiv:2603.12288v1 Announce Type: new
Abstract: Tabular machine learning presents a paradox: modern models achieve state-of-the-art performance using high-dimensional (high-D), collinear, error-prone data, defying the "Garbage In, Garbage Out" mantra. To help resolve this, we synthesize principles ...
Multi-objective Genetic Programming with Multi-view Multi-level Feature for Enhanced Protein Secondary Structure Prediction
arXiv:2603.12293v1 Announce Type: new
Abstract: Predicting protein secondary structure is essential for understanding protein function and advancing drug discovery. However, the intricate sequence-structure relationship poses significant challenges for accurate modeling. To address these, we propos...
Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency
arXiv:2603.12298v1 Announce Type: new
Abstract: Activation engineering enables precise control over Large Language Models (LLMs) without the computational cost of fine-tuning. However, existing methods deriving vectors from static activation differences are susceptible to high-dimensional noise and...
arXiv:2603.12372v1 Announce Type: new
Abstract: Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite in...
Generating Expressive and Customizable Evals for Timeseries Data Analysis Agents with AgentFuel
arXiv:2603.12483v1 Announce Type: new
Abstract: Across many domains (e.g., IoT, observability, telecommunications, cybersecurity), there is an emerging adoption of conversational data analysis agents that enable users to "talk to your data" to extract insights. Such data analysis agents operate on ...
arXiv:2603.12710v1 Announce Type: new
Abstract: Developing autonomous agents for web-based tasks is a core challenge in AI. While Large Language Model (LLM) agents can interpret complex user requests, they often operate as black boxes, making it difficult to diagnose why they fail or how they plan....
Scientists discover AI can make humans more creative
Artificial intelligence is often portrayed as a tool that replaces human work, but new research from Swansea University suggests a far more exciting role: creative collaborator. In a large study with more than 800 participants designing virtual cars, researchers found that AI-generated design galler...
Unlocking the power of data: How we built text-to-SQL with agentic RAG at Rocket Mortgage
Your company’s data holds answers, but accessing them is often the hard part. Here’s how Rocket Mortgage built a text-to-SQL system with agentic RAG to make data accessible to everyone.
Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process more transparent to model builders and impacted humans, a ...
Scientists built the hardest AI test ever and the results are surprising
As AI systems began acing traditional tests, researchers realized those benchmarks were no longer tough enough. In response, nearly 1,000 experts created Humanity’s Last Exam, a massive 2,500-question challenge covering highly specialized topics across many fields. The exam was engineered so that an...
Interventional Time Series Priors for Causal Foundation Models
arXiv:2603.11090v1 Announce Type: new
Abstract: Prior-data fitted networks (PFNs) have emerged as powerful foundation models for tabular causal inference, yet their extension to time series remains limited by the absence of synthetic data generators that provide interventional targets. Existing tim...
Graph Tokenization for Bridging Graphs and Transformers
arXiv:2603.11099v1 Announce Type: new
Abstract: The success of large pretrained Transformers is closely tied to tokenizers, which convert raw input into discrete symbols. Extending these models to graph-structured data remains a significant challenge. In this work, we introduce a graph tokenization...
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
arXiv:2603.11076v1 Announce Type: new
Abstract: Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace this brittleness to insufficient diversity in synthesized tasks. Scaling diversit...
A Survey of Reasoning in Autonomous Driving Systems: Open Challenges and Emerging Paradigms
arXiv:2603.11093v1 Announce Type: new
Abstract: The development of high-level autonomous driving (AD) is shifting from perception-centric limitations to a more fundamental bottleneck, namely, a deficit in robust and generalizable reasoning. Although current AD systems manage structured environments...
PACED: Distillation at the Frontier of Student Competence
arXiv:2603.11178v1 Announce Type: new
Abstract: Standard LLM distillation wastes compute on two fronts: problems the student has already mastered (near-zero gradients) and problems far beyond its reach (incoherent gradients that erode existing capabilities). We show that this waste is not merely in...
Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios
arXiv:2603.11214v1 Announce Type: new
Abstract: We evaluate the autonomous cyber-attack capabilities of frontier AI models on two purpose-built cyber ranges (a 32-step corporate network attack and a 7-step industrial control system attack) that require chaining heterogeneous capabilities across exten...
Reversible Lifelong Model Editing via Semantic Routing-Based LoRA
arXiv:2603.11239v1 Announce Type: new
Abstract: The dynamic evolution of the real world necessitates model editing within Large Language Models. While existing methods explore modular isolation or parameter-efficient strategies, they still suffer from semantic drift or knowledge forgetting due to conti...
mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR
Reinforcement Learning with Verifiable Rewards (RLVR) has been successfully applied to significantly boost the capabilities of pretrained large language models, especially in the math and logic problem domains. However, current research and available training datasets remain English-centric. While m...
Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments
We present the Multilingual Reasoning Gym, an extension of Reasoning Gym (Stojanovski et al., 2025), that procedurally generates verifiable reasoning problems across 14 languages. We translate templates for 94 tasks with native-speaker validation in 10 languages and targeted code or template adaptat...
Meta buys Moltbook: The social network where AI agents talk to each other
Meta’s acquisition of Moltbook highlights a growing focus on agent-to-agent systems and the infrastructure required to support them. It’s a small deal that signals bigger shifts in how AI ecosystems may evolve.
Systematic debugging for AI agents: Introducing the AgentRx framework
As AI agents transition from simple chatbots to autonomous systems capable of managing cloud incidents, navigating complex web interfaces, and executing multi-step API workflows, a new challenge has emerged: transparency. When a human makes a mistake, we can usually trace the logic. But when an AI a...
arXiv:2603.09980v1 Announce Type: new
Abstract: LLM unlearning is essential for mitigating safety, copyright, and privacy concerns in pre-trained large language models (LLMs). Compared to preference alignment, it offers a more explicit way by removing undesirable knowledge characterized by specific...
MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios
arXiv:2603.09983v1 Announce Type: new
Abstract: Mixture-of-Experts (MoE) models enable scalable performance but face severe memory constraints on edge devices. Existing offloading strategies struggle with I/O bottlenecks due to the dynamic, low-information nature of autoregressive expert activation...