Reusing Pre-Training Data at Test Time is a Compute Multiplier
Large language models learn from their vast pre-training corpora, gaining the ability to solve an ever increasing variety of tasks; yet although researchers work to improve these datasets, there is little effort to understand how efficient the pre-training apparatus is at extracting ideas and knowle...
Physiologically Informed Deep Learning: A Multi-Scale Framework for Next-Generation PBPK Modeling
arXiv:2602.18472v1 Announce Type: new
Abstract: Physiologically Based Pharmacokinetic (PBPK) modeling is a cornerstone of model-informed drug development (MIDD), providing a mechanistic framework to predict drug absorption, distribution, metabolism, and excretion (ADME). Despite its utility, adopti...
Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning
arXiv:2602.18493v1 Announce Type: new
Abstract: Long-context LLMs and Retrieval-Augmented Generation (RAG) systems process information passively, deferring state tracking, contradiction resolution, and evidence aggregation to query time, which becomes brittle under ultra long streams with frequent ...
Hierarchical Reward Design from Language: Enhancing Alignment of Agent Behavior with Human Specifications
arXiv:2602.18582v1 Announce Type: new
Abstract: When training artificial intelligence (AI) to perform tasks, humans often care not only about whether a task is completed but also how it is performed. As AI agents tackle increasingly complex tasks, aligning their behavior with human-provided specifi...
Feedback-based Automated Verification in Vibe Coding of CAS Adaptation Built on Constraint Logic
arXiv:2602.18607v1 Announce Type: new
Abstract: In CAS adaptation, a challenge is to define the dynamic architecture of the system and changes in its behavior. Implementation-wise, this is projected into an adaptation mechanism, typically realized as an Adaptation Manager (AM). With the advances of...
Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System
arXiv:2602.18640v1 Announce Type: new
Abstract: Modern large-scale ranking systems operate within a sophisticated landscape of competing objectives, operational constraints, and evolving product requirements. Progress in this domain is increasingly bottlenecked by the engineering context constraint...
arXiv:2602.18671v1 Announce Type: new
Abstract: We reinterpret the final Large Language Model (LLM) softmax classifier as an Energy-Based Model (EBM), decomposing the sequence-to-sequence probability chain into multiple interacting EBMs at inference. This principled approach allows us to track "ene...
The Potential of CoT for Reasoning: A Closer Look at Trace Dynamics
Chain-of-thought (CoT) prompting is a de-facto standard technique to elicit reasoning-like responses from large language models (LLMs), allowing them to spell out individual steps before giving a final answer. While the resemblance to human-like reasoning is undeniable, the driving forces underpinni...
Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
One of the first pre-processing steps for constructing web-scale LLM pretraining datasets involves extracting text from HTML. Despite the immense diversity of web content, existing open-source datasets predominantly apply a single fixed extractor to all webpages. In this work, we investigate whether...
AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
Recent multimodal large language models (MLLMs) such as GPT-4o and Qwen3-Omni show strong perception but struggle in multi-speaker, dialogue-centric settings that demand agentic reasoning tracking who speaks, maintaining roles, and grounding events across time. These scenarios are central to multimo...
Reducing Text Bias in Synthetically Generated MCQAs for VLMs in Autonomous Driving
arXiv:2602.17677v1 Announce Type: new
Abstract: Multiple Choice Question Answering (MCQA) benchmarks are an established standard for measuring Vision Language Model (VLM) performance in driving tasks. However, we observe the known phenomenon that synthetically generated MCQAs are highly susceptible...
Joint Parameter and State-Space Bayesian Optimization: Using Process Expertise to Accelerate Manufacturing Optimization
arXiv:2602.17679v1 Announce Type: new
Abstract: Bayesian optimization (BO) is a powerful method for optimizing black-box manufacturing processes, but its performance is often limited when dealing with high-dimensional multi-stage systems, where we can observe intermediate outputs. Standard BO model...
BioBridge: Bridging Proteins and Language for Enhanced Biological Reasoning with LLMs
arXiv:2602.17680v1 Announce Type: new
Abstract: Existing Protein Language Models (PLMs) often suffer from limited adaptability to multiple tasks and exhibit poor generalization across diverse biological contexts. In contrast, general-purpose Large Language Models (LLMs) lack the capability to inter...
LATMiX: Learnable Affine Transformations for Microscaling Quantization of LLMs
arXiv:2602.17681v1 Announce Type: new
Abstract: Post-training quantization (PTQ) is a widely used approach for reducing the memory and compute costs of large language models (LLMs). Recent studies have shown that applying invertible transformations to activations can significantly improve quantizat...
Duality Models: An Embarrassingly Simple One-step Generation Paradigm
arXiv:2602.17682v1 Announce Type: new
Abstract: Consistency-based generative models like Shortcut and MeanFlow achieve impressive results via a target-aware design for solving the Probability Flow ODE (PF-ODE). Typically, such methods introduce a target time $r$ alongside the current time $t$ to mo...
Epistemic Traps: Rational Misalignment Driven by Model Misspecification
arXiv:2602.17676v1 Announce Type: new
Abstract: The rapid deployment of Large Language Models and AI agents across critical societal and technical domains is hindered by persistent behavioral pathologies including sycophancy, hallucination, and strategic deception that resist mitigation via reinfor...
The Token Games: Evaluating Language Model Reasoning with Puzzle Duels
arXiv:2602.17831v1 Announce Type: new
Abstract: Evaluating the reasoning capabilities of Large Language Models is increasingly challenging as models improve. Human curation of hard questions is highly expensive, especially in recent benchmarks using PhD-level domain knowledge to challenge the most ...
El Agente Gr\'afico: Structured Execution Graphs for Scientific Agents
arXiv:2602.17902v1 Announce Type: new
Abstract: Large language models (LLMs) are increasingly used to automate scientific workflows, yet their integration with heterogeneous computational tools remains ad hoc and fragile. Current agentic approaches often rely on unstructured text to manage context ...
Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems
arXiv:2602.17910v1 Announce Type: new
Abstract: Traditional AI alignment primarily focuses on individual model outputs; however, autonomous agents in long-horizon workflows require sustained reliability across entire interaction trajectories. We introduce APEMO (Affect-aware Peak-End Modulation for...
The growing size of Large Language Models (LLMs) makes efficient inference challenging, primarily due to the memory demands of the autoregressive Key-Value (KV) cache. Existing eviction or compression methods reduce cost but rely on heuristics, such as recency or past attention scores, which serve o...
Quantum computer breakthrough tracks qubit fluctuations in real time
Qubits, the heart of quantum computers, can change performance in fractions of a second — but until now, scientists couldn’t see it happening. Researchers at NBI have built a real-time monitoring system that tracks these rapid fluctuations about 100 times faster than previous methods. Using fast FPG...
Apple is (no so quietly) anchoring a vibrant ecosystem of talent, startups, and human-centered innovation. Here’s how its expansion is shaping the next chapter of AI in Austin, Texas.
A Few-Shot LLM Framework for Extreme Day Classification in Electricity Markets
arXiv:2602.16735v1 Announce Type: new
Abstract: This paper proposes a few-shot classification framework based on Large Language Models (LLMs) to predict whether the next day will have spikes in real-time electricity prices. The approach aggregates system state information, including electricity dem...