The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K--V Asymmetry
arXiv:2604.22778v1 Announce Type: new
Abstract: We present the first systematic study of weight matrix singular value spectra \emph{during} transformer pretraining, tracking full SVD decompositions of every weight matrix at 25-step intervals across three model scales (30M--285M parameters). We disc...
arXiv:2604.22782v1 Announce Type: new
Abstract: Serving transformer language models with high throughput requires caching Key-Values (KVs) to avoid redundant computation during autoregressive generation. The memory footprint of KV caching is significant and heavily impacts serving costs. This work ...
BiTA: Bidirectional Gated Recurrent Unit-Transformer Aggregator in a Temporal Graph Network Framework for Alert Prediction in Computer Networks
arXiv:2604.22781v1 Announce Type: new
Abstract: Proactive alert prediction in computer networks is critical for mitigating evolving cyber threats and enabling timely defensive actions. Temporal Graph Neural Networks (TGNs) provide a principled framework for modeling time-evolving interactions; howe...
Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation
arXiv:2604.22783v1 Announce Type: new
Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become the standard for adapting large language models (LLMs). In this work we challenge the wide-spread assumption that parameter efficiency equates memory efficiency and on-device adaptability. We show that...
KARL: Mitigating Hallucinations in LLMs via Knowledge-Boundary-Aware Reinforcement Learning
arXiv:2604.22779v1 Announce Type: new
Abstract: Enabling large language models (LLMs) to appropriately abstain from answering questions beyond their knowledge is crucial for mitigating hallucinations. While existing reinforcement learning methods foster autonomous abstention, they often compromise ...
Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results
arXiv:2604.21965v1 Announce Type: new
Abstract: Recent work has used LLM agents to reproduce empirical social science results with access to both the data and code. We broaden this scope by asking: Can they reproduce results given only a paper's methods description and original data? We develop an ...
Math Takes Two: A test for emergent mathematical reasoning in communication
arXiv:2604.21935v1 Announce Type: new
Abstract: Although language models demonstrate remarkable proficiency on mathematical benchmarks, it remains unclear whether this reflects true mathematical reasoning or statistical pattern matching over learning formal syntax. Most existing evaluations rely on...
MolClaw: An Autonomous Agent with Hierarchical Skills for Drug Molecule Evaluation, Screening, and Optimization
arXiv:2604.21937v1 Announce Type: new
Abstract: Computational drug discovery, particularly the complex workflows of drug molecule screening and optimization, requires orchestrating dozens of specialized tools in multi-step workflows, yet current AI agents struggle to maintain robust performance and...
An Artifact-based Agent Framework for Adaptive and Reproducible Medical Image Processing
arXiv:2604.21936v1 Announce Type: new
Abstract: Medical imaging research is increasingly shifting from controlled benchmark evaluation toward real-world clinical deployment. In such settings, applying analytical methods extends beyond model design to require dataset-aware workflow configuration and...
Rethinking Publication: A Certification Framework for AI-Enabled Research
arXiv:2604.22026v1 Announce Type: new
Abstract: AI research pipelines now produce a growing share of publishable academic output, including work that meets existing peer-review standards for quality and novelty. Yet the publication system was built on the assumption of universal human authorship an...
When Quotes Crumble: Detecting Transient Mechanical Liquidity Erosion in Limit Order Books
arXiv:2604.21993v1 Announce Type: new
Abstract: We study the detection of transient liquidity erosion ("crumbling quotes") in electronic limit order books, where observable quote deterioration may reflect either mechanical liquidity withdrawal or informational repricing. Using the ABIDES agent-base...
Focus Session: Hardware and Software Techniques for Accelerating Multimodal Foundation Models
arXiv:2604.21952v1 Announce Type: new
Abstract: This work presents a multi-layered methodology for efficiently accelerating multimodal foundation models (MFMs). It combines hardware and software co-design of transformer blocks with an optimization pipeline that reduces computational and memory requ...
Conditional anomaly detection using soft harmonic functions: An application to clinical alerting
arXiv:2604.21956v1 Announce Type: new
Abstract: Timely detection of concerning events is an important problem in clinical practice. In this paper, we consider the problem of conditional anomaly detection that aims to identify data instances with an unusual response, such as the omission of an impor...
arXiv:2604.21003v1 Announce Type: new
Abstract: AI agents are increasingly deployed on complex, domain-specific workflows -- navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, ...
Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI
arXiv:2604.20972v1 Announce Type: new
Abstract: Content moderation systems are typically evaluated by measuring agreement with human labels. In rule-governed environments this assumption fails: multiple decisions may be logically consistent with the governing policy, and agreement metrics penalize ...
Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models
arXiv:2604.20995v1 Announce Type: new
Abstract: Alignment faking, where a model behaves aligned with developer policy when monitored but reverts to its own preferences when unobserved, is a concerning yet poorly understood phenomenon, in part because current diagnostic tools remain limited. Prior d...
Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
arXiv:2604.20987v1 Announce Type: new
Abstract: Long horizon interactive environments are a testbed for evaluating agents skill usage abilities. These environments demand multi step reasoning, the chaining of multiple skills over many timesteps, and robust decision making under delayed rewards and ...
AI to Learn 2.0: A Deliverable-Oriented Governance Framework and Maturity Rubric for Opaque AI in Learning-Intensive Domains
arXiv:2604.19751v1 Announce Type: new
Abstract: Generative AI is entering research, education, and professional work faster than current governance frameworks can specify how AI-assisted outputs should be judged in learning-intensive settings. The central problem is proxy failure: a polished artifa...
Exploring Data Augmentation and Resampling Strategies for Transformer-Based Models to Address Class Imbalance in AI Scoring of Scientific Explanations in NGSS Classroom
arXiv:2604.19754v1 Announce Type: new
Abstract: Automated scoring of students' scientific explanations offers the potential for immediate, accurate feedback, yet class imbalance in rubric categories particularly those capturing advanced reasoning remains a challenge. This study investigates augment...
Explainable AML Triage with LLMs: Evidence Retrieval and Counterfactual Checks
arXiv:2604.19755v1 Announce Type: new
Abstract: Anti-money laundering (AML) transaction monitoring generates large volumes of alerts that must be rapidly triaged by investigators under strict audit and governance constraints. While large language models (LLMs) can summarize heterogeneous evidence a...
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
arXiv:2604.19835v1 Announce Type: new
Abstract: Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active co...
WorkflowGen:an adaptive workflow generation mechanism driven by trajectory experience
arXiv:2604.19756v1 Announce Type: new
Abstract: Large language model (LLM) agents often suffer from high reasoning overhead, excessive token consumption, unstable execution, and inability to reuse past experiences in complex tasks like business queries, tool use, and workflow orchestration. Traditi...
Transparent Screening for LLM Inference and Training Impacts
arXiv:2604.19757v1 Announce Type: new
Abstract: This paper presents a transparent screening framework for estimating inference and training impacts of current large language models under limited observability. The framework converts natural-language application descriptions into bounded environment...
Easy Samples Are All You Need: Self-Evolving LLMs via Data-Efficient Reinforcement Learning
arXiv:2604.18639v1 Announce Type: new
Abstract: Previous LLMs-based RL studies typically follow either supervised learning with high annotation costs, or unsupervised paradigms using voting or entropy-based rewards. However, their performance remains far from satisfactory due to the substantial ann...