In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models
arXiv:2605.23908v1 Announce Type: new
Abstract: We are in the midst of large-scale industrial and academic efforts to automate the processes of scientific, technological and creative production through AI-driven assistants. Historically, a fundamental property of these processes in their human form...
Iterative Refinement Neural Operators are Learned Fixed-Point Solvers: A Principled Approach to Spectral Bias Mitigation
arXiv:2605.24041v1 Announce Type: new
Abstract: Neural operators serve as fast, data-driven surrogates for scientific modeling but typically rely on a monolithic, single-pass inference procedure that struggles to resolve high-frequency details, a limitation known as spectral bias. We introduce the ...
Towards Verifiable Transformers: Solver-Checkable Circuit Explanations
arXiv:2605.24033v1 Announce Type: new
Abstract: Mechanistic interpretability often identifies circuits inside Transformer models, but explanations of those circuits are usually validated through examples, ablations, and manual reasoning. This leaves a gap between finding a plausible circuit and pro...
arXiv:2605.23984v1 Announce Type: new
Abstract: Industrial anomaly detection has attracted significant attention as a fundamental challenge in industrial systems. The rapid advancement of heterogeneous industrial sensors has driven industrial anomaly detection from unimodal to multimodal paradigms....
Algometrics: Forecasting Under Algorithmic Feedback
arXiv:2605.23978v1 Announce Type: new
Abstract: In algorithmic markets, predictive models become part of the data-generating process they aim to forecast. Once their outputs are converted into trades, allocations, execution schedules, or risk controls, they change the future data on which they are ...
Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems
arXiv:2605.22883v1 Announce Type: new
Abstract: Current AI energy benchmarks measure consumption at the granularity of a single model invocation or training run. For classical single-turn workloads this unit remains coherent. For agentic systems - where a single user goal may trigger multi-step orc...
SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research
arXiv:2605.22878v1 Announce Type: new
Abstract: The exponential growth of global academic output has confronted researchers and AI agents with an unprecedented "information explosion," where fragmented and unstructured knowledge organization impedes deep interdisciplinary integration. Current aca...
RMA: an Agentic System for Research-Level Mathematical Problems
arXiv:2605.22875v1 Announce Type: new
Abstract: We present Research Math Agents (RMA), an agentic framework for automated reasoning on research-level mathematical problems. Unlike prior studies centered on competition mathematics or formal theorem proving, RMA targets research-level math...
NeuroNL2LTL: A Neurosymbolic Framework for Natural Language Translation of Linear Temporal Logic
arXiv:2605.22874v1 Announce Type: new
Abstract: Effectively translating between natural language (NL) and formal logics like Linear Temporal Logic (LTL) requires expertise that limits formal verification's reach in safety-critical development. Template-based approaches sacrifice expressiveness for ...
BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems
arXiv:2605.22866v1 Announce Type: new
Abstract: Compound AI systems route tasks through hierarchies of specialised components. Attribution is dominated by Shapley-based methods (SHAP), which decompose a coalition value function into per-component marginal contributions and require evaluation of the...
The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models
arXiv:2605.22870v1 Announce Type: new
Abstract: Chain-of-thought (CoT) prompting is necessary for arithmetic in small language models, yet shuffling its steps preserves most performance. What does CoT contribute if not logical sequencing? In three 1-3B instruction-tuned LMs on GSM8K, we isolate the...
FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning
arXiv:2605.22869v1 Announce Type: new
Abstract: Both full fine-tuning (Full FT) and parameter-efficient fine-tuning methods such as LoRA introduce weight updates without accounting for the spectral structure established during pretraining. As a result, noisy gradients from limited fine-tuning data ...
Reading Calibrated Uncertainty from Language Model Trajectories
arXiv:2605.22864v1 Announce Type: new
Abstract: The maximum softmax probability (MSP) represents a default approach when evaluating uncertainty quantification for language model generation with structured output. Although cheap, it is often miscalibrated. Methods that probe the model's internal act...
Latent Cache Flow: Model-to-Model Communication Without Text
arXiv:2605.22863v1 Announce Type: new
Abstract: LLM agents today communicate via text, which incurs considerable latency and information loss due to the need to autoregressively decode the sharer model's state and encode at the receiver model. Recent work such as Cache-to-Cache (C2C; Fu et al., 202...
AOP-Wiki EMOD 3.0: Data Model Expansions and Content Evaluation Framework for Using Agentic AI to Improve Integration between AOPs and New Approach Methodologies (NAMs)
arXiv:2605.21645v1 Announce Type: new
Abstract: Adverse Outcome Pathways (AOP) are logic models that causally link biological mechanisms that can be measured in a lab to adverse outcomes, relevant to chemical regulatory endpoints. AOPs contextualize new approach methodologies (NAMs), in vitro and i...
MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis
arXiv:2605.21630v1 Announce Type: new
Abstract: Although LLMs have made substantial progress in reasoning, systematically producing frontier-level reasoning data remains difficult. Existing synthesis methods often have limited visibility into the structural factors that govern problem difficulty, w...
The Shape of Testimony: A Scalable Framework for Oral History Archive Comparison
arXiv:2605.21623v1 Announce Type: new
Abstract: Researchers in Holocaust studies have often distinguished between two styles of oral survivor testimony: the USC Shoah Foundation's interviews tend to follow a structured, interviewer-guided format, whereas the Yale Fortunoff Video Archive generally f...
Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs
arXiv:2605.21602v1 Announce Type: new
Abstract: Many safety and alignment failures of large language models (LLMs) occur due to out-of-distribution (OOD) situations: unusual prompt or response patterns that are unforeseen by model developers. We systematically study whether LLM monitoring pipelines...
The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity
arXiv:2605.21492v1 Announce Type: new
Abstract: No feature ranking can be simultaneously faithful, stable, and complete when features are collinear. For collinear pairs, ranking reduces to a coin flip. We prove this impossibility, quantify it for four model classes, resolve it via ensemble averagin...
Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation
arXiv:2605.21491v1 Announce Type: new
Abstract: As language models accelerate scientific research by automating hypothesis generation and implementation, a new bottleneck emerges: evaluating and filtering hundreds of AI-generated ideas without exhaustive experimentation. We ask whether LMs can lear...
Temporal Contrastive Transformer for Financial Crime Detection: Self-Supervised Sequence Embeddings via Predictive Contrastive Coding
arXiv:2605.21490v1 Announce Type: new
Abstract: We introduce the Temporal Contrastive Transformer (TCT), a representation learning framework designed to capture contextual temporal dynamics in sequences of financial transactions. The model is trained using a self-supervised contrastive objective to...
MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models
MagenticLite is an agentic system for small models that works across the browser and local file system in a single workflow. It combines specialized models and orchestration to support efficient agentic performance on everyday tasks.
AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows
arXiv:2605.20425v1 Announce Type: new
Abstract: Designing multi-agent workflows is especially difficult in open-ended scientific settings where tasks lack curated training sets, reliable scalar evaluation metrics, and standardized interfaces between existing tools and agents. We propose AgentCo-op,...
Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration
arXiv:2605.20190v1 Announce Type: new
Abstract: Iterative industrial design-simulation optimization is bottlenecked by the CAD-CAE semantic gap: translating simulation feedback into valid geometric edits under diverse, coupled constraints. To fill this gap, we propose COSMO-Agent (Closed-loop Optim...