Narrative-Driven Paper-to-Slide Generation via ArcDeck
arXiv:2604.11969v1 Announce Type: new
Abstract: We introduce ArcDeck, a multi-agent framework that formulates paper-to-slide generation as a structured narrative reconstruction task. Unlike existing methods that directly summarize raw text into slides, ArcDeck explicitly models the source paper's l...
arXiv:2604.09560v1 Announce Type: new
Abstract: Transformers, diffusion-maps, and magnetic Laplacians are usually treated as separate tools; we show they are all different regimes of a single Markov geometry built from pre-softmax query-scores. We define a QK "bidivergence" whose exponentiated and ...
Fairboard: a quantitative framework for equity assessment of healthcare models
arXiv:2604.09656v1 Announce Type: new
Abstract: Despite there now being more than 1,000 FDA-authorised AI medical devices, formal equity assessments -- whether model performance is uniform across patient subgroups -- are rare. Here, we evaluate the equity of 18 open-source brain tumour segmentation...
Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model
arXiv:2604.09665v1 Announce Type: new
Abstract: While the wide adoption of refusal training in large language models (LLMs) has showcased improvements in model safety, recent works have highlighted shortcomings due to the shallow nature of these alignment methods. To this end, the work on Deliberat...
Human-like Working Memory Interference in Large Language Models
arXiv:2604.09670v1 Announce Type: new
Abstract: Intelligent systems must maintain and manipulate task-relevant information online to adapt to dynamic environments and changing goals. This capacity, known as working memory, is fundamental to human reasoning and intelligence. Despite having on the or...
Belief-State RWKV for Reinforcement Learning under Partial Observability
arXiv:2604.09671v1 Announce Type: new
Abstract: We propose a stronger formulation of RL on top of RWKV-style recurrent sequence models, in which the fixed-size recurrent state is explicitly interpreted as a belief state rather than an opaque hidden vector. Instead of conditioning policy and value o...
LABBench2: An Improved Benchmark for AI Systems Performing Biology Research
arXiv:2604.09554v1 Announce Type: new
Abstract: Optimism for accelerating scientific discovery with AI continues to grow. Current applications of AI in scientific research range from training dedicated foundation models on scientific data to agentic autonomous hypothesis generation systems to AI-dr...
arXiv:2604.09563v1 Announce Type: new
Abstract: AI systems produce large volumes of logs as they interact with tools and users. Analysing these logs can help understand model capabilities, propensities, and behaviours, or assess whether an evaluation worked as intended. Researchers have started dev...
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
arXiv:2604.09574v1 Announce Type: new
Abstract: The rise of autonomous GUI agents has triggered adversarial countermeasures from digital platforms, yet existing research prioritizes utility and robustness over the critical dimension of anti-detection. We argue that for agents to survive in human-ce...
AHC: Meta-Learned Adaptive Compression for Continual Object Detection on Memory-Constrained Microcontrollers
arXiv:2604.09576v1 Announce Type: new
Abstract: Deploying continual object detection on microcontrollers (MCUs) with under 100KB memory requires efficient feature compression that can adapt to evolving task distributions. Existing approaches rely on fixed compression strategies (e.g., FiLM conditio...
“Giant superatoms” could finally solve quantum computing’s biggest problem
In the pursuit of powerful and stable quantum computers, researchers at Chalmers University of Technology, Sweden, have developed the theory for an entirely new quantum system – based on the novel concept of ‘giant superatoms’. This breakthrough enables quantum information to be protected, controlle...
GNN-as-Judge: Unleashing the Power of LLMs for Graph Learning with GNN Feedback
arXiv:2604.08553v1 Announce Type: new
Abstract: Large Language Models (LLMs) have shown strong performance on text-attributed graphs (TAGs) due to their superior semantic understanding ability on textual node features. However, their effectiveness as predictors in the low-resource setting, where la...
QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation
arXiv:2604.08570v1 Announce Type: new
Abstract: Large Language Models (LLMs) are increasingly used for code generation, yet quantum code generation is still evaluated mostly within single frameworks, making it difficult to separate quantum reasoning from framework familiarity. We introduce QuanBenc...
arXiv:2604.08571v1 Announce Type: new
Abstract: While Large Language Models (LLMs) achieve high performance on standard mathematical benchmarks, their underlying reasoning processes remain highly overfit to standard textual formatting. We propose a perturbation pipeline consisting of 14 techniques ...
Ranked Activation Shift for Post-Hoc Out-of-Distribution Detection
arXiv:2604.08572v1 Announce Type: new
Abstract: State-of-the-art post-hoc out-of-distribution detection methods rely on intermediate layer activation editing. However, they exhibit inconsistent performance across datasets and models. We show that this instability is driven by differences in the act...
OpenKedge: Governing Agentic Mutation with Execution-Bound Safety and Evidence Chains
arXiv:2604.08601v1 Announce Type: new
Abstract: The rise of autonomous AI agents exposes a fundamental flaw in API-centric architectures: probabilistic systems directly execute state mutations without sufficient context, coordination, or safety guarantees. We introduce OpenKedge, a protocol that re...
From Business Events to Auditable Decisions: Ontology-Governed Graph Simulation for Enterprise AI
arXiv:2604.08603v1 Announce Type: new
Abstract: Existing LLM-based agent systems share a common architectural failure: they answer from the unrestricted knowledge space without first simulating how active business scenarios reshape that space for the event at hand -- producing decisions that are fl...
RAMP: Hybrid DRL for Online Learning of Numeric Action Models
arXiv:2604.08685v1 Announce Type: new
Abstract: Automated planning algorithms require an action model specifying the preconditions and effects of each action, but obtaining such a model is often hard. Learning action models from observations is feasible, but existing algorithms for numeric domains ...
Parameterized Complexity Of Representing Models Of MSO Formulas
arXiv:2604.08707v1 Announce Type: new
Abstract: Monadic second order logic (MSO2) plays an important role in parameterized complexity due to Courcelle's theorem. This theorem states that the problem of checking if a given graph has a property specified by a given MSO2 formula can be solved by a...
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models at ICLR 2026.
Large language models (LLMs) can struggle to memorize factual knowledge in their parameters, often leading to hallucinations and poor performance on knowledge-intensive tasks. In th...