Generative AI-assisted Participatory Modeling in Socio-Environmental Planning under Deep Uncertainty
arXiv:2603.17021v1 Announce Type: new
Abstract: Socio-environmental planning under deep uncertainty requires researchers to identify and conceptualize problems before exploring policies and deploying plans. In practice and model-based planning approaches, this problem conceptualization process ofte...
Cascade-Aware Multi-Agent Routing: Spatio-Temporal Sidecars and Geometry-Switching
arXiv:2603.17112v1 Announce Type: new
Abstract: A common architectural pattern in advanced AI reasoning systems is the symbolic graph network: specialized agents or modules connected by delegation edges, routing tasks through a dynamic execution graph. Current schedulers optimize load and fitness b...
How Clued up are LLMs? Evaluating Multi-Step Deductive Reasoning in a Text-Based Game Environment
arXiv:2603.17169v1 Announce Type: new
Abstract: Deducing whodunit proves challenging for LLM agents. In this paper, we implement a text-based multi-agent version of the classic board game Clue as a rule-based testbed for evaluating multi-step deductive reasoning, with six agents drawn from GPT-4o-m...
arXiv:2603.17216v1 Announce Type: new
Abstract: With the advent of AI agents, automatic scientific discovery has become a tenable goal. Many recent works scaffold agentic systems that can perform machine learning research, but don't offer a principled way to train such agents -- and current LLMs of...
From engagement to fulfillment: How Agentic AI is rewriting product metrics
As AI agents begin executing tasks on users’ behalf, traditional engagement metrics are becoming less meaningful. In the age of agentic AI, product teams may need a new north star: measuring whether user intent was successfully fulfilled.
When AI judges AI: The hidden dangers of reasoning models in alignment
The race to build more capable AI systems has created an unexpected problem: as we push toward more sophisticated models, we need equally sophisticated ways to evaluate and align them.
Tokenization Tradeoffs in Structured EHR Foundation Models
arXiv:2603.15644v1 Announce Type: new
Abstract: Foundation models for structured electronic health records (EHRs) are pretrained on longitudinal sequences of timestamped clinical events to learn adaptable patient representations. Tokenization -- how these timelines are converted into discrete model...
Alternating Reinforcement Learning with Contextual Rubric Rewards
arXiv:2603.15646v1 Announce Type: new
Abstract: Reinforcement Learning with Rubric Rewards (RLRR) is a framework that extends conventional reinforcement learning from human feedback (RLHF) and verifiable rewards (RLVR) by replacing scalar preference signals with structured, multi-dimensional, conte...
Steering Frozen LLMs: Adaptive Social Alignment via Online Prompt Routing
arXiv:2603.15647v1 Announce Type: new
Abstract: Large language models (LLMs) are typically governed by post-training alignment (e.g., RLHF or DPO), which yields a largely static policy during deployment and inference. However, real-world safety is a full-lifecycle problem: static defenses degrade a...
Neural-Symbolic Logic Query Answering in Non-Euclidean Space
arXiv:2603.15633v1 Announce Type: new
Abstract: Answering complex first-order logic (FOL) queries on knowledge graphs is essential for reasoning. Symbolic methods offer interpretability but struggle with incomplete graphs, while neural approaches generalize better but lack transparency. Neural-symb...
NextMem: Towards Latent Factual Memory for LLM-based Agents
arXiv:2603.15634v1 Announce Type: new
Abstract: Memory is critical for LLM-based agents to preserve past observations for future decision-making, where factual memory serves as its foundational part. However, existing approaches to constructing factual memory face several limitations. Textual metho...
arXiv:2603.15636v1 Announce Type: new
Abstract: As AI-driven document understanding and processing tools become increasingly prevalent in real-world applications, the need for rigorous evaluation standards has grown increasingly urgent. Existing benchmarks and evaluations often focus on isolated ca...
The Comprehension-Gated Agent Economy: A Robustness-First Architecture for AI Economic Agency
arXiv:2603.15639v1 Announce Type: new
Abstract: AI agents are increasingly granted economic agency (executing trades, managing budgets, negotiating contracts, and spawning sub-agents), yet current frameworks gate this agency on capability benchmarks that are empirically uncorrelated with operationa...
arXiv:2603.15641v1 Announce Type: new
Abstract: Recursive reasoning models such as Hierarchical Reasoning Model (HRM) and Tiny Recursive Model (TRM) show that small, weight-shared networks can solve compute-heavy and NP puzzles by iteratively refining latent states, but their training typically rel...
Prose2Policy (P2P): A Practical LLM Pipeline for Translating Natural-Language Access Policies into Executable Rego
Prose2Policy (P2P) is a practical LLM-based tool that translates natural-language access control policies (NLACPs) into executable Rego code (the policy language of Open Policy Agent, OPA). It provides a modular, end-to-end pipeline that performs policy detection, component extraction, schema valida...
Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning
Reinforcement learning has emerged as a powerful paradigm for unlocking reasoning capabilities in large language models. However, relying on sparse rewards makes this process highly sample-inefficient, as models must navigate vast search spaces with minimal feedback. While classic curriculum learnin...
Translational Gaps in Graph Transformers for Longitudinal EHR Prediction: A Critical Appraisal of GT-BEHRT
arXiv:2603.13231v1 Announce Type: new
Abstract: Transformer-based models have improved predictive modeling on longitudinal electronic health records through large-scale self-supervised pretraining. However, most EHR transformer architectures treat each clinical encounter as an unordered collection ...
Your Code Agent Can Grow Alongside You with Structured Memory
arXiv:2603.13258v1 Announce Type: new
Abstract: While "Intent-oriented programming" (or "Vibe Coding") redefines software engineering, existing code agents remain tethered to static code snapshots. Consequently, they struggle to model the critical information embedded in the temporal evolution of p...
A Dual-Path Generative Framework for Zero-Day Fraud Detection in Banking Systems
arXiv:2603.13237v1 Announce Type: new
Abstract: High-frequency banking environments face a critical trade-off between low-latency fraud detection and the regulatory explainability demanded by GDPR. Traditional rule-based and discriminative models struggle with "zero-day" attacks due to extreme clas...
Benchmarking Zero-Shot Reasoning Approaches for Error Detection in Solidity Smart Contracts
arXiv:2603.13239v1 Announce Type: new
Abstract: Smart contracts play a central role in blockchain systems by encoding financial and operational logic. Still, their susceptibility to subtle security flaws poses significant risks of financial loss and erosion of trust. LLMs create new opportunities f...
Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning
arXiv:2603.13243v1 Announce Type: new
Abstract: Diffusion large language models (dLLMs) generate text via iterative denoising but consistently underperform on multi-step reasoning. We hypothesize this gap stems from a coordination problem: AR models build coherence token-by-token, while diffusion m...
Automating Document Intelligence in Statutory City Planning
arXiv:2603.13245v1 Announce Type: new
Abstract: UK planning authorities face a legislative conflict between the Planning Act, which mandates public access to application documents, and the Data Protection Act, which requires protection of personal information. This situation creates a manually inte...