Reinforcement learning for inverse structural design and rapid laser cutting of kirigami prototypes
arXiv:2605.08098v1 Announce Type: new
Abstract: Kirigami is an increasingly useful fabrication method to produce shape-programmable metamaterial structures. However, inverse design remains difficult because deployment is nonlinear, and feasible cut layouts must satisfy discrete compatibility rules,...
Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations
arXiv:2605.06696v1 Announce Type: new
Abstract: Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupl...
State Representation and Termination for Recursive Reasoning Systems
arXiv:2605.06690v1 Announce Type: new
Abstract: Recursive reasoning systems alternate between acquiring new evidence and refining an accumulated understanding. Two design choices are typically left implicit: how to represent the evolving reasoning state, and when to stop iterating. This paper addre...
Fast and Effective Redistricting Optimization via Composite-Move Tabu Search
arXiv:2605.06682v1 Announce Type: new
Abstract: Spatial redistricting is a practical combinatorial optimization problem that demands high-quality solutions, rapid turnaround, and flexibility to accommodate multi-criteria objectives and interactive refinement. A central challenge is the contiguity c...
More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models
arXiv:2605.06672v1 Announce Type: new
Abstract: Chain-of-thought (CoT) reasoning and reasoning-tuned models such as DeepSeek-R1 are commonly assumed to reduce shallow heuristic biases by thinking carefully. We test this on position bias in multiple-choice QA and find a different story: within any r...
GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning
arXiv:2605.06671v1 Announce Type: new
Abstract: Large Language Models (LLMs) have demonstrated strong potential for many mathematical problems. However, their performance on graph algorithmic tasks is still unsatisfactory, since graphs are naturally more complex in topology and often require systemat...
Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding
arXiv:2605.06679v1 Announce Type: new
Abstract: Vision-Language Models (VLMs) are frequently undermined by object hallucination, generating content that contradicts visual reality, due to an over-reliance on linguistic priors. We introduce Positive-and-Negative Decoding (PND), a training-free infer...
A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
arXiv:2605.06678v1 Announce Type: new
Abstract: According to the United Nations Office for Disaster Risk Reduction (2025), the average annual cost of natural catastrophes increased from 70–80 billion USD between 1970 and 2000 to 180–200 billion USD between 2001 and 2020. Reports from organization...
LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction
arXiv:2605.06676v1 Announce Type: new
Abstract: Long-context inference in Large Language Models (LLMs) is bottlenecked by the linear growth of Key-Value (KV) cache memory. Existing KV cache compression paradigms are fundamentally limited by heuristics: heuristic budgeting relies on statistical prio...
RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory
arXiv:2605.06675v1 Announce Type: new
Abstract: Large language models cache all previously computed key-value (KV) pairs during generation, and this KV cache grows linearly with sequence length, making it a primary memory bottleneck for serving. Quantizing the KV cache to fewer bits reduces this co...
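The linear growth described in this abstract can be made concrete with a little arithmetic. The sketch below is an illustration only: the model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical configuration, not one taken from the paper, and it applies a single uniform bit-width to the whole cache rather than the mixed-precision allocation the title refers to.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bits_per_value=16, batch_size=1):
    """Approximate KV cache size in bytes for a decoder-only transformer.

    Each token stores one key vector and one value vector per layer and
    per KV head, hence the leading factor of 2.
    """
    n_values = 2 * batch_size * n_layers * n_kv_heads * head_dim * seq_len
    return n_values * bits_per_value // 8


# Hypothetical model shape, chosen only for illustration.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)

fp16_4k = kv_cache_bytes(4096, **shape)                    # 16-bit cache, 4k tokens
fp16_8k = kv_cache_bytes(8192, **shape)                    # doubling the context
int4_4k = kv_cache_bytes(4096, bits_per_value=4, **shape)  # uniform 4-bit cache

for label, size in [("fp16 @ 4k", fp16_4k), ("fp16 @ 8k", fp16_8k), ("int4 @ 4k", int4_4k)]:
    print(f"{label}: {size / 1024**2:.0f} MiB")
```

Under these assumptions the cache doubles when the context doubles, and dropping from 16 to 4 bits per value cuts it to a quarter. The estimate ignores quantization metadata such as scales and zero points, which add a small overhead in practice.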
BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate captions, recent work has increasingly turned to reinf...
Understanding Annotator Safety Policy with Interpretability
arXiv:2605.05329v1 Announce Type: new
Abstract: Safety policies define what constitutes safe and unsafe AI outputs, guiding data annotation and model development. However, annotation disagreement is pervasive and can stem from multiple sources such as operational failures (annotators misunderstand ...
Partial Evidence Bench: Benchmarking Authorization-Limited Evidence in Agentic Systems
arXiv:2605.05379v1 Announce Type: new
Abstract: Enterprise agents increasingly operate inside scoped retrieval systems, delegated workflows, and policy-constrained evidence environments. In these settings, access control can be enforced correctly while the system still produces an answer that appea...
Intelligent CCTV for Urban Design: AI-Based Analysis of Soft Infrastructure at Intersections
arXiv:2605.05402v1 Announce Type: new
Abstract: Artificial intelligence (AI) and computer vision are transforming transportation data collection. This study introduces an AI-enabled analytics framework leveraging existing CCTV infrastructure to evaluate the impact of soft interventions, such as tem...
BALAR: A Bayesian Agentic Loop for Active Reasoning
arXiv:2605.05386v1 Announce Type: new
Abstract: Large language models increasingly operate in interactive settings where solving a task requires multiple rounds of information exchange with a user. However, most current systems treat dialogue reactively and lack a principled mechanism to reason abo...
Building realistic electric transmission grid dataset at scale: a pipeline from open dataset
Microsoft Research is excited to release an open dataset of approximate transmission topology of the U.S. power grid derived from publicly available data. The ability to study transmission-level power grid behavior is essential for modern power systems research. Analyses of congestion, transmission ...
Becoming AI ready: Building a company with 12 AI agents as my first hires
I left Google ten days ago to found my own company. It's been quite a journey figuring out how things work outside of the mothership, and I'm genuinely excited to share what I've learned from both sides of the house...
7 signs your AI agent system needs to start building its own tools
Most AI agents are stuck in their ways. Built once, they repeat the same patterns regardless of the task at hand. But new research suggests a smarter path forward: agents that get sharper with every challenge they face...
Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling
Figure: Overview of adaptive parallel reasoning.
What if a reasoning model could decide for itself when to decompose and parallelize independent subtasks, how many concurrent threads to spawn, and how to coordinate them based on the problem at hand? We provide a detailed analysis of recent progress in the...
Physics-Informed Neural Networks with Learnable Loss Balancing and Transfer Learning
arXiv:2605.05217v1 Announce Type: new
Abstract: We propose a self-supervised physics-informed neural network (PINN) framework that adaptively balances physics-based and data-driven supervision for scientific machine learning under data scarcity. Unlike prior PINNs that rely on fixed or heuristic we...
SAT: Sequential Agent Tuning for Coordinator Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees
arXiv:2605.05216v1 Announce Type: new
Abstract: Large language models (LLMs) with a large number of parameters achieve strong performance but are often prohibitively expensive to deploy. Recent work explores using teams of smaller, more efficient LLMs that collectively match or even outperform a si...
Nationwide EHR-Based Chronic Rhinosinusitis Prediction Using Demographic-Stratified Models
arXiv:2605.05213v1 Announce Type: new
Abstract: Chronic rhinosinusitis (CRS) is a common heterogeneous inflammatory disorder that causes substantial morbidity and healthcare costs. CRS is difficult to identify early from routine encounters, as symptom presentations overlap with common conditions su...
Velox: Learning Representations of 4D Geometry and Appearance
We introduce a framework for learning latent representations of 4D objects which are descriptive, faithfully capturing object geometry and appearance; compressive, aiding in downstream efficiency; and accessible, requiring minimal input, i.e., an unstructured dynamic point cloud, to construct. Speci...
RVPO: Risk-Sensitive Alignment via Variance Regularization
Current critic-less RLHF methods aggregate multi-objective rewards via an arithmetic mean, leaving them vulnerable to constraint neglect: high-magnitude success in one objective can numerically offset critical failures in others (e.g., safety or formatting), masking low-performing “bottleneck” rewar...
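The masking effect described here is easy to reproduce with toy numbers. The sketch below is not RVPO's objective: the abstract is cut off before the method is stated, so the variance penalty (and its weight `lam`) is only a generic illustration suggested by the title, and the reward vectors are made up.

```python
from statistics import fmean, pvariance


def mean_reward(rewards):
    """Arithmetic-mean aggregation of per-objective rewards."""
    return fmean(rewards)


def variance_penalized(rewards, lam=1.0):
    """Mean minus lam times the population variance across objectives.

    `lam` is a hypothetical trade-off weight used only for this illustration.
    """
    return fmean(rewards) - lam * pvariance(rewards)


# Made-up rewards for three objectives, e.g. task success, formatting, safety.
balanced = [0.6, 0.6, 0.6]   # moderate on every objective
lopsided = [1.0, 1.0, 0.1]   # two strong objectives, one near-failure

for name, rewards in [("balanced", balanced), ("lopsided", lopsided)]:
    print(name,
          f"mean={mean_reward(rewards):.2f}",
          f"penalized={variance_penalized(rewards):.2f}",
          f"worst={min(rewards):.2f}")
```

With these numbers the arithmetic mean ranks the lopsided response higher even though one of its objectives nearly fails, while the variance-penalized score ranks the balanced response higher.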