Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey
arXiv:2605.27431v1 Announce Type: new
Abstract: Mixture-of-Experts (MoE) presents a naturally compatible and scalable framework for multimodal learning, demonstrating strong adaptability across diverse modalities and tasks. Despite its growing success, a comprehensive and systematic review on the M...
$E^3$-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference
arXiv:2605.27428v1 Announce Type: new
Abstract: Edge deployments of generative inference increasingly face two practical realities: per-device per-model performance is often unknown at deployment time, and it is non-stationary due to user-driven semantic events, background load, and device churn. C...
A Simple State Space Model Excels at Multivariate Time Series Classification
arXiv:2605.27406v1 Announce Type: new
Abstract: Structured state space models (SSMs) have recently emerged as a promising foundation for sequence modeling, with Mamba-based architectures demonstrating strong performance through input-dependent state transitions, albeit at considerable complexity. H...
IGADA-IoT: IoT Sensor Energy Optimization in Wireless Sensor Networks Driven by Automatic Data Augmentation
arXiv:2605.27397v1 Announce Type: new
Abstract: In wireless sensor networks (WSNs), data augmentation is a novel method to improve sampling-frequency decision performance, thereby enabling energy optimization for IoT (Internet of Things) sensors. However, existing methods rely on a single generator...
Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity
arXiv:2605.27385v1 Announce Type: new
Abstract: Federated reinforcement learning (FedRL) enables multiple agents to collaboratively train a global policy without sharing raw data, making it ideal for privacy-sensitive applications. However, FedRL faces challenges in heterogeneous environments where...
DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents
arXiv:2605.27566v1 Announce Type: new
Abstract: Progress in neural combinatorial optimization for Dynamic Flexible Job Shop Scheduling Problem (DFJSP) is currently hindered by a methodological tension: static benchmarks encourage benchmark overfitting, while uncalibrated generators obscure algorith...
Soro: A Lightweight Foundation Model and Chatbot for Tajik
arXiv:2605.27379v1 Announce Type: new
Abstract: We present Soro, a family of Tajik-specialized conversational large language models (LLMs) designed for real-world deployment under tight compute and connectivity constraints in Tajikistan. Starting from open-weight Gemma 3 checkpoints, we perform Taj...
Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture
arXiv:2605.27373v1 Announce Type: new
Abstract: As intelligent systems become more autonomous, the scientific community focuses on creating decision-making mechanisms that include ethical and moral considerations, unlike traditional utility-maximisation models. To achieve this, a key aspect is asse...
Extending Human Intelligence Through AI
Understanding AI as an extension of human intelligence, not a replacement for it, offers a more grounded path for building trustworthy AI systems.
The post Extending Human Intelligence Through AI appeared first on Microsoft Research.
arXiv:2605.26279v1 Announce Type: new
Abstract: Constraint Acquisition (CA) and related research on the validation and enhancement of Mathematical Programming (MP) models from domain knowledge artifacts are currently limited by inadequate benchmarks. This deficiency impedes reproducibility and cros...
Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions
arXiv:2605.26256v1 Announce Type: new
Abstract: Multimodal large language model (MLLM)-based embodied agents have shown strong potential for solving complex tasks in physical environments. However, personalized assistance requires more than following generic instruction or recognizing object catego...
Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory
arXiv:2605.26252v1 Announce Type: new
Abstract: Long-running AI agents need persistent memory. Memory supports learning across sessions, reduces repeated context injection, and enables auditing of past decisions. Current agent memory systems and database paradigms treat memory as storage. They loca...
arXiv:2605.26242v1 Announce Type: new
Abstract: Can large language models detect and report their own internal states? A number of studies have argued that the answer to this question is yes. We argue, based on lessons from human metacognition research, that this conclusion may be premature: to be ...
BrickAnything: Geometry-Conditioned Buildable Brick Generation with Structure-Aware Tokenization
arXiv:2605.26182v1 Announce Type: new
Abstract: Generating physically buildable brick structures from 3D shapes requires more than geometric reconstruction: the output must also satisfy discrete part constraints and structural stability. Existing brick generation methods either rely on heuristic op...
arXiv:2605.26147v1 Announce Type: new
Abstract: Human decision-making is sequential and uncertainty-aware, yet standard neural networks often rely on static, dense forward computation with limited visibility into evidence acquisition, uncertainty evolution, or when computation should stop. We intro...
AirCast-SR: A Foundation Model for Kilometer-Scale Atmospheric Super-Resolution via Latent Consistency Diffusion
arXiv:2605.26130v1 Announce Type: new
Abstract: Operational weather prediction at kilometer scales remains computationally prohibitive for traditional numerical weather prediction (NWP) models, limiting forecast access for applications in energy, agriculture, and disaster management that require fi...
The Constraint Tax: Measuring Validity-Correctness Tradeoffs in Structured Outputs for Small Language Models
arXiv:2605.26128v1 Announce Type: new
Abstract: Production LLM systems increasingly require machine-readable outputs: JSON objects, typed traces, regex-constrained fields, and tool-call schemas. This paper targets on-device and low-cost small language model (SLM) deployments, where sub-3B models ar...
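A minimal sketch of the distinction the title draws between validity and correctness of structured outputs. It is not the paper's metric or code: the function `score_output`, the required-key check standing in for a schema, and the sample outputs are all hypothetical illustrations.

```python
import json


def score_output(raw, required_keys, expected):
    """Return (valid, correct) for one model output.

    valid:   the text parses as a JSON object containing every required key
             (an assumed stand-in for a real schema check).
    correct: the output is valid and its values match `expected`.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False, False
    if not isinstance(obj, dict) or not all(k in obj for k in required_keys):
        return False, False
    correct = all(obj[k] == expected[k] for k in required_keys)
    return True, correct


# Hypothetical outputs: valid and right, valid but wrong, not JSON at all.
outputs = ['{"answer": 4}', '{"answer": 5}', 'answer: 4']
results = [score_output(o, ["answer"], {"answer": 4}) for o in outputs]

validity_rate = sum(v for v, _ in results) / len(results)
accuracy = sum(c for _, c in results) / len(results)
```

Under these assumptions, an output can count toward the validity rate without counting toward accuracy, which is the gap a validity-correctness tradeoff would measure.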
GEM: Geometric Entropy Mixing for Optimal LLM Data Curation
arXiv:2605.26121v1 Announce Type: new
Abstract: LLM pre-training efficacy increasingly depends on data composition rather than sheer volume. Yet, optimal mixing is hindered by categorization flaws: human taxonomies suffer from ontological misalignment, and Euclidean clustering fails to address embe...
How AI is Transforming Scientific Discovery While Keeping Humans at the Center
From designing new antibodies to simulating 1,000 years of climate in a day, AI is transforming what's possible—but humans remain the ones deciding what matters.
Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs
arXiv:2605.23929v1 Announce Type: new
Abstract: Modern AI systems increasingly rely on workflows composed of multiple interacting agents, some powered by large language models (LLMs) and others by conventional computational modules. This paper analyzes the fundamental tradeoffs between latency, rel...
Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction
arXiv:2605.23928v1 Announce Type: new
Abstract: We present Context, the intelligence layer of the Magarshak Architecture, which replaces reactive query-response chatbots with proactive goal-directed agents that advance shared tasks without waiting for user prompts. The architecture rests on three m...
How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning
arXiv:2605.23926v1 Announce Type: new
Abstract: Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and energy. Casual inspection of their traces reveals extensive reformulation, verification, and circular self-reflect...
arXiv:2605.23909v1 Announce Type: new
Abstract: We investigate the calibration of large language models' (LLMs') confidence across diverse tasks. The results of our preregistered study show that the current crop of LLMs are, like people, too sure they are right: confidence exceeds accuracy, on aver...
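A minimal sketch of the quantity the abstract describes, confidence exceeding accuracy, computed as mean stated confidence minus observed accuracy. The function name `overconfidence` and the sample data are hypothetical; the study's actual measure may differ.

```python
from statistics import mean


def overconfidence(confidences, correct):
    """Mean stated confidence minus observed accuracy.

    A positive value means the answers were, on average, stated with more
    confidence than their accuracy justified.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and the same length")
    accuracy = mean(1.0 if c else 0.0 for c in correct)
    return mean(confidences) - accuracy


# Hypothetical data: four answers with stated confidence and correctness.
stated = [0.9, 0.8, 0.95, 0.7]
was_right = [True, False, True, False]
gap = overconfidence(stated, was_right)
```

A gap near zero would indicate well-calibrated confidence on average; it does not show whether calibration holds at each individual confidence level.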