Abstraction Generation for Generalized Planning with Pretrained Large Language Models
arXiv:2602.10485v1 Announce Type: new
Abstract: Qualitative Numerical Planning (QNP) serves as an important abstraction model for generalized planning (GP), which aims to compute general plans that solve multiple instances at once. Recent works show that large language models (LLMs) can function as...
MERIT Feedback Elicits Better Bargaining in LLM Negotiators
arXiv:2602.10467v1 Announce Type: new
Abstract: Bargaining is often regarded as a logical arena rather than an art or a matter of intuition, yet Large Language Models (LLMs) still struggle to navigate it due to limited strategic depth and difficulty adapting to complex human factors. Current benchm...
Found-RL: foundation model-enhanced reinforcement learning for autonomous driving
arXiv:2602.10458v1 Announce Type: new
Abstract: Reinforcement Learning (RL) has emerged as a dominant paradigm for end-to-end autonomous driving (AD). However, RL suffers from sample inefficiency and a lack of semantic interpretability in complex scenarios. Foundation Models, particularly Vision-La...
LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation
arXiv:2602.10367v1 Announce Type: new
Abstract: The deployment of Large Language Models (LLMs) in high-stakes clinical settings demands rigorous and reliable evaluation. However, existing medical benchmarks remain static, suffering from two critical limitations: (1) data contamination, where test s...
Discovering Differences in Strategic Behavior Between Humans and LLMs
arXiv:2602.10324v1 Announce Type: new
Abstract: As Large Language Models (LLMs) are increasingly deployed in social and strategic scenarios, it becomes critical to understand where and why their behavior diverges from that of humans. While behavioral game theory (BGT) provides a framework for analy...
Signature-Kernel Based Evaluation Metrics for Robust Probabilistic and Tail-Event Forecasting
arXiv:2602.10182v1 Announce Type: new
Abstract: Probabilistic forecasting is increasingly critical across high-stakes domains, from finance and epidemiology to climate science. However, current evaluation frameworks lack a consensus metric and suffer from two critical flaws: they often assume indep...
arXiv:2602.10177v1 Announce Type: new
Abstract: Recent advances in foundational models have yielded reasoning systems capable of achieving a gold-medal standard at the International Mathematical Olympiad. The transition from competition-level problem-solving to professional research, however, requi...
Large Language Models Predict Functional Outcomes after Acute Ischemic Stroke
arXiv:2602.10119v1 Announce Type: new
Abstract: Accurate prediction of functional outcomes after acute ischemic stroke can inform clinical decision-making and resource allocation. Prior work on modified Rankin Scale (mRS) prediction has relied primarily on structured variables (e.g., age, NIHSS) an...
Trace Length is a Simple Uncertainty Signal in Reasoning Models
Abstract: Uncertainty quantification for LLMs is a key research direction towards addressing hallucination and other issues that limit their reliable deployment. In this work, we show that reasoning trace length is a simple and useful confidence estimator in large reasoning models. Through comprehensive exper...
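The signal this abstract describes is straightforward to test: score each answer by the negative length of its reasoning trace (shorter trace, higher confidence) and check whether correct answers outrank incorrect ones. A minimal sketch, with entirely hypothetical `(trace_length, correct)` data standing in for real model outputs:

```python
# Hedged sketch: reasoning-trace length as a confidence signal.
# The (trace_length_in_tokens, answer_correct) pairs below are invented
# for illustration, not from the paper.
samples = [
    (120, True), (480, False), (95, True), (300, True),
    (750, False), (210, True), (620, False), (150, True),
]

# Shorter traces are treated as higher confidence, so score = -length.
scores = [-length for length, _ in samples]
labels = [correct for _, correct in samples]

def auroc(scores, labels):
    """Probability a correct answer outranks an incorrect one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(f"trace-length AUROC: {auroc(scores, labels):.2f}")  # → 1.00 on this toy data
```

An AUROC of 0.5 would mean length carries no information; anything well above it supports the paper's claim that length alone is a usable estimator.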
Mapping the Design Space of User Experience for Computer Use Agents
Abstract: Large language model (LLM)-based computer use agents execute user commands by interacting with available UI elements, but little is known about how users want to interact with these agents or what design factors matter for their user experience (UX). We conducted a two-phase study to map the UX desi...
Spectral Disentanglement and Enhancement: A Dual-domain Contrastive Framework for Representation Learning
arXiv:2602.09066v1 Announce Type: new
Abstract: Large-scale multimodal contrastive learning has recently achieved impressive success in learning rich and transferable representations, yet it remains fundamentally limited by the uniform treatment of feature dimensions and the neglect of the intrinsi...
Enhanced Graph Transformer with Serialized Graph Tokens
arXiv:2602.09065v1 Announce Type: new
Abstract: Transformers have demonstrated success in graph learning, particularly for node-level tasks. However, existing methods encounter an information bottleneck when generating graph-level representations. The prevalent single token paradigm fails to fully ...
Learning to Remember, Learn, and Forget in Attention-Based Models
arXiv:2602.09075v1 Announce Type: new
Abstract: In-Context Learning (ICL) in transformers acts as an online associative memory and is believed to underpin their high performance on complex sequence processing tasks. However, in gated linear attention models, this memory has a fixed capacity and is ...
Patient foundation model for risk stratification in low-risk overweight patients
arXiv:2602.09079v1 Announce Type: new
Abstract: Accurate risk stratification in patients with overweight or obesity is critical for guiding preventive care and allocating high-cost therapies such as GLP-1 receptor agonists. We present PatientTPP, a neural temporal point process (TPP) model trained ...
Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models
arXiv:2602.09080v1 Announce Type: new
Abstract: Large Multimodal Models (LMMs) have achieved remarkable success in vision-language tasks, yet their vast parameter counts are often underutilized during both training and inference. In this work, we embrace the idea of looping back to move forward: re...
Uncertainty-Aware Multimodal Emotion Recognition through Dirichlet Parameterization
arXiv:2602.09121v1 Announce Type: new
Abstract: In this work, we present a lightweight and privacy-preserving Multimodal Emotion Recognition (MER) framework designed for deployment on edge devices. To demonstrate the framework's versatility, our implementation uses three modalities - speech, text and f...
PABU: Progress-Aware Belief Update for Efficient LLM Agents
arXiv:2602.09138v1 Announce Type: new
Abstract: Large Language Model (LLM) agents commonly condition actions on full action-observation histories, which introduce task-irrelevant information that easily leads to redundant actions and higher inference cost. We propose Progress-Aware Belief Update (P...
CoMMa: Contribution-Aware Medical Multi-Agents From A Game-Theoretic Perspective
arXiv:2602.09159v1 Announce Type: new
Abstract: Recent multi-agent frameworks have broadened the ability to tackle oncology decision support tasks that require reasoning over dynamic, heterogeneous patient data. We propose Contribution-Aware Medical Multi-Agents (CoMMa), a decentralized LLM-agent f...
A Small-Scale System for Autoregressive Program Synthesis Enabling Controlled Experimentation
arXiv:2602.09112v1 Announce Type: new
Abstract: What research can be pursued with small models trained to complete true programs? Typically, researchers study program synthesis via large language models (LLMs) which introduce issues such as knowing what is in or out of distribution, understanding f...
FlyAOC: Evaluating Agentic Ontology Curation of Drosophila Scientific Knowledge Bases
arXiv:2602.09163v1 Announce Type: new
Abstract: Scientific knowledge bases accelerate discovery by curating findings from primary literature into structured, queryable formats for both human researchers and emerging AI systems. Maintaining these resources requires expert curators to search relevant...
Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
arXiv:2602.07055v1 Announce Type: new
Abstract: Spatial embodied intelligence requires agents to act to acquire information under partial observability. While multimodal foundation models excel at passive perception, their capacity for active, self-directed exploration remains understudied. We prop...
Aster: Autonomous Scientific Discovery over 20x Faster Than Existing Methods
arXiv:2602.07040v1 Announce Type: new
Abstract: We introduce Aster, an AI agent for autonomous scientific discovery capable of operating over 20 times faster than existing frameworks. Given a task, an initial program, and a script to evaluate the performance of the program, Aster iteratively improv...
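The loop this abstract outlines — take a task, an initial program, and an evaluation script, then iteratively improve the program against that score — can be sketched generically. Everything below is illustrative (the `evaluate`, `propose`, and `improve` names and the toy objective are assumptions, not Aster's actual API):

```python
import random

# Hedged sketch of an evaluate-propose-accept improvement loop of the kind
# the abstract describes. In a real system, propose() would be an LLM edit
# and evaluate() the user-supplied evaluation script.

def evaluate(program):
    """Stand-in evaluation script: higher is better (max 0 at [3, 3, 3])."""
    return -sum((x - 3) ** 2 for x in program)

def propose(program, rng):
    """Stand-in for a proposed edit: perturb one parameter by +/-1."""
    candidate = list(program)
    i = rng.randrange(len(candidate))
    candidate[i] += rng.choice([-1, 1])
    return candidate

def improve(program, steps=200, seed=0):
    rng = random.Random(seed)
    best, best_score = program, evaluate(program)
    for _ in range(steps):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:  # keep only strict improvements
            best, best_score = candidate, score
    return best, best_score

best, score = improve([0, 0, 0])
print(best, score)
```

The speedups Aster claims presumably come from how candidates are generated and evaluated (the truncated abstract does not say); the skeleton above only shows the control flow such an agent shares with simple hill climbing.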