arXiv:2601.00823v1 Announce Type: new
Abstract: Large reasoning models (LRMs) have heterogeneous inference energy costs based on which model is used and how much it reasons. To reduce energy, it is important to choose the right LRM and operate it in the right way. As a result, the performance of sy...
Improving User Interface Generation Models from Designer Feedback
Despite being trained on vast amounts of data, most LLMs are unable to reliably generate well-designed UIs. Designer feedback is essential to improving performance on UI generation; however, we find that existing RLHF methods based on ratings or rankings are not well-aligned with designers’ workflow...
NarrativeTrack: Evaluating Video Language Models Beyond the Frame
Multimodal large language models (MLLMs) have achieved impressive progress in vision-language reasoning, yet their ability to understand temporally unfolding narratives in videos remains underexplored. True narrative understanding requires grounding who is doing what, when, and where, maintaining co...
The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition
arXiv:2601.00065v1 Announce Type: new
Abstract: The open-weight LLM ecosystem is increasingly defined by model composition techniques (such as weight merging, speculative decoding, and vocabulary expansion) that remix capabilities from diverse sources. A critical prerequisite for applying these met...
arXiv:2601.00084v1 Announce Type: new
Abstract: In fixed-confidence best arm identification (BAI), the objective is to quickly identify the optimal option while controlling the probability of error below a desired threshold. Despite the plethora of BAI algorithms, existing methods typically fall sh...
Reasoning in Action: MCTS-Driven Knowledge Retrieval for Large Language Models
arXiv:2601.00003v1 Announce Type: new
Abstract: Large language models (LLMs) typically enhance their performance through either the retrieval of semantically similar information or the improvement of their reasoning capabilities. However, a significant challenge remains in effectively integrating b...
arXiv:2601.00021v1 Announce Type: new
Abstract: We present a physical theory of intelligence grounded in irreversible information processing in systems constrained by conservation laws. An intelligent system is modelled as a coupled agent-environment process whose evolution transforms information i...
A multi-algorithm approach for operational human resources workload balancing in a last mile urban delivery system
arXiv:2601.00023v1 Announce Type: new
Abstract: Efficient workload assignment to the workforce is critical in last-mile package delivery systems. In this context, traditional methods of assigning package deliveries to workers based on geographical proximity can be inefficient and surely lead to an...
Quantitative Rule-Based Strategy modeling in Classic Indian Rummy: A Metric Optimization Approach
arXiv:2601.00024v1 Announce Type: new
Abstract: The 13-card variant of Classic Indian Rummy is a sequential game of incomplete information that requires probabilistic reasoning and combinatorial decision-making. This paper proposes a rule-based framework for strategic play, driven by a new hand-eva...
New research shows that AI doesn’t need endless training data to start acting more like a human brain. When researchers redesigned AI systems to better resemble biological brains, some models produced brain-like activity without any training at all. This challenges today’s data-hungry approach to AI...
The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models
arXiv:2512.23850v1 Announce Type: new
Abstract: Current language model evaluations measure what models know under ideal conditions but not how robustly they know it under realistic stress. Static benchmarks like MMLU and TruthfulQA cannot distinguish a model that lacks knowledge from one whose veri...
CASCADE: Cumulative Agentic Skill Creation through Autonomous Development and Evolution
arXiv:2512.23880v1 Announce Type: new
Abstract: Large language model (LLM) agents currently depend on predefined tools or brittle tool generation, constraining their capability and adaptability to complex scientific tasks. We introduce CASCADE, a self-evolving agentic framework representing an earl...
A Proof-of-Concept for Explainable Disease Diagnosis Using Large Language Models and Answer Set Programming
arXiv:2512.23932v1 Announce Type: new
Abstract: Accurate disease prediction is vital for timely intervention, effective treatment, and reducing medical complications. While symbolic AI has been applied in healthcare, its adoption remains limited due to the effort required for constructing high-qual...
SPARK: Search Personalization via Agent-Driven Retrieval and Knowledge-sharing
arXiv:2512.24008v1 Announce Type: new
Abstract: Personalized search demands the ability to model users' evolving, multi-dimensional information needs, a challenge for systems constrained by static profiles or monolithic retrieval pipelines. We present SPARK (Search Personalization via Agent-Driven ...
ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment
arXiv:2512.24040v1 Announce Type: new
Abstract: Automatic Prompt Optimization (APO) has emerged as a critical technique for enhancing Large Language Model (LLM) performance, yet current state-of-the-art methods typically rely on large, labeled gold-standard development sets to compute fitness score...
Coordinate Matrix Machine: A Human-level Concept Learning to Classify Very Similar Documents
arXiv:2512.23749v1 Announce Type: new
Abstract: Human-level concept learning argues that humans typically learn new concepts from a single example, whereas machine learning algorithms typically require hundreds of samples to learn a single concept. Our brain subconsciously identifies important feat...
arXiv:2512.23752v1 Announce Type: new
Abstract: Recent work has shown that small transformers trained in controlled "wind-tunnel" settings can implement exact Bayesian inference, and that their training dynamics produce a geometric substrate -- low-dimensional value manifolds and progressively ort...
Pruning Graphs by Adversarial Robustness Evaluation to Strengthen GNN Defenses
arXiv:2512.22128v1 Announce Type: new
Abstract: Graph Neural Networks (GNNs) have emerged as a dominant paradigm for learning on graph-structured data, thanks to their ability to jointly exploit node features and relational information encoded in the graph topology. This joint modeling, however, al...
Towards Unsupervised Causal Representation Learning via Latent Additive Noise Model Causal Autoencoders
arXiv:2512.22150v1 Announce Type: new
Abstract: Unsupervised representation learning seeks to recover latent generative factors, yet standard methods relying on statistical independence often fail to capture causal dependencies. A central challenge is identifiability: as established in disentangled...
SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models
arXiv:2512.22170v1 Announce Type: new
Abstract: Post-training alignment of video generation models with human preferences is a critical goal. Developing effective Reward Models (RMs) for this process faces significant methodological hurdles. Current data collection paradigms, reliant on in-prompt p...
Wireless Traffic Prediction with Large Language Model
arXiv:2512.22178v1 Announce Type: new
Abstract: The growing demand for intelligent, adaptive resource management in next-generation wireless networks has underscored the importance of accurate and scalable wireless traffic prediction. While recent advancements in deep learning and foundation models...
Latent Sculpting for Zero-Shot Generalization: A Manifold Learning Approach to Out-of-Distribution Anomaly Detection
arXiv:2512.22179v1 Announce Type: new
Abstract: A fundamental limitation of supervised deep learning in high-dimensional tabular domains is "Generalization Collapse": models learn precise decision boundaries for known distributions but fail catastrophically when facing Out-of-Distribution (OOD) dat...
Bidirectional RAG: Safe Self-Improving Retrieval-Augmented Generation Through Multi-Stage Validation
arXiv:2512.22199v1 Announce Type: new
Abstract: Retrieval-Augmented Generation (RAG) systems enhance large language models by grounding responses in external knowledge bases, but conventional RAG architectures operate with static corpora that cannot evolve from user interactions. We introduce Bidirec...
Emergent Persuasion: Will LLMs Persuade Without Being Prompted?
arXiv:2512.22201v1 Announce Type: new
Abstract: With the wide-scale adoption of conversational AI systems, AI is now able to exert unprecedented influence on human opinion and beliefs. Recent work has shown that many Large Language Models (LLMs) comply with requests to persuade users into harmful ...