CreditAudit: 2$^\text{nd}$ Dimension for LLM Evaluation and Selection
arXiv:2602.02515v2 Announce Type: new
Abstract: Leaderboard scores on public benchmarks have been steadily rising and converging, with many frontier language models now separated by only marginal differences. However, these scores often fail to match users' day-to-day experience, because system pro...
OGD4All: A Framework for Accessible Interaction with Geospatial Open Government Data Based on Large Language Models
arXiv:2602.00012v1 Announce Type: new
Abstract: We present OGD4All, a transparent, auditable, and reproducible framework based on Large Language Models (LLMs) to enhance citizens' interaction with geospatial Open Government Data (OGD). The system combines semantic data retrieval, agentic reasoning ...
RAPTOR-AI for Disaster OODA Loop: Hierarchical Multimodal RAG with Experience-Driven Agentic Decision-Making
arXiv:2602.00030v1 Announce Type: new
Abstract: Effective humanitarian assistance and disaster relief (HADR) requires rapid situational understanding, reliable decision support, and the ability to generalize across diverse and previously unseen disaster contexts. This work introduces an agentic Ret...
Measurement for Opaque Systems: Multi-source Triangulation with Interpretable Machine Learning
arXiv:2602.00022v1 Announce Type: new
Abstract: We propose a measurement framework for difficult-to-access contexts that uses indirect data traces, interpretable machine-learning models, and theory-guided triangulation to fill inaccessible measurement spaces. Many high-stakes systems of scientific ...
ELLMPEG: An Edge-based Agentic LLM Video Processing Tool
arXiv:2602.00028v1 Announce Type: new
Abstract: Large language models (LLMs), the foundation of generative AI systems like ChatGPT, are transforming many fields and applications, including multimedia, enabling more advanced content generation, analysis, and interaction. However, cloud-based LLM dep...
Learning to Price: Interpretable Attribute-Level Models for Dynamic Markets
arXiv:2602.00188v1 Announce Type: new
Abstract: Dynamic pricing in high-dimensional markets poses fundamental challenges of scalability, uncertainty, and interpretability. Existing low-rank bandit formulations learn efficiently but rely on latent features that obscure how individual product attribu...
Scalable and Secure AI Inference in Healthcare: A Comparative Benchmarking of FastAPI and Triton Inference Server on Kubernetes
arXiv:2602.00053v1 Announce Type: new
Abstract: Efficient and scalable deployment of machine learning (ML) models is a prerequisite for modern production environments, particularly within regulated domains such as healthcare and pharmaceuticals. In these settings, systems must balance competing req...
Localizing and Correcting Errors for LLM-based Planners
arXiv:2602.00276v1 Announce Type: new
Abstract: Large language models (LLMs) have demonstrated strong reasoning capabilities on math and coding, but frequently fail on symbolic classical planning tasks. Our studies, as well as prior work, show that LLM-generated plans routinely violate domain const...
From Gameplay Traces to Game Mechanics: Causal Induction with Large Language Models
arXiv:2602.00190v1 Announce Type: new
Abstract: Deep learning agents can often achieve high performance in complex game domains without understanding the underlying causal game mechanics. To address this, we investigate Causal Induction: the ability to infer governing laws from observational data, ...
Attention Isn't All You Need for Emotion Recognition: Domain Features Outperform Transformers on the EAV Dataset
arXiv:2601.22161v1 Announce Type: new
Abstract: We present a systematic study of multimodal emotion recognition using the EAV dataset, investigating whether complex attention mechanisms improve performance on small datasets. We implement three model categories: baseline transformers (M1), novel fac...
FedAdaVR: Adaptive Variance Reduction for Robust Federated Learning under Limited Client Participation
arXiv:2601.22204v1 Announce Type: new
Abstract: Federated learning (FL) encounters substantial challenges due to heterogeneity, leading to gradient noise, client drift, and partial client participation errors, the last of which is the most pervasive but remains insufficiently addressed in current l...
Neural Signals Generate Clinical Notes in the Wild
arXiv:2601.22197v1 Announce Type: new
Abstract: Generating clinical reports that summarize abnormal patterns, diagnostic findings, and clinical interpretations from long-term EEG recordings remains labor-intensive. We curate a large-scale clinical EEG dataset with $9{,}922$ reports paired with appr...
Multitask Learning for Earth Observation Data Classification with Hybrid Quantum Network
arXiv:2601.22195v1 Announce Type: new
Abstract: Quantum machine learning (QML) has gained increasing attention as a potential solution to address the challenges of computation requirements in the future. Earth observation (EO) has entered the era of Big Data, and the computational demands for effec...
arXiv:2601.22269v1 Announce Type: new
Abstract: Judge agents are fundamental to agentic AI frameworks: they provide automated evaluation, and enable iterative self-refinement of reasoning processes. We introduce JAF: Judge Agent Forest, a framework in which the judge agent conducts joint inference ...
Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice?
arXiv:2601.22329v1 Announce Type: new
Abstract: Large Language Models (LLMs) are increasingly positioned as decision engines for hiring, healthcare, and economic judgment, yet real-world human judgment reflects a balance between rational deliberation and emotion-driven bias. If LLMs are to particip...
The Six Sigma Agent: Achieving Enterprise-Grade Reliability in LLM Systems Through Consensus-Driven Decomposed Execution
arXiv:2601.22290v1 Announce Type: new
Abstract: Large Language Models demonstrate remarkable capabilities yet remain fundamentally probabilistic, presenting critical reliability challenges for enterprise deployment. We introduce the Six Sigma Agent, a novel architecture that achieves enterprise-gra...
Learning Provably Correct Distributed Protocols Without Human Knowledge
arXiv:2601.22369v1 Announce Type: new
Abstract: Provably correct distributed protocols, which are a critical component of modern distributed systems, are highly challenging to design and have often required decades of human effort. These protocols allow multiple agents to coordinate to come to a co...
A generative machine learning model for designing metal hydrides applied to hydrogen storage
arXiv:2601.20892v1 Announce Type: new
Abstract: Developing new metal hydrides is a critical step toward efficient hydrogen storage in carbon-neutral energy systems. However, existing materials databases, such as the Materials Project, contain a limited number of well-characterized hydrides, which c...
arXiv:2601.20884v1 Announce Type: new
Abstract: Multimodal pretraining is effective for building general-purpose representations, but in many practical deployments, only one modality is heavily used during downstream fine-tuning. Standard pretraining strategies treat all modalities uniformly, which...
Rethinking LLM-Driven Heuristic Design: Generating Efficient and Specialized Solvers via Dynamics-Aware Optimization
arXiv:2601.20868v1 Announce Type: new
Abstract: Large Language Models (LLMs) have advanced the field of Combinatorial Optimization through automated heuristic generation. Instead of relying on manual design, this LLM-Driven Heuristic Design (LHD) process leverages LLMs to iteratively generate and r...
Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models
arXiv:2601.21003v1 Announce Type: new
Abstract: Large Language Models usually put more emphasis on accuracy and will therefore guess even when uncertain about a prediction, a problem that is especially severe when they are fine-tuned on small datasets, owing to the inherent tendency toward miscalibration. In this ...
Latent Object Permanence: Topological Phase Transitions, Free-Energy Principles, and Renormalization Group Flows in Deep Transformer Manifolds
arXiv:2601.19942v1 Announce Type: new
Abstract: We study the emergence of multi-step reasoning in deep Transformer language models through a geometric and statistical-physics lens. Treating the hidden-state trajectory as a flow on an implicit Riemannian manifold, we analyze the layerwise covariance...
oculomix: Hierarchical Sampling for Retinal-Based Systemic Disease Prediction
arXiv:2601.19939v1 Announce Type: new
Abstract: Oculomics - the concept of predicting systemic diseases, such as cardiovascular disease and dementia, through retinal imaging - has advanced rapidly due to the data efficiency of transformer-based foundation models like RETFound. Image-level mixed sam...
Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data
arXiv:2601.19936v1 Announce Type: new
Abstract: The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods...