To Throw a Stone with Six Birds: On Agents and Agenthood
arXiv:2604.03239v1 Announce Type: new
Abstract: Six Birds Theory (SBT) treats macroscopic objects as induced closures rather than primitives. Empirical discussions of agency often conflate persistence (being an object) with control (making a counterfactual difference), which makes agency claims dif...
Position: Science of AI Evaluation Requires Item-level Benchmark Data
arXiv:2604.03244v1 Announce Type: new
Abstract: AI evaluations have become the primary evidence for deploying generative AI systems across high-stakes domains. However, current evaluation paradigms often exhibit systemic validity failures. These issues, ranging from unjustified design choices to mi...
Toward Full Autonomous Laboratory Instrumentation Control with Large Language Models
arXiv:2604.03286v1 Announce Type: new
Abstract: The control of complex laboratory instrumentation often requires significant programming expertise, creating a barrier for researchers lacking computational skills. This work explores the potential of large language models (LLMs), such as ChatGPT, and...
Integrating Artificial Intelligence, Physics, and Internet of Things: A Framework for Cultural Heritage Conservation
arXiv:2604.03233v1 Announce Type: new
Abstract: The conservation of cultural heritage increasingly relies on integrating technological innovation with domain expertise to ensure effective monitoring and predictive maintenance. This paper presents a novel framework to support the preservation of cul...
arXiv:2604.03240v1 Announce Type: new
Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding generation in external knowledge, yielding relevance responses that are aligned with factual evidence and evolving corpora. Standard RAG pipelines construct contex...
DRAFT: Task Decoupled Latent Reasoning for Agent Safety
arXiv:2604.03242v1 Announce Type: new
Abstract: The advent of tool-using LLM agents shifts safety monitoring from output moderation to auditing long, noisy interaction trajectories, where risk-critical evidence is sparse-making standard binary supervision poorly suited for credit assignment. To add...
arXiv:2604.03335v1 Announce Type: new
Abstract: Apparent age estimation is a valuable tool for business personalization, yet current models frequently exhibit demographic biases. We review prior works on the DEX method by applying distribution learning techniques such as Mean-Variance Loss (MVL) an...
Convolutional Surrogate for 3D Discrete Fracture-Matrix Tensor Upscaling
arXiv:2604.02335v1 Announce Type: new
Abstract: Modeling groundwater flow in three-dimensional fractured crystalline media requires accounting for strong spatial heterogeneity induced by fractures. Fine-scale discrete fracture-matrix (DFM) simulations can capture this complexity but are computation...
Generating Counterfactual Patient Timelines from Real-World Data
arXiv:2604.02337v1 Announce Type: new
Abstract: Counterfactual simulation - exploring hypothetical consequences under alternative clinical scenarios - holds promise for transformative applications such as personalized medicine and in silico trials. However, it remains challenging due to methodologi...
LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning
arXiv:2604.02338v1 Announce Type: new
Abstract: MoE-PEFT methods combine Mixture of Experts with parameter-efficient fine-tuning for multi-task adaptation, but require separate adapters per expert causing trainable parameters to scale linearly with expert count and limiting applicability to adapter...
SIEVE: Sample-Efficient Parametric Learning from Natural Language
arXiv:2604.02339v1 Announce Type: new
Abstract: Natural language context-such as instructions, knowledge, or feedback-contains rich signal for adapting language models. While in-context learning provides adaptation via the prompt, parametric learning persists into model weights and can improve perf...
Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models
arXiv:2604.02340v1 Announce Type: new
Abstract: Recent advances in masked diffusion language models (MDLMs) narrow the quality gap to autoregressive LMs, but their sampling remains expensive because generation requires many full-sequence denoising passes with a large Transformer and, unlike autoreg...
Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web
arXiv:2604.02334v1 Announce Type: new
Abstract: As large language models (LLM)-driven agents transition from isolated task solvers to persistent digital entities, the emergence of the Agentic Web, an ecosystem where heterogeneous agents autonomously interact and co-evolve, marks a pivotal shift tow...
Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation
arXiv:2604.02368v1 Announce Type: new
Abstract: As Large Language Models (LLMs) exhibit plateauing performance on conventional benchmarks, a pivotal challenge persists: evaluating their proficiency in complex, open-ended tasks characterizing genuine expert-level cognition. Existing frameworks suffe...
Understanding the Nature of Generative AI as Threshold Logic in High-Dimensional Space
arXiv:2604.02476v1 Announce Type: new
Abstract: This paper examines the role of threshold logic in understanding generative artificial intelligence. Threshold functions, originally studied in the 1960s in digital circuit synthesis, provide a structurally transparent model of neural computation: a w...
AIVV: Neuro-Symbolic LLM Agent-Integrated Verification and Validation for Trustworthy Autonomous Systems
arXiv:2604.02478v1 Announce Type: new
Abstract: Deep learning models excel at detecting anomaly patterns in normal data. However, they do not provide a direct solution for anomaly classification and scalability across diverse control systems, frequently failing to distinguish genuine faults from nu...
SQUIRE: Interactive UI Authoring via Slot QUery Intermediate REpresentations
Frontend developers create UI prototypes to evaluate alternatives, which is a time-consuming process of repeated iteration and refinement. Generative AI code assistants enable rapid prototyping simply by prompting through a chat interface rather than writing code. However, while this interaction giv...
An Online Machine Learning Multi-resolution Optimization Framework for Energy System Design Limit of Performance Analysis
arXiv:2604.01308v1 Announce Type: new
Abstract: Designing reliable integrated energy systems for industrial processes requires optimization and verification models across multiple fidelities, from architecture-level sizing to high-fidelity dynamic operation. However, model mismatch across fidelitie...
Two-Stage Optimizer-Aware Online Data Selection for Large Language Models
arXiv:2604.00001v1 Announce Type: new
Abstract: Gradient-based data selection offers a principled framework for estimating sample utility in large language model (LLM) fine-tuning, but existing methods are mostly designed for offline settings. They are therefore less suited to online fine-tuning, w...
Task-Centric Personalized Federated Fine-Tuning of Language Models
arXiv:2604.00050v1 Announce Type: new
Abstract: Federated Learning (FL) has emerged as a promising technique for training language models on distributed and private datasets of diverse tasks. However, aggregating models trained on heterogeneous tasks often degrades the overall performance of indivi...
Temporal Memory for Resource-Constrained Agents: Continual Learning via Stochastic Compress-Add-Smooth
arXiv:2604.00067v1 Announce Type: new
Abstract: An agent that operates sequentially must incorporate new experience without forgetting old experience, under a fixed memory budget. We propose a framework in which memory is not a parameter vector but a stochastic process: a Bridge Diffusion on a repl...
One Panel Does Not Fit All: Case-Adaptive Multi-Agent Deliberation for Clinical Prediction
arXiv:2604.00085v1 Announce Type: new
Abstract: Large language models applied to clinical prediction exhibit case-level heterogeneity: simple cases yield consistent outputs, while complex cases produce divergent predictions under minor prompt changes. Existing single-agent strategies sample from on...
Open, Reliable, and Collective: A Community-Driven Framework for Tool-Using AI Agents
arXiv:2604.00137v1 Announce Type: new
Abstract: Tool-integrated LLMs can retrieve, compute, and take real-world actions via external tools, but reliability remains a key bottleneck. We argue that failures stem from both tool-use accuracy (how well an agent invokes a tool) and intrinsic tool accurac...