Demystifying AI agents: going beyond the buzzwords
"Agent" is the most overused word in AI right now. But strip away the hype and what are you actually working with? Adobe principal scientist Deepak Pai breaks down the real building blocks of agentic systems and when they're worth reaching for.
Scientists are seriously asking if bees and ChatGPT are conscious
New studies suggest consciousness can't be judged solely by behavior, whether it's a chatbot discussing philosophy or a bee searching for nectar. Researchers are increasingly focusing on the internal mechanisms of brains and computers, concluding that today's AI is likely not conscious while leaving...
I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition
arXiv:2606.05316v1 Announce Type: new
Abstract: Multimodal memes are dynamic and often require up-to-date background knowledge for interpretation. Existing methods often overlook such knowledge or rely on fixed parametric knowledge of pretrained models that may be incomplete, outdated, or unavailab...
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
arXiv:2606.05304v1 Announce Type: new
Abstract: Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. However, this free-form co...
How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
arXiv:2606.05256v1 Announce Type: new
Abstract: This study analyzes a publicly released dataset from a discontinued field experiment on Reddit's r/ChangeMyView. The intervention, conducted by unknown, external researchers and halted following ethical backlash, involved undisclosed AI-generated acco...
Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory
arXiv:2606.05334v1 Announce Type: new
Abstract: Returned products in circular factories re-enter production with heterogeneous degradation states, usage histories, and remaining capability. Reuse cannot be decided from the current inspection alone, because future function fulfillment and component ...
GITCO: Gated Inference-Time Context Optimization in TSFMs
arXiv:2606.05332v1 Announce Type: new
Abstract: Patch-based Time Series Foundation Models (TSFMs) suffer from context poisoning: structurally anomalous patches capture disproportionate attention and silently degrade zero-shot forecast quality. We propose improving TSFM accuracy at inference time by...
Temporal Preference Concepts and their Functions in a Large Language Model
arXiv:2606.05194v1 Announce Type: new
Abstract: Large Language Models (LLMs) are increasingly being deployed to make decisions that require trading off near-term gains against long-term consequences, yet little is known about how they internally represent or resolve these tradeoffs. In this work, w...
Staged Factorial Screening for Budget-Constrained Micro-Pretraining
arXiv:2606.05186v1 Announce Type: new
Abstract: Budget-constrained micro-pretraining often requires triaging many candidate recipes on a shared accelerator before larger search budgets are spent. We study whether a staged fractional-factorial workflow can recover stable early effect structure in th...
ERRORQUAKE: Heavy-Tailed Error Severity Distributions in Open-Weight Large Language Models
arXiv:2606.05170v1 Announce Type: new
Abstract: At matched accuracy, open-weight LLMs differ substantially in the shape of their error severity distribution -- a difference invisible to the scalar error rate. Hallucination benchmarks report a single error count and treat all errors as equivalent, y...
The Evaluation Blind Spot: A Stereological Theory of Benchmark Coverage for Large Language Models
arXiv:2606.05169v1 Announce Type: new
Abstract: We give a stereological theory of LLM benchmark coverage. For any suite with effective dimensionality d_eff, the visible Hausdorff distance between two convex capability profiles consistent with the same scores is bounded by epsilon + C R m^(-1/(d_eff...
Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal
arXiv:2606.04223v1 Announce Type: new
Abstract: Multi-agent systems are commonly designed to reduce disagreement through voting, consensus protocols, debate, or fault-tolerant aggregation. We argue that this objective is insufficient for value-laden tasks, where disagreement may reflect genuine nor...
SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models
arXiv:2606.04202v1 Announce Type: new
Abstract: As LLMs become more widely deployed, they are increasingly expected to work alongside other AI agents rather than operating in isolation. Effective coordination in these settings requires agents to communicate, share information and make decisions und...
Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research
arXiv:2606.04152v1 Announce Type: new
Abstract: Large language models are reshaping research practice while quietly eroding researchers' epistemic accountability. This commentary introduces PEEL - Protocols for Epistemically Engaged Literacy in AI, a working scaffolding that combines deterministic d...
Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection
arXiv:2606.04150v1 Announce Type: new
Abstract: Public discourse and emerging policy typically assume that AI emotional support is a deliberate act: a lonely user consciously seeking comfort from a dedicated companion chatbot. In this paper, we draw on emerging empirical evidence and argue that thi...
Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification
arXiv:2606.04037v1 Announce Type: new
Abstract: Pre-deployment verification of enterprise artificial intelligence (AI) agents remains a critical gap between large language model (LLM) capability benchmarking and production deployment. Post-deployment monitoring, human-in-the-loop controls, and prom...
Position: Deployed Reinforcement Learning should be Continual
arXiv:2606.04029v1 Announce Type: new
Abstract: Reinforcement Learning (RL) has received increasing attention and adoption in real-world use cases. Most of these systems follow a train-then-fix paradigm, where trained agents do not learn while interacting with the world until performance degrades a...
Human-in-the-Loop Contextual Bandits for Short-Term Rental Dynamic Pricing: Structural Equivalence of Historical Warm-Up and Approval-Gated Live Learning
arXiv:2606.02595v1 Announce Type: new
Abstract: Dynamic pricing in short-term rental (STR) markets presents a distinctive challenge for online learning algorithms: pricing decisions carry significant financial risk, operators require explainability, and market feedback is sparse (one booking outcom...
Visual Graph Scaffolds for Structural Reasoning in Large Language Models
arXiv:2606.02673v1 Announce Type: new
Abstract: Graphs have been used to enhance large language models (LLMs) for structured reasoning, mostly as external knowledge sources provided to models at test time. In this paper, we take a different view: the value of graphs for LLMs lies not only in sup...
AURA: Action-Gated Memory for Robot Policies at Constant VRAM
arXiv:2606.02775v1 Announce Type: new
Abstract: The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episo...
BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces
arXiv:2606.02798v1 Announce Type: new
Abstract: Many decision-support settings require systems that adapt to individual users, but evaluation data for this problem remain limited. Existing benchmarks for user understanding often rely on simulated users or model-generated behavior, even though recen...
ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning
arXiv:2606.02802v1 Announce Type: new
Abstract: Large language models (LLMs) exhibit strong natural-language reasoning abilities for clinical decision support, but struggle to effectively model structured longitudinal electronic health records (EHRs). In contrast, EHR foundation models can learn pr...
Evaluating Transformer and LSTM Frameworks for Prediction in Ungauged Basins
arXiv:2606.02791v1 Announce Type: new
Abstract: Watershed networks exhibit convergent topologies in which multiple tributaries merge into downstream channels, integrating diverse upstream hydrological processes. In ungauged basins, the absence of direct observations increases uncertainty and limits ...