Reading Today’s Headlines Through AI: A Real-Time Audit of Six Commercial Chatbots
In a new study, scholars measured how accurately popular AI chatbots answered questions about emerging news and found substantial regional disparities, dependence on distinct information ecosystems, and acute fragility under imperfect prompts.
Most AI agent failures don't happen during the demo. They happen when APIs fail, context windows explode, costs spiral, and nobody can explain why the agent made a decision. Here are five questions that separate production-ready platforms from expensive experiments.
6 things to fix before RLHF turns your biases into features
Your reward model is learning exactly what your annotators prefer. The problem is that "better" and "unbiased" are two different things, and RLHF has no way to tell them apart.
Hoeffding Concept Bottleneck Models with Applications to Overhead Images
arXiv:2606.00082v1 Announce Type: new
Abstract: Explainability of deep learning algorithms is critical for computer-vision applications with high-stake decisions. Concept bottleneck models (CBM) have recently shown promising performance to provide explainable and accurate predictions for classifica...
DAStatFormer: A Hybrid Multibranch Transformer with Statistical Feature Integration for DAS-Based Pattern Recognitions
arXiv:2606.00081v1 Announce Type: new
Abstract: Distributed Acoustic Sensing (DAS) enables large-scale monitoring through optical fibers, but its high dimensionality and complex spatio-temporal patterns make event classification demanding. Existing deep learning approaches - CNNs, recurrent models, a...
From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models
arXiv:2606.00083v1 Announce Type: new
Abstract: Reinforcement learning relies on accurate reward functions, which are often hand-crafted or even unavailable in real-world applications, such as robotics. Recent work has explored the zero-shot reasoning capabilities of pre-trained Vision-Language Mod...
A Shared Valence Axis Across Modern LLMs and Human EEG: The Saturation Regularity
arXiv:2606.00129v1 Announce Type: new
Abstract: Large language models (LLMs) have emerged as powerful representation learners whose internal features increasingly align with human cognition. We study whether modern LLMs can serve as a lens for understanding neural representations in the human brain...
Emergent Collaborative Deliberation in Multi-Model AI Systems: A BFT-Derived Protocol for Epistemic Synthesis
arXiv:2606.00005v1 Announce Type: new
Abstract: We present the Consilium Protocol, a Byzantine Fault Tolerance-derived architecture for structured multi-model AI deliberation that treats inter-model disagreement as epistemic signal rather than error. The protocol assigns engineered cognitive person...
Deliberative Curation: A Protocol for Multi-Agent Knowledge Bases
arXiv:2606.00007v1 Announce Type: new
Abstract: As AI agents transition from isolated tools to collaborative participants in shared knowledge ecosystems, governing collective knowledge curation becomes a critical challenge. Human platform governance mechanisms do not transfer directly: agent statel...
Agents on a Tree: Pathwise Coordination for Multi-Objective Molecular Optimization
arXiv:2606.00008v1 Announce Type: new
Abstract: Multi-objective molecular optimization requires searching vast chemical spaces under conflicting objectives, where early design decisions strongly constrain downstream outcomes. Existing methods typically rely on a single policy or fixed scalarization...
Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving
arXiv:2605.30576v1 Announce Type: new
Abstract: Exploration in reinforcement learning for autonomous driving is inherently unsafe: agents must experience novel behaviors to learn, yet exploration can lead to collisions or off-road driving. We propose an uncertainty-aware framework that leverages ex...
Physically Viable World Models: A Case for Query-Conditioned Embodied AI
arXiv:2605.30542v1 Announce Type: new
Abstract: World models for embodied AI must be physically viable: constructed to answer intervention queries by representing the physical structure governing action outcomes, rather than merely predicting future observations. Existing observation-predictive wor...
PhyDrawGen: Physically Grounded Diagram Generation from Natural Language
arXiv:2605.30512v1 Announce Type: new
Abstract: Generating physics diagrams from text requires strict adherence to physical laws. While current generative models produce visually plausible outputs, they systematically hallucinate force vectors, ignore conservation laws, and violate geometric constr...
Unicorn: Scaling High-Dimensional Time Series Forecasting via Universal Correlation Modeling
arXiv:2605.30376v1 Announce Type: new
Abstract: Modern time series architectures face a fundamental trade-off: channel-independent models scale well with increasing data volume but ignore critical inter-channel dependencies, while channel-dependent models are expressive but remain "dimension-bound...
Gait2Hip-60: A Unified Deep Learning Benchmark for Predicting Hip Muscle Forces and Joint Moments from Multi-Cadence Gait Kinematics
arXiv:2605.30374v1 Announce Type: new
Abstract: Estimating hip muscle forces and joint moments during gait typically relies on musculoskeletal simulation, which is informative but time-consuming and difficult to apply in clinical settings. This study developed a deep learning framework to predict t...
QASM-Eval: A Dataset to Train and Evaluate LLMs on OpenQASM-3 Beyond Quantum Circuits
arXiv:2605.30358v1 Announce Type: new
Abstract: Quantum computing remains in the Noisy Intermediate-Scale Quantum (NISQ) era, where performance is tightly constrained by noise. Addressing this limitation often requires hardware-facing capabilities beyond gate-sequence circuit specification, inclu...
Molecular Lead Optimization via Agentic Tool Planning
arXiv:2605.28862v1 Announce Type: new
Abstract: Drug discovery is a lengthy and resource-intensive process composed of multiple stages. Among these stages, lead optimization plays a critical role in transforming early hit compounds into viable drug candidates. This stage requires improving ADMET-re...
Ultra-Reduced-Impact-Encased-Logging (URIEL): propose a new method for selective sustainable logging and post-harvest silvicultural treatment in tropical forest using airborne robotics systems
arXiv:2605.28883v1 Announce Type: new
Abstract: Tropical forests worldwide are under intense deforestation pressure driven by economic and political interests, and scientific evidence suggests this deforestation contributes to climate change. This paper proposes a novel logging method for tropical ...
The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling
arXiv:2605.28864v1 Announce Type: new
Abstract: The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with cognitively grounded components derived from category theory and several inspirations from cognitive science. Under a mat...
Self-Play Reinforcement Learning under Imperfect Information in Big 2
arXiv:2605.28863v1 Announce Type: new
Abstract: Imperfect-information multiplayer games test whether agents can act under hidden information, sparse rewards, and non-stationary opponents. We study these challenges in Big 2, a four-player imperfect-information card game. We develop a self-play RL fr...
Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents
arXiv:2605.28850v1 Announce Type: new
Abstract: We study behavioral alignment and representation dynamics of large language model (LLM) agents in financial decision environments. Using TradeArena, an auditable trading-agent testbed with risk reports, execution simulation, memory, and replayable tra...
Data Formulator 0.7: AI-powered data analytics for enterprise data
Data Formulator introduces AI-powered analytics for enterprise data workflows. Data teams can easily bring enterprise data into an AI-ready workspace where users can explore, analyze, and visualize data with AI agents to turn raw data into actionable insights.