DoorDash’s new AI chatbot lets you order with prompts and photos
The new chatbot, called Ask DoorDash, allows users to search the app for what they're looking for in their own words instead of having to scroll through restaurants and stores to build a cart.
DiffusionGemma: Google’s Diffusion-Based Open Model for Faster Text Generation
Large language models usually generate text one token at a time. While this autoregressive approach delivers strong quality and instruction following, it can be inefficient for local users because GPUs often spend more time moving weights from memory than doing parallel compute. Google DeepMind’s Di...
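The memory-bound claim in the DiffusionGemma summary can be checked with a back-of-envelope, roofline-style estimate. The sketch below is not how DiffusionGemma works. It assumes a dense model whose weights are all read once per forward pass and roughly 2 FLOPs per parameter per token, and it ignores the KV cache and activations. The function name and the hardware numbers are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def forward_pass_time(params: float, bytes_per_param: float,
                      mem_bw_bytes_s: float, peak_flops: float,
                      tokens_per_pass: int) -> Tuple[float, float, float]:
    """Rough time for one forward pass of a dense model (hypothetical helper).

    Assumes every weight is streamed from memory once per pass and each
    token processed costs about 2 FLOPs per parameter. KV cache,
    activations, and kernel overheads are ignored.
    """
    mem_time = params * bytes_per_param / mem_bw_bytes_s
    compute_time = 2 * params * tokens_per_pass / peak_flops
    # The pass cannot finish faster than its slower resource allows.
    return mem_time, compute_time, max(mem_time, compute_time)


if __name__ == "__main__":
    # Made-up setup: 8B parameters in bf16 (2 bytes each) on a GPU with
    # 1 TB/s of memory bandwidth and 100 TFLOP/s of usable compute.
    for n in (1, 16, 64, 128):
        mem, comp, step = forward_pass_time(8e9, 2, 1e12, 1e14, n)
        print(f"{n:>3} tokens/pass: memory {mem * 1e3:.2f} ms, "
              f"compute {comp * 1e3:.2f} ms, per token {step / n * 1e3:.3f} ms")
```

Under these made-up numbers, streaming the weights takes 16 ms per pass while the arithmetic for a single token takes about 0.16 ms, so one-token-at-a-time decoding leaves most of the compute idle. Compute only catches up with memory traffic at around 100 tokens per pass, which is the headroom that any scheme producing many tokens per pass, diffusion-style refinement included, can try to exploit.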
Text classification typically boils down to scenarios where a product review is "positive" or "negative", or a customer inquiry belongs to one category or another.
The benchmark gap, explained: What AI leaderboards measure and what they miss
Every frontier model now scores above 88% on MMLU. So why does a 37% gap still exist between lab benchmark scores and real-world AI deployment performance? We explain why the tests keep lying, and what rigorous evaluation actually looks like.
Google DeepMind is worried about what happens when millions of agents start to interact
Google DeepMind is funding research into the potential dangers of millions of different AI agents interacting with each other online. According to Rohin Shah, who directs the company’s AGI safety and alignment research, the mass-market arrival of agents that can carry out tasks without human oversig...
When Context Collapses: Teaching Agents to Detect and Recover from Lost Memory
This is the eighth article in a series on agentic engineering and AI-driven development. Read part one here, part two here, part three here, part four here, part five here, part six here, and part seven here. “640K ought to be enough for anybody.”—Bill Gates (allegedly) If you’re building AI agents ...
Nous Research Ships Hermes Agent Profile Builder: Identity, Model, Skills, and MCP Servers in One Dashboard Flow
The Hermes Agent dashboard now builds complete agent profiles in one flow, replacing multi-step CLI setup for users.
Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding
Cohere's first developer coding model is a 30B mixture-of-experts model that runs on a single H100 with a 256K-token context length.
To Intervene or Not: Guiding Inference-time Alignment with Probabilistic Model Blending
arXiv:2606.11201v1 Announce Type: new
Abstract: The wide deployment of LLMs has made model alignment necessary to make newly trained models safely and effectively respond to user instructions. Among different methods, inference-time alignment is often cheaper as it intervenes (i.e., offers guidance...
Restless bandits with imperfect binary feedback: PCL-indexability analysis and computation
arXiv:2606.11192v1 Announce Type: new
Abstract: We study restless bandits with binary latent states and imperfect binary feedback, motivated by opportunistic spectrum access with sensing errors. For the associated belief-state model, we develop a partial conservation laws (PCL)-based analytical and...
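For readers unfamiliar with the "belief-state model" the abstract mentions, the sketch below shows the standard belief recursion for one arm of this kind of problem: a two-state Markov channel observed through a noisy binary sensor. It does not reproduce the paper's PCL-based indexability analysis. The truncated abstract does not specify the exact model (for example, whether sensing happens before or after the state transition), so the ordering and all parameter names here (`p01`, `p11`, `miss`, `false_alarm`) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelModel:
    """Hypothetical two-state channel with imperfect binary sensing.

    State 1 = channel idle (good), state 0 = busy (bad).
    """
    p01: float          # P(next state = 1 | current state = 0)
    p11: float          # P(next state = 1 | current state = 1)
    miss: float         # P(sensor reports 1 | true state 0)
    false_alarm: float  # P(sensor reports 0 | true state 1)

    def propagate(self, omega: float) -> float:
        """One Markov step of the belief omega = P(state = 1)."""
        return omega * self.p11 + (1.0 - omega) * self.p01

    def condition(self, omega: float, y: int) -> float:
        """Bayes update of omega after a noisy binary observation y."""
        like1 = (1.0 - self.false_alarm) if y == 1 else self.false_alarm
        like0 = self.miss if y == 1 else (1.0 - self.miss)
        evidence = omega * like1 + (1.0 - omega) * like0
        if evidence == 0.0:
            raise ValueError("observation has zero probability under the model")
        return omega * like1 / evidence

    def next_belief(self, omega: float, active: bool,
                    y: Optional[int] = None) -> float:
        """Active arms are sensed (observation y), then evolve; passive arms only evolve."""
        if active:
            if y is None:
                raise ValueError("an active arm needs an observation")
            omega = self.condition(omega, y)
        return self.propagate(omega)


if __name__ == "__main__":
    ch = ChannelModel(p01=0.2, p11=0.8, miss=0.1, false_alarm=0.05)
    belief = 0.5
    belief = ch.next_belief(belief, active=True, y=1)   # sensed "idle"
    belief = ch.next_belief(belief, active=False)       # left unobserved
    print(f"belief that the channel is idle: {belief:.4f}")
```

With perfect sensing (`miss = false_alarm = 0`) an observation pins the state down exactly, and the recursion collapses to the familiar fully observed case; the sensing errors are what make the belief space, and hence indexability, harder to analyze.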
Dual-Stance Evaluation of Sycophancy: The Structure of Agreement and the Limits of Intervention
arXiv:2606.11205v1 Announce Type: new
Abstract: Activation steering can shift LLM behaviour, but standard evaluations do not typically test whether a sycophancy-reduction direction also suppresses agreement with factually correct statements. We introduce dual-stance evaluation, which tests both sta...
ProHiFlo: Hierarchical Flow Matching with Functional Guidance for De Novo Protein Generation
arXiv:2606.11243v1 Announce Type: new
Abstract: De novo protein generation has transformative potential in therapeutic design, enzyme engineering, and synthetic biology. While diffusion-based and flow matching approaches have achieved progress, they typically operate at single resolution and lack m...
From Explicit Elements to Implicit Intent: A Predefined Library for Auditable Behavioral Inference
arXiv:2606.11207v1 Announce Type: new
Abstract: We present SemantiClean, a modular framework for extracting structured semantic signals from e-commerce session data and driving pluggable inference targets including purchase intent, customer segmentation, and product affinity through a shared elemen...
Position: Hippocampal Explicit Memory Is the Cornerstone for AGI
arXiv:2606.11245v1 Announce Type: new
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, raising expectations for Artificial General Intelligence (AGI). This position paper argues that integrating explicit memory is the cornerstone for advancing L...
arXiv:2606.11337v1 Announce Type: new
Abstract: Scientific AI agents increasingly retrieve evidence, reason across sources, and synthesize conclusions used in consequential decisions. Yet, their ability to do so in high-stakes domains such as health remains unclear. We introduce SciConBench, a larg...
Automated Mediator for Human Negotiation: Pre-Mediation via a Structured LLM Pipeline
arXiv:2606.11379v1 Announce Type: new
Abstract: Pre-mediation, the preparatory phase preceding direct human negotiation, plays a critical role in achieving mutually beneficial agreements, yet is often omitted due to cost, time, and limited access to trained mediators. We introduce an automated medi...
Knowing When to Ask: Self-Gated Clarification for Hierarchical Language Agents
arXiv:2606.11349v1 Announce Type: new
Abstract: In hierarchical reasoning, failures often originate at intermediate decision points where the agent commits to a wrong branch without recognizing that it lacks critical information. Rather than treating clarification as an external uncertainty trigger...
How an astrophysicist uses Codex to help simulate black holes
Discover how astrophysicist Chi-kwan Chan uses Codex to build black hole simulations, helping scientists study extreme physics and test Einstein’s theory of general relativity.
AI Weekly Issue #502: Your AI can now spend your money — Visa wired it into ChatGPT
Visa just wired ChatGPT to shop and pay on your behalf — an AI agent can now buy at any Visa merchant without you clicking "buy." It capped a week where the labs pushed autonomy and capital to new highs: Anthropic put Claude Fable 5, its most powerful public model, into everyone's hands; Jeff Bezos ...
Supporting Europe’s work in ensuring a trustworthy AI ecosystem
OpenAI supports the EU Code of Practice on AI content transparency, advancing provenance standards and tools to help people understand AI-generated content.
xAI fired an engineer who raised alarms about Grok safety, new lawsuit claims
A former xAI engineer is suing the company and SpaceX, alleging he was fired for raising AI safety concerns about Grok days before SpaceX's historic IPO.
A Coding Implementation on Microsoft SkillOpt for Instrumented Prompt Optimization, Skill Evolution Analysis, and Baseline Comparison
We implement an instrumented workflow for Microsoft SkillOpt end to end. We set up the repository, connect OpenAI-compatible model access, and configure the optimizer and target models. We evaluate the original seed skill as a baseline, then run a real optimization loop with rollout, reflection, agg...