AI chatbots are giving out people’s real phone numbers
People report that their personal contact info was surfaced by Google AI—and there’s apparently no easy way to prevent it. A Redditor recently wrote that he was “desperate for help”: for about a month, he said, his phone had been inundated by calls from “strangers” who were “looking for a lawyer, a...
mimalloc: A high-performance, scalable memory allocator for the modern era
mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projects. It provides bounded worst-case allocation times (up to OS...
Ryan Carson has built companies for 25 years, including Treehouse, which taught over a million people to code. He knows what it takes to grow a team. So when he told me he’d raised $2 million in seed funding for his latest company, Untangle, an AI-powered divorce assistant, and had no plans to hire ...
GridSFM: A new, small foundation model for the electric grid
Introducing GridSFM, a small foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings. Learn how GridSFM gives grid operators direct visibility into congestion, stability, and system health.
AI Reality Check: The Problem With AI Evaluation: Garbage In, Gospel Out
AI models are judged by flawed benchmarks that distort progress and reliability. Week 12 exposes why AI evaluation is broken — and what must replace it.
Introducing the 6 stages at TechCrunch Disrupt 2026 — built for today’s tougher startup market
From October 13-15, TechCrunch Disrupt 2026 will feature 200+ sessions across six stages, led by 250+ tech leaders shaping the industry today. Register now to save up to $410, plus 50% off a second pass.
Poppy debuts a proactive AI assistant to help organize your digital life
Poppy is an AI-powered app that connects your calendar, email, messages, and other services to surface reminders, suggestions, and tasks based on what’s happening in your life.
NVIDIA, Ineffable Intelligence Team Up to Build the Future of Reinforcement Learning Infrastructure
Reinforcement-learning agents — AI systems that learn by trial and error — can convert computation into new knowledge. That’s the focus of a new engineering-level collaboration between NVIDIA and Ineffable Intelligence, the London-based AI lab founded by AlphaGo architect David Silver in the wake of...
Hermes Unlocks Self-Improving AI Agents, Powered by NVIDIA RTX PCs and DGX Spark
Agentic AI is changing the way users get work done. Following the success of OpenClaw, the community is embracing new open source agentic frameworks. The latest is Hermes Agent, which crossed 140,000 GitHub stars in under three months.
Adaption aims big with AutoScientist, an AI tool that helps models train themselves
Adaption's new AutoScientist tool is designed to let models adapt to specific capabilities quickly through an automated approach to conventional fine-tuning.
Building an Evaluation Harness for Production AI Agents: A 12-Metric Framework From 100+ Deployments
A 12-metric evaluation framework for production AI agents — covering retrieval, generation, agent behavior, and production health. Drawn from 100+ enterprise deployments.
Mira Murati’s Thinking Machines Lab Introduces Interaction Models: A Native Multimodal Architecture for Real-Time Human-AI Collaboration
Thinking Machines Lab has introduced a research preview of TML-Interaction-Small, a 276B parameter Mixture-of-Experts model with 12B active parameters, built around a multi-stream, time-aligned micro-turn architecture that processes 200ms chunks of audio, video, and text simultaneously — eliminating...
AI for Business Forecasting: Can It Improve My Bottom Line?
Few things are more valuable in business than seeing what’s coming next. Whether predicting sales, managing inventory, or allocating resources, the ability to forecast accurately can make the difference between thriving and surviving. Traditionally, forecasting has relied on spreadsheets, historical ...
QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization
arXiv:2605.10959v1 Announce Type: new
Abstract: There is currently no unified metric for evaluating the efficiency of quantized neural networks. We propose QuIDE, built around the Intelligence Index I = (C x P)/log_2(T+1), which collapses the compression-accuracy-latency trade-off into a single sco...
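The formula quoted in the abstract can be evaluated directly. The sketch below is only an illustration: the truncated abstract does not define C, P, or T, so reading them as a compression ratio, an accuracy score, and a latency is an assumption drawn from the phrase "compression-accuracy-latency trade-off", and the function name `intelligence_index` is hypothetical.

```python
import math


def intelligence_index(c: float, p: float, t: float) -> float:
    """Evaluate I = (C x P) / log_2(T + 1) as written in the abstract.

    Assumed meanings (not confirmed by the truncated source):
    c = compression ratio, p = accuracy score, t = latency.
    """
    if t <= 0:
        # log2(t + 1) is zero or negative for t <= 0, so the ratio is undefined
        # or changes sign; reject such inputs in this sketch.
        raise ValueError("t must be positive")
    return (c * p) / math.log2(t + 1)


# Illustrative values only: 4x compression, 0.5 accuracy, latency of 3 units.
print(intelligence_index(4.0, 0.5, 3.0))  # 1.0
```

With these made-up inputs the denominator is log_2(4) = 2, so the score is (4 x 0.5) / 2 = 1.0. The units and scaling the authors actually use are not stated in the excerpt.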
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models
arXiv:2605.10971v1 Announce Type: new
Abstract: Discrete diffusion language models (DLMs) generate text by iteratively denoising all positions in parallel, offering an alternative to autoregressive models. Controlled generation methods for DLMs, imported from autoregressive models, apply uniform in...
Vertex-Softmax: Tight Transformer Verification via Exact Softmax Optimization
arXiv:2605.10974v1 Announce Type: new
Abstract: Certified verification of transformer attention requires bounding the softmax function over interval constraints on the pre-softmax scores. Existing verifiers relax softmax independently of the downstream objective, leaving avoidable slack. We prove th...
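To make "bounding the softmax function over interval constraints" concrete, here is a minimal sketch of the textbook per-component bound. It is not the method the truncated abstract goes on to describe. It relies only on the fact that softmax output i increases with score i and decreases with every other score, so its extremes over a box are reached at corners of the box. The function name `softmax_component_bounds` is hypothetical.

```python
import math


def softmax_component_bounds(lower, upper, i):
    """Return (min, max) of softmax(x)[i] over the box lower <= x <= upper."""

    def component(scores):
        # Subtract the max for numerical stability before exponentiating.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        return exps[i] / sum(exps)

    n = len(lower)
    # Minimum: own score as low as possible, all competing scores as high as possible.
    lo_scores = [lower[j] if j == i else upper[j] for j in range(n)]
    # Maximum: own score as high as possible, all competing scores as low as possible.
    hi_scores = [upper[j] if j == i else lower[j] for j in range(n)]
    return component(lo_scores), component(hi_scores)


lo, hi = softmax_component_bounds([0.0, 0.0], [1.0, 1.0], 0)
print(lo, hi)
```

Each component's range is exact when considered alone, but bounding every component separately ignores that the outputs must sum to one, which is one source of the slack the abstract mentions.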
EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales
arXiv:2605.11136v1 Announce Type: new
Abstract: We argue that multi-agent test-time evolution is not single-agent evolution replicated N times. A single-agent learner can only evolve its own context and memory. A multi-agent system additionally evolves who collaborates, how they collaborate, and ho...
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
arXiv:2605.11169v1 Announce Type: new
Abstract: Large language model agents interleave reasoning, action selection, and observation to solve sequential decision-making tasks. In deployed settings where agents repeatedly handle related multi-step tasks, small action-selection errors can accumulate i...
The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes
arXiv:2605.11182v1 Announce Type: new
Abstract: On-policy distillation (OPD) and on-policy self-distillation (OPSD) have emerged as promising post-training methods for large language models, offering dense token-level supervision on trajectories sampled from the model's own policy. However, existin...