SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests
Using SocialReasoning-Bench, we observed a stable pattern across models: agents execute competently but fail to consistently improve the user’s position, even when explicitly instructed to optimize for the user’s interest.
The Best Risk Mitigation Strategy in Data? A Single Source of Truth
Every data leader has a version of this story. A regulatory audit surfaces a metric that doesn’t match across systems. A board member catches conflicting revenue numbers in two reports presented back-to-back. An AI tool generates a recommendation based on data that hasn’t been governed since the ana...
This is the fifth article in a series on agentic engineering and AI-driven development. Read part one here, part two here, part three here, and part four here. I recently had a taste of humility with my AI-generated code. I live in Park Slope, Brooklyn, and recently I needed to get to the other side...
Don’t Automate Your Moat: Matching AI Autonomy to Risk and Competitive Stakes
I was talking to a senior engineer at a well-funded company not long ago. I asked him to walk me through a critical algorithm at the heart of their product, something that ran hundreds of times a second and directly affected customer outcomes. He paused and said, “Honestly, I’m not totally sure how ...
We tend to assume that if every part of a system behaves correctly, the system itself will behave correctly. That assumption is deeply embedded in how we design, test, and operate software. If a service returns valid responses, if dependencies are reachable, and if constraints are satisfied, then th...
Enterprise AI governance still authorizes agents as if they were stable software artifacts. They are not. An enterprise deploys a LangChain-based research agent to analyze market trends and draft internal briefs. During preproduction review, the system behaves within acceptable bounds: It routes quer...
Doug Burger, sustainability expert Amy Luers, and optimization researcher Ishai Menache examine the global emissions implications of datacenter operations, efficiency gains, and AI's potential across electrification, materials, and food systems.
The post Can we AI our way to a more sustainable world...
Artificial neurons successfully communicate with living brain cells
Engineers at Northwestern University have taken a striking leap toward merging machines with the human brain by printing artificial neurons that can actually communicate with real ones. These flexible, low-cost devices generate lifelike electrical signals capable of activating living brain cells, a ...
AI safety shifts from the model to the system level. As AI becomes agentic and tool-driven, risk emerges from complex interactions, widening the gap between evaluation and real-world behavior.
The following article was originally published on Tim O’Brien’s Medium page and is being reposted here with the author’s permission. If you’ve spent any time around AI-assisted software work, you already know the moment when the Scope Creep Kraken first puts a tentacle on the boat. The project begin...
AI is splitting in two directions. One path is controlled, restricted, and security-first. The other is open, autonomous, and scaling fast. The real question isn’t which is better; it’s what this means for trust.
We call it machine learning. But do machines actually learn?
Today's AI systems train, optimize, and scale, but real learning is something else entirely. The distinction matters more than the industry wants to admit.
I sat down with Aaron Levie at the O’Reilly AI Codecon two weeks ago. Aaron cofounded Box in 2005, and 20 years later, his company manages content for about two-thirds of the Fortune 500. Aaron is one of the few CEOs of an incumbent enterprise software company thinking deeply in public about what AI...
AI breakthrough cuts energy use by 100x while boosting accuracy
AI is consuming staggering amounts of energy—already over 10% of U.S. electricity—and the demand is only accelerating. Now, researchers have unveiled a radically more efficient approach that could slash AI energy use by up to 100× while actually improving accuracy. By combining neural networks with ...
After ChatGPT’s breakthrough, the race to define the next frontier of generative AI accelerated. One of the most talked-about innovations was OpenAI’s Sora, a text-to-video AI model that promised to transform digital content creation.
“Conviction Collapse” and the End of Software as We Know It
In “An Ordinary Evening in New Haven,” the poet Wallace Stevens wrote, “It is not in the premise that reality is a solid.” That line came to mind during a fascinating conversation with Harper Reed, which amounted to something like “It is no longer in the premise that software is a product.” Harper i...
The following article originally appeared on Medium and is being reproduced here with the author’s permission. This 2,800-word essay (a 12-minute read) is about how to survive inside the AI revolution in software development, without succumbing to the fear that swirls around all of us. It explains s...
If the last wave of AI felt like hiring a very smart intern, this one feels more like managing an entire organization that never sleeps (and occasionally argues with itself).
How to Build a General-Purpose AI Agent in 131 Lines of Python
The following article originally appeared on Hugo Bowne-Anderson’s newsletter, Vanishing Gradients, and is being republished here with the author’s permission. In this post, we’ll build two AI agents from scratch in Python. One will be a coding agent, the other a search agent. Why have I called this...