VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
Safety evaluation of multimodal foundation models often treats vision and language inputs separately, missing risks from joint interpretation where benign content becomes harmful in combination. Existing approaches also fail to distinguish clearly unsafe content from borderline cases, leading to pro...
Powering tax donations with AI-powered personalized recommendations
TRUSTBANK partnered with Recursive to build Choice AI using OpenAI models, delivering personalized, conversational recommendations that simplify Furusato Nozei gift discovery. A multi-agent system helps donors navigate thousands of options and find gifts that match their preferences.
AI startup CVector raises $5M for its industrial ‘nervous system’
Industrial AI startup CVector built a brain and nervous system for big industry. Now, founders Richard Zhang and Tyler Ruggles are tasked with a bigger challenge: showing customers and investors how this AI-powered software layer translates to real savings on an industrial scale. The New York-based...
The AI Evolution of Graph Search at Netflix: From Structured Queries to Natural Language
By Alex Hutter and Bartosz Balukiewicz. Our previous blog posts (part 1, part 2, part 3) detailed how Netflix’s Graph Search platform addresses the challenges of searching across federated data sets within Netflix’...
This crash course will take you from a complete beginner to a confident ComfyUI user, walking you through every essential concept, feature, and practical example you need to master this powerful tool.
Exploring the RAG pipeline in Cursor that powers code indexing and retrieval for coding agents
The post How Cursor Actually Indexes Your Codebase appeared first on Towards Data Science.
Microsoft announces powerful new chip for AI inference
Microsoft has announced the launch of its latest chip, the Maia 200, which the company describes as a silicon workhorse designed for scaling AI inference. The 200, which follows the company’s Maia 100 released in 2023, has been technically outfitted to run powerful AI models at faster speeds and wit...
NVIDIA Revolutionizes Climate Tech with ‘Earth-2’: The World’s First Fully Open Accelerated AI Weather Stack
For decades, predicting the weather has been the exclusive domain of massive government supercomputers running complex physics-based equations. NVIDIA has shattered that barrier with the release of the Earth-2 family of open models and tools for AI weather and climate prediction accessible to virtua...
NVIDIA Launches Earth-2 Family of Open Models — the World’s First Fully Open, Accelerated Set of Models and Tools for AI Weather
At the American Meteorological Society’s Annual Meeting, NVIDIA today unveiled a new NVIDIA Earth-2 family of open models, libraries and frameworks for weather and climate AI, offering the world’s first fully open, production-ready weather AI software stack.
Nvidia’s new AI weather models probably saw this storm coming weeks ago
Nvidia announced three new AI weather tools today. Together, they promise to improve the accuracy of weather forecasts while also making them accessible to more users.
Building Intelligent iOS Apps with Apple’s Foundation Models Framework
The iOS development world has undergone a radical change. Only a few years back, implementing AI functionalities required costly cloud APIs or, at best, on-device processing with limited capabilities. The introduction of Apple’s Foundation Models framework heralds the availability of a 3 billion par...
Synthesia hits $4B valuation, lets employees cash out
British startup Synthesia, whose AI platform helps companies create interactive training videos, has raised a $200 million Series E round of funding that brings its valuation to $4 billion — up from $2.1 billion just a year ago.
Analyzing Neural Network Information Flow Using Differential Geometry
arXiv:2601.16366v1 Announce Type: new
Abstract: This paper provides a fresh view of the neural network (NN) data flow problem, i.e., identifying the NN connections that are most important for the performance of the full model, through the lens of graph theory. Understanding the NN data flow provide...
When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems
arXiv:2601.16280v1 Announce Type: new
Abstract: Multi-agent systems powered by large language models (LLMs) are transforming enterprise automation, yet systematic evaluation methodologies for assessing tool-use reliability remain underdeveloped. We introduce a comprehensive diagnostic framework tha...
SemanticALLI: Caching Reasoning, Not Just Responses, in Agentic Systems
arXiv:2601.16286v1 Announce Type: new
Abstract: Agentic AI pipelines suffer from a hidden inefficiency: they frequently reconstruct identical intermediate logic, such as metric normalization or chart scaffolding, even when the user's natural language phrasing is entirely novel. Conventional boundar...
DSGym: A Holistic Framework for Evaluating and Training Data Science Agents
arXiv:2601.16344v1 Announce Type: new
Abstract: Data science agents promise to accelerate discovery and insight-generation by turning data into executable analyses and findings. Yet existing data science benchmarks fall short due to fragmented evaluation interfaces that make cross-benchmark compari...
Doc2AHP: Inferring Structured Multi-Criteria Decision Models via Semantic Trees with LLMs
arXiv:2601.16479v1 Announce Type: new
Abstract: While Large Language Models (LLMs) demonstrate remarkable proficiency in semantic understanding, they often struggle to ensure structural consistency and reasoning reliability in complex decision-making tasks that demand rigorous logic. Although class...
SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care
arXiv:2601.16529v1 Announce Type: new
Abstract: Large language models (LLMs) show promise in clinical decision support yet risk acquiescing to patient pressure for inappropriate care. We introduce SycoEval-EM, a multi-agent simulation framework evaluating LLM robustness through adversarial patient ...