AI Tools & Frameworks Directory

Discover essential tools, libraries, and frameworks to power your AI workflows.

Tool• Mar 4, 2026

Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model

We are pleased to announce Phi-4-reasoning-vision-15B, a 15 billion parameter open‑weight multimodal reasoning model, available through Microsoft Foundry, HuggingFace and GitHub. Phi-4-reasoning-vision-15B is a broadly capable model that can b...

#Microsoft#Research

Tool• Mar 4, 2026

Meta: From social platforms to systems architecture heavyweight

As Meta rebuilds its technical foundations to support multi-year model lifecycles, modular architectures, and reliability-first design, it is quietly reshaping how Silicon Valley thinks about production-grade AI.

#AI Accelerator Institute#AI#Research

Tool• Mar 4, 2026

Defusing the MCP ticking time bomb

The risk of Model Context Protocol in AI agents

#AI Accelerator Institute#AI#Research

Tool• Mar 4, 2026

RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning

arXiv:2603.02215v1. Abstract: Chemical reaction prediction is pivotal for accelerating drug discovery and synthesis planning. Despite advances in data-driven models, current approaches are hindered by an overemphasis on parameter and dataset scaling. Some methods coupled with eval...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

arXiv:2603.02216v1. Abstract: Effective information seeking in multi-turn medical dialogues is critical for accurate diagnosis, especially when dealing with incomplete information. Aligning Large Language Models (LLMs) for these interactive scenarios is challenging due to the unce...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression

arXiv:2603.02217v1. Abstract: Mixture-of-Experts (MoE) models scale capacity efficiently, but their massive parameter footprint creates a deployment-time memory bottleneck. We organize retraining-free MoE compression into three paradigms - Expert Pruning, Expert Editing, and Exper...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

arXiv:2603.02218v1. Abstract: Large language models (LLMs) make it plausible to build systems that improve through self-evolving loops, but many existing proposals are better understood as self-play and often plateau quickly. A central failure mode is that the loop synthesises mor...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

arXiv:2603.02219v1. Abstract: Large language models are increasingly deployed in streaming scenarios, rendering conventional post-hoc safeguards ineffective as they fail to interdict unsafe content in real-time. While streaming safeguards based on token-level supervised training c...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

Federated Inference: Toward Privacy-Preserving Collaborative and Incentivized Model Serving

arXiv:2603.02214v1. Abstract: Federated Inference (FI) studies how independently trained and privately owned models can collaborate at inference time without sharing data or model parameters. While recent work has explored secure and distributed inference from disparate perspectiv...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

arXiv:2603.02239v1. Abstract: The Engineering Reasoning and Instruction (ERI) benchmark is a taxonomy-driven instruction dataset designed to train and evaluate engineering-capable large language models (LLMs) and agents. This dataset spans nine engineering fields (namely: civil, m...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Memory Poisoning

arXiv:2603.02240v1. Abstract: We present SuperLocalMemory, a local-first memory system for multi-agent AI that defends against OWASP ASI06 memory poisoning through architectural isolation and Bayesian trust scoring, while personalizing retrieval through adaptive learning-to-rank -...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

What Your Phone Knows Could Help Scientists Understand Your Health

Stanford scientists have released an open-source platform that lets health researchers study the “screenome” – the digital traces of our daily lives – while protecting participants’ privacy.

#Stanford#HAI#Ethics

Tool• Mar 3, 2026

Transit Network Design with Two-Level Demand Uncertainties: A Machine Learning and Contextual Stochastic Optimization Framework

arXiv:2603.00010v1. Abstract: Transit Network Design is a well-studied problem in the field of transportation, typically addressed by solving optimization models under fixed demand assumptions. Considering the limitations of these assumptions, this paper proposes a new framework, ...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

arXiv:2603.00039v1. Abstract: LLM-as-a-judge ensembles are the standard paradigm for scalable evaluation, but their aggregation mechanisms suffer from a fundamental flaw: they implicitly assume that judges provide independent estimates of true quality. However, in practice, LLM ju...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

Attn-QAT: 4-Bit Attention With Quantization-Aware Training

arXiv:2603.00040v1. Abstract: Achieving reliable 4-bit attention is a prerequisite for end-to-end FP4 computation on emerging FP4-capable GPUs, yet attention remains the main obstacle due to FP4's tiny dynamic range and attention's heavy-tailed activations. This paper presents the...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking

arXiv:2603.00267v1. Abstract: Misinformation spreading over the Internet poses a significant threat to both societies and individuals, necessitating robust and scalable fact-checking that relies on retrieving accurate and trustworthy evidence. Previous methods rely on semantic and...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

TraderBench: How Robust Are AI Agents in Adversarial Capital Markets?

arXiv:2603.00285v1. Abstract: Evaluating AI agents in finance faces two key challenges: static benchmarks require costly expert annotation yet miss the dynamic decision-making central to real-world trading, while LLM-based judges introduce uncontrolled variance on domain-specific ...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

DIG to Heal: Scaling General-purpose Agent Collaboration via Explainable Dynamic Decision Paths

arXiv:2603.00309v1. Abstract: The increasingly popular agentic AI paradigm promises to harness the power of multiple, general-purpose large language model (LLM) agents to collaboratively complete complex tasks. While many agentic AI systems utilize predefined workflows or agent ro...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

How Well Do Multimodal Models Reason on ECG Signals?

arXiv:2603.00312v1. Abstract: While multimodal large language models offer a promising solution to the "black box" nature of health AI by generating interpretable reasoning traces, verifying the validity of these traces remains a critical challenge. Existing evaluation methods are...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

EmCoop: A Framework and Benchmark for Embodied Cooperation Among LLM Agents

arXiv:2603.00349v1. Abstract: Real-world scenarios increasingly require multiple embodied agents to collaborate in dynamic environments under embodied constraints, as many tasks exceed the capabilities of any single agent. Recent advances in large language models (LLMs) enable hig...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment

With the increased deployment of large language models (LLMs), one concern is their potential misuse for generating harmful content. Our work studies the alignment challenge, with a focus on filters to prevent the generation of unsafe information. Two natural points of intervention are the filtering...

#Apple#On-device AI

Tool• Mar 3, 2026

How a HAI Seed Grant Helped Launch a Disease-Fighting AI Platform

Stanford scientists in Senegal hunting for schistosomiasis—a parasitic disease infecting 200+ million people worldwide—used AI to transform local field work into satellite-powered disease mapping.

#Stanford#HAI#Ethics

Tool• Mar 3, 2026

Learning to Reason for Hallucination Span Detection

Large language models (LLMs) often generate hallucinations — unsupported content that undermines reliability. While most prior works frame hallucination detection as a binary task, many real-world applications require identifying hallucinated spans, which is a multi-step decision making process. Thi...

#Apple#On-device AI

Tool• Mar 2, 2026

Detoxifying LLMs via Representation Erasure-Based Preference Optimization

arXiv:2602.23391v1. Abstract: Large language models (LLMs) trained on webscale data can produce toxic outputs, raising concerns for safe deployment. Prior defenses, based on applications of DPO, NPO, and similar algorithms, reduce the likelihood of harmful continuations, but not r...

#ArXiv#Machine Learning#Academic
