Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process more transparent to model builders and impacted humans, a ...
Scientists built the hardest AI test ever and the results are surprising
As AI systems began acing traditional tests, researchers realized those benchmarks were no longer tough enough. In response, nearly 1,000 experts created Humanity’s Last Exam, a massive 2,500-question challenge covering highly specialized topics across many fields. The exam was engineered so that an...
Interventional Time Series Priors for Causal Foundation Models
arXiv:2603.11090v1 Announce Type: new
Abstract: Prior-data fitted networks (PFNs) have emerged as powerful foundation models for tabular causal inference, yet their extension to time series remains limited by the absence of synthetic data generators that provide interventional targets. Existing tim...
Graph Tokenization for Bridging Graphs and Transformers
arXiv:2603.11099v1 Announce Type: new
Abstract: The success of large pretrained Transformers is closely tied to tokenizers, which convert raw input into discrete symbols. Extending these models to graph-structured data remains a significant challenge. In this work, we introduce a graph tokenization...
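The core idea in this abstract (mapping a graph into a discrete token stream a Transformer can consume) can be illustrated with a minimal sketch. This is not the paper's tokenizer, which the excerpt does not specify; the BFS-walk scheme and the `<node:…>`/`<edge:…>` token format below are illustrative assumptions.

```python
# Illustrative sketch only: one simple way to serialize a graph into
# discrete tokens via a breadth-first walk. The tokenizer proposed in
# the paper is not described in this excerpt.
from collections import deque

def bfs_tokenize(adj, start):
    """Emit a token stream from a breadth-first traversal of `adj`
    (dict: node -> list of neighbors), interleaving node and edge tokens."""
    tokens, seen, queue = [], {start}, deque([start])
    while queue:
        u = queue.popleft()
        tokens.append(f"<node:{u}>")
        for v in sorted(adj[u]):          # sorted for a deterministic stream
            tokens.append(f"<edge:{u}-{v}>")
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return tokens

# A 3-node star graph centered on node 0.
adj = {0: [1, 2], 1: [0], 2: [0]}
toks = bfs_tokenize(adj, 0)
```

Any such serialization trades off sequence length against how much structure the Transformer must re-infer from token order.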
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
arXiv:2603.11076v1 Announce Type: new
Abstract: Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace this brittleness to insufficient diversity in synthesized tasks. Scaling diversit...
A Survey of Reasoning in Autonomous Driving Systems: Open Challenges and Emerging Paradigms
arXiv:2603.11093v1 Announce Type: new
Abstract: The development of high-level autonomous driving (AD) is shifting from perception-centric limitations to a more fundamental bottleneck, namely, a deficit in robust and generalizable reasoning. Although current AD systems manage structured environments...
PACED: Distillation at the Frontier of Student Competence
arXiv:2603.11178v1 Announce Type: new
Abstract: Standard LLM distillation wastes compute on two fronts: problems the student has already mastered (near-zero gradients) and problems far beyond its reach (incoherent gradients that erode existing capabilities). We show that this waste is not merely in...
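The selection principle the abstract states (skip problems the student has mastered and problems far beyond its reach, keep the frontier in between) can be sketched as a pass-rate band filter. The thresholds, the sampling-based pass-rate estimate, and the toy student below are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch of competence-frontier filtering for distillation data.
# The band thresholds (0.2, 0.8) and the toy student are assumptions made
# for this example, not details from the paper.
import random

class ToyStudent:
    """Stand-in model: solves a problem with probability 1 - difficulty."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
    def solve(self, difficulty):
        return self.rng.random() > difficulty

def estimate_pass_rate(student, problem, n_samples=64):
    """Fraction of sampled attempts the student gets right."""
    return sum(student.solve(problem) for _ in range(n_samples)) / n_samples

def frontier_filter(student, problems, low=0.2, high=0.8):
    """Keep problems the student sometimes, but not always, solves:
    mastered items (rate >= high) yield near-zero gradients, and
    far-out-of-reach items (rate <= low) yield incoherent ones."""
    return [p for p in problems
            if low < estimate_pass_rate(student, p) < high]

# Difficulties: one mastered, one at the frontier, one out of reach.
kept = frontier_filter(ToyStudent(), [0.05, 0.5, 0.95])
```

In practice the band would move as the student improves, which is presumably where the "frontier" framing earns its name.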
Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios
arXiv:2603.11214v1 Announce Type: new
Abstract: We evaluate the autonomous cyber-attack capabilities of frontier AI models on two purpose-built cyber ranges (a 32-step corporate network attack and a 7-step industrial control system attack) that require chaining heterogeneous capabilities across exten...
Reversible Lifelong Model Editing via Semantic Routing-Based LoRA
arXiv:2603.11239v1 Announce Type: new
Abstract: The dynamic evolution of real-world knowledge necessitates model editing in Large Language Models. While existing methods explore modular isolation or parameter-efficient strategies, they still suffer from semantic drift or knowledge forgetting due to conti...
mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR
Reinforcement Learning with Verifiable Rewards (RLVR) has been successfully applied to significantly boost the capabilities of pretrained large language models, especially in the math and logic problem domains. However, current research and available training datasets remain English-centric. While m...
Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments
We present the Multilingual Reasoning Gym, an extension of Reasoning Gym (Stojanovski et al., 2025), that procedurally generates verifiable reasoning problems across 14 languages. We translate templates for 94 tasks with native-speaker validation in 10 languages and targeted code or template adaptat...
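The "procedurally generated, verifiable" pattern described here can be shown with a minimal example: a seeded template instantiates the same underlying problem in each language, and a programmatic checker verifies answers. The task template, the two languages shown, and the function names are illustrative assumptions, not the benchmark's actual API.

```python
# Illustrative sketch of a procedurally generated, verifiable multilingual
# task in the spirit of Reasoning Gym. Templates and helpers here are
# invented for the example, not taken from the benchmark.
import random

TEMPLATES = {
    "en": "What is {a} plus {b}?",
    "de": "Was ist {a} plus {b}?",
}

def generate(lang, seed):
    """Same seed -> same underlying problem, regardless of language."""
    rng = random.Random(seed)
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    return {"prompt": TEMPLATES[lang].format(a=a, b=b), "answer": a + b}

def verify(task, model_answer):
    """Programmatic check: no human grading, no LLM judge."""
    return int(model_answer) == task["answer"]

task = generate("en", 0)
```

Because the generator is seeded, the English and German variants of a task share one ground-truth answer, which is what makes cross-lingual comparison clean.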
Meta buys Moltbook: The social network where AI agents talk to each other
Meta’s acquisition of Moltbook highlights a growing focus on agent-to-agent systems and the infrastructure required to support them. It’s a small deal that signals bigger shifts in how AI ecosystems may evolve.
Systematic debugging for AI agents: Introducing the AgentRx framework
As AI agents transition from simple chatbots to autonomous systems capable of managing cloud incidents, navigating complex web interfaces, and executing multi-step API workflows, a new challenge has emerged: transparency. When a human makes a mistake, we can usually trace the logic. But when an AI a...
arXiv:2603.09980v1 Announce Type: new
Abstract: LLM unlearning is essential for mitigating safety, copyright, and privacy concerns in pre-trained large language models (LLMs). Compared to preference alignment, it offers a more explicit way by removing undesirable knowledge characterized by specific...
MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios
arXiv:2603.09983v1 Announce Type: new
Abstract: Mixture-of-Experts (MoE) models enable scalable performance but face severe memory constraints on edge devices. Existing offloading strategies struggle with I/O bottlenecks due to the dynamic, low-information nature of autoregressive expert activation...
Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment
arXiv:2603.10009v1 Announce Type: new
Abstract: Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences because standard post-training methods, like Reinforcement Learning with Human Feedback (RLHF), optimize for...
LWM-Temporal: Sparse Spatio-Temporal Attention for Wireless Channel Representation Learning
arXiv:2603.10024v1 Announce Type: new
Abstract: LWM-Temporal is a new member of the Large Wireless Models (LWM) family that targets the spatiotemporal nature of wireless channels. Designed as a task-agnostic foundation model, LWM-Temporal learns universal channel embeddings that capture mobility-in...
Agentic Control Center for Data Product Optimization
arXiv:2603.10133v1 Announce Type: new
Abstract: Data products enable end users to gain deeper insight into their data by providing supporting assets, such as example question-SQL pairs that can be answered using the data, or views over the database tables. However, producing useful data products...
Hybrid Self-evolving Structured Memory for GUI Agents
arXiv:2603.10291v1 Announce Type: new
Abstract: The remarkable progress of vision-language models (VLMs) has enabled GUI agents to interact with computers in a human-like manner. Yet real-world computer-use tasks remain difficult due to long-horizon workflows, diverse interfaces, and frequent inter...
HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation
arXiv:2603.10359v1 Announce Type: new
Abstract: Distilling reasoning capabilities from Large Reasoning Models (LRMs) into smaller models is typically constrained by the limitation of rejection sampling. Standard methods treat the teacher as a static filter, discarding complex "corner-case" problems...
Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability
arXiv:2603.10384v1 Announce Type: new
Abstract: Evaluating LLM reliability via scalar probabilities often fails to capture the structural dynamics of reasoning. We introduce TRACED, a framework that assesses reasoning quality through theoretically grounded geometric kinematics. By decomposing reaso...
Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities
arXiv:2603.10396v1 Announce Type: new
Abstract: Despite the growing demand for eliciting uncertainty from large language models (LLMs), empirical evidence suggests that LLM behavior is not always adequately captured by the elicitation techniques developed under the classical probabilistic uncertain...
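The imprecise-probability formalism the abstract invokes replaces a single point probability with a lower/upper pair, where the interval width carries the higher-order (epistemic) uncertainty. The small class below sketches that standard idea; the elicitation scheme in the paper itself is not described in this excerpt.

```python
# Illustrative sketch of interval-valued (imprecise) probabilities.
# This shows the classical formalism only, not the paper's elicitation method.
from dataclasses import dataclass

@dataclass
class ProbInterval:
    lower: float
    upper: float
    def __post_init__(self):
        assert 0.0 <= self.lower <= self.upper <= 1.0

    def complement(self):
        """Conjugacy: lower(not A) = 1 - upper(A), upper(not A) = 1 - lower(A)."""
        return ProbInterval(1.0 - self.upper, 1.0 - self.lower)

    def width(self):
        """Interval width encodes higher-order (epistemic) uncertainty."""
        return self.upper - self.lower

vague = ProbInterval(0.3, 0.7)   # "somewhere between 30% and 70%"
sharp = ProbInterval(0.6, 0.6)   # classical point probability: width zero
```

A point probability is just the degenerate interval with `lower == upper`, which is why elicitation techniques built for the classical setting can miss the extra structure.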
We propose a 3D latent representation that jointly models object geometry and view-dependent appearance. Most prior works focus on either reconstructing 3D geometry or predicting view-independent diffuse appearance, and thus struggle to capture realistic view-dependent effects. Our approach leverage...