AI Tools & Frameworks Directory

Discover essential tools, libraries, and frameworks to power your AI workflows.

Tool• Dec 9, 2026

Transparency in AI is on the Decline

A new study shows the AI industry is withholding key information.

#Stanford#HAI#Ethics

Tool• Jun 12, 2026

PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation

arXiv:2606.12616v1 Announce Type: new Abstract: Closed-loop driving simulators typically populate their environments with non-ego traffic agents that behave largely the same way, produced either by rule-based traffic managers or by learned models trained toward a single behavioral mode. Recent work...

#ArXiv#Machine Learning#Academic

Tool• Jun 12, 2026

Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation

arXiv:2606.12594v1 Announce Type: new Abstract: Modern Lean theorem provers achieve strong performance only with substantial training and inference compute, driven in part by scarce verified proof data and the long reasoning traces of formal proof search, making both supervised fine-tuning (SFT) an...

#ArXiv#Machine Learning#Academic

Tool• Jun 12, 2026

Strategic Decision Support for AI Agents

arXiv:2606.12587v1 Announce Type: new Abstract: Traditionally, decision support studies how humans use machine learning models to make better decisions. In modern agentic systems, this division of roles is increasingly reversed: AI agents act on behalf of users, while humans and tools become suppo...

#ArXiv#Machine Learning#Academic

Tool• Jun 12, 2026

Arbor: Tree Search as a Cognition Layer for Autonomous Agents

arXiv:2606.12563v1 Announce Type: new Abstract: Arbor is a multi-agent framework that introduces structured tree search as a cognition layer for autonomous agents operating in large, stateful action spaces. Prior autonomous optimization systems operate on isolated targets with stateless evaluation....

#ArXiv#Machine Learning#Academic

Tool• Jun 12, 2026

ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs

arXiv:2606.12451v1 Announce Type: new Abstract: Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approaches rely on compact encoders that may under-capture specialized tool semantics, parametric tool retrieval ...

#ArXiv#Machine Learning#Academic

Tool• Jun 11, 2026

The benchmark gap, explained: What AI leaderboards measure and what they miss

Every frontier model now scores above 88% on MMLU. So why does a 37% gap still exist between lab benchmark scores and real-world AI deployment performance? We explain why the tests keep lying, and what rigorous evaluation actually looks like.

#AI Accelerator Institute#AI#Research

Tool• Jun 11, 2026

From Explicit Elements to Implicit Intent: A Predefined Library for Auditable Behavioral Inference

arXiv:2606.11207v1 Announce Type: new Abstract: We present SemantiClean, a modular framework for extracting structured semantic signals from e-commerce session data and driving pluggable inference targets including purchase intent, customer segmentation, and product affinity through a shared elemen...

#ArXiv#Machine Learning#Academic

Tool• Jun 11, 2026

Restless bandits with imperfect binary feedback: PCL-indexability analysis and computation

arXiv:2606.11192v1 Announce Type: new Abstract: We study restless bandits with binary latent states and imperfect binary feedback, motivated by opportunistic spectrum access with sensing errors. For the associated belief-state model, we develop a partial conservation laws (PCL)-based analytical and...

#ArXiv#Machine Learning#Academic

Tool• Jun 11, 2026

To Intervene or Not: Guiding Inference-time Alignment with Probabilistic Model Blending

arXiv:2606.11201v1 Announce Type: new Abstract: The wide deployment of LLMs has made model alignment necessary to make newly trained models safely and effectively respond to user instructions. Among different methods, inference-time alignment is often cheaper as it intervenes (i.e., offers guidance...

#ArXiv#Machine Learning#Academic

Tool• Jun 11, 2026

Dual-Stance Evaluation of Sycophancy: The Structure of Agreement and the Limits of Intervention

arXiv:2606.11205v1 Announce Type: new Abstract: Activation steering can shift LLM behaviour, but standard evaluations do not typically test whether a sycophancy-reduction direction also suppresses agreement with factually correct statements. We introduce dual-stance evaluation, which tests both sta...

#ArXiv#Machine Learning#Academic

Tool• Jun 11, 2026

ProHiFlo: Hierarchical Flow Matching with Functional Guidance for De Novo Protein Generation

arXiv:2606.11243v1 Announce Type: new Abstract: De novo protein generation has transformative potential in therapeutic design, enzyme engineering, and synthetic biology. While diffusion-based and flow matching approaches have achieved progress, they typically operate at single resolution and lack m...

#ArXiv#Machine Learning#Academic

Tool• Jun 11, 2026

Position: Hippocampal Explicit Memory Is the Cornerstone for AGI

arXiv:2606.11245v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, raising expectations for Artificial General Intelligence (AGI). This position paper argues that integrating explicit memory is the cornerstone for advancing L...

#ArXiv#Machine Learning#Academic

Tool• Jun 11, 2026

Can AI Agents Synthesize Scientific Conclusions?

arXiv:2606.11337v1 Announce Type: new Abstract: Scientific AI agents increasingly retrieve evidence, reason across sources, and synthesize conclusions used in consequential decisions. Yet, their ability to do so in high-stakes domains such as health remains unclear. We introduce SciConBench, a larg...

#ArXiv#Machine Learning#Academic

Tool• Jun 11, 2026

Knowing When to Ask: Self-Gated Clarification for Hierarchical Language Agents

arXiv:2606.11349v1 Announce Type: new Abstract: In hierarchical reasoning, failures often originate at intermediate decision points where the agent commits to a wrong branch without recognizing that it lacks critical information. Rather than treating clarification as an external uncertainty trigger...

#ArXiv#Machine Learning#Academic

Tool• Jun 11, 2026

Automated Mediator for Human Negotiation: Pre-Mediation via a Structured LLM Pipeline

arXiv:2606.11379v1 Announce Type: new Abstract: Pre-mediation, the preparatory phase preceding direct human negotiation, plays a critical role in achieving mutually beneficial agreements, yet is often omitted due to cost, time, and limited access to trained mediators. We introduce an automated medi...

#ArXiv#Machine Learning#Academic

Tool• Jun 10, 2026

A classic brain test exposed AI's biggest weakness

Researchers gave top AI models a classic attention test used in psychology and found a major flaw. While the models could correctly name colors in short lists, their performance deteriorated sharply as the task became longer and more complex. Some leading systems fell from over 90% accuracy to nearl...

#Science Daily#AI#Research

Tool• Jun 10, 2026

Investing in multi-agent AI safety research

Google DeepMind and partners announce a $10M funding call for multi-agent safety research.

#DeepMind#Google#AGI

Tool• Jun 10, 2026

Mitigating Manifold Departure: Uncertainty-Aware Subspace Rectification for Trustworthy MLLM Decoding

arXiv:2606.09859v1 Announce Type: new Abstract: MLLMs frequently hallucinate objects inconsistent with visual inputs. This issue is typically attributed to the over-reliance on language priors, which can override the visual context. Recent training-free decoding strategies address this by penalizin...

#ArXiv#Machine Learning#Academic

Tool• Jun 10, 2026

Mechanistic Analysis of Alignment Algorithms in Language Models

arXiv:2606.09850v1 Announce Type: new Abstract: Post-training alignment algorithms are predominantly evaluated as black boxes, obscuring how they reshape language models' internal computations. We present a systematic mechanistic analysis of six preference-optimization methods: PPO, DPO, SimPO, ORP...

#ArXiv#Machine Learning#Academic

Tool• Jun 10, 2026

SynIB: Informational Bottleneck for Maximizing Synergy in Multimodal Learning

arXiv:2606.09853v1 Announce Type: new Abstract: A central objective in multimodal learning is to capture synergy: task-relevant information that arises only from the joint use of multiple modalities, and is not available from any single modality alone. While most approaches operate at the architect...

#ArXiv#Machine Learning#Academic

Tool• Jun 10, 2026

Uncertainty-aware Multi-fidelity Closure via Conditional Normalizing Flows

arXiv:2606.09857v1 Announce Type: new Abstract: Reduced-order models (ROMs) provide an efficient surrogate for complex multiscale systems, but their predictive accuracy is often compromised by truncation errors and the inadequate representation of interactions between resolved and unresolved scales...

#ArXiv#Machine Learning#Academic

Tool• Jun 10, 2026

Conformal Risk Prediction for Non-Alcoholic Fatty Liver Disease Using Gradient Boosting with Distribution-Free Coverages

arXiv:2606.09860v1 Announce Type: new Abstract: Non-alcoholic fatty liver disease (NAFLD) affects roughly 25% of global adults, posing substantial hepatic and cardiovascular risks. Yet, population-level screening tools remain inadequate. We present Method, a machine-learning framework for NAFLD ris...

#ArXiv#Machine Learning#Academic

Tool• Jun 10, 2026

From Senses to Decisions: The Information Flow of Auditory and Visual Perception in Multimodal LLMs

arXiv:2606.10147v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) can listen and see, but how do audio and visual signals actually travel through the network to shape an answer? Despite their growing role in research and real-world applications, the internal pathways through ...

#ArXiv#Machine Learning#Academic
