AI Tools & Frameworks Directory

Discover essential tools, libraries, and frameworks to power your AI workflows.

Tool• May 19, 2026

When Actions Disappear: Adversarial Action Removal in Self-Play Reinforcement Learning

arXiv:2605.16312v1 • Abstract: We study adversarial action masking in self-play reinforcement learning: an attacker selectively removes legal actions from a victim's action set. Unlike observation or action perturbations, removal eliminates decision options before the agent acts. A...

#ArXiv#Machine Learning#Academic
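The action-removal threat model in the entry above can be made concrete with a small sketch. This is not the paper's attacker. `remove_top_actions` is a hypothetical greedy adversary that deletes the victim's highest-scoring legal actions up to a budget, and `masked_softmax` is an assumed victim policy that renormalizes over whatever actions remain.

```python
import math


def masked_softmax(logits: list[float], legal: list[bool]) -> list[float]:
    """Assumed victim policy: softmax restricted to actions still marked legal."""
    # Subtract the best legal logit for numerical stability; requires >= 1 legal action.
    top = max(l for l, ok in zip(logits, legal) if ok)
    # Removed actions get weight 0.0, so their probability is exactly zero.
    weights = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, legal)]
    total = sum(weights)
    return [w / total for w in weights]


def remove_top_actions(logits: list[float], legal: list[bool], budget: int) -> list[bool]:
    """Hypothetical greedy attacker: remove up to `budget` of the victim's
    most preferred legal actions, always leaving at least one legal action."""
    legal_idx = [i for i, ok in enumerate(legal) if ok]
    # Rank the legal actions by the victim's own scores, best first.
    ranked = sorted(legal_idx, key=lambda i: logits[i], reverse=True)
    n_remove = min(budget, len(ranked) - 1)
    attacked = list(legal)
    for i in ranked[:n_remove]:
        attacked[i] = False
    return attacked


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]
    legal = [True, True, False, True]
    attacked = remove_top_actions(logits, legal, budget=1)
    print(attacked, masked_softmax(logits, attacked))
```

Because the mask is applied before the softmax, a removed action receives zero probability no matter how highly the victim scores it, which is the sense in which removal acts "before the agent acts" rather than perturbing what it sees or does.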

Tool• May 19, 2026

AgentWall: A Runtime Safety Layer for Local AI Agents

arXiv:2605.16265v1 • Abstract: The safety of autonomous AI agents is increasingly recognized as a critical open problem. As agents transition from passive text generators to active actors capable of executing shell commands, modifying files, calling APIs, and browsing the web, the ...

#ArXiv#Machine Learning#Academic

Tool• May 19, 2026

ANNEAL: Adapting LLM Agents via Governed Symbolic Patch Learning

arXiv:2605.16309v1 • Abstract: LLM-based agents can recover from individual execution errors, yet they repeatedly fail on the same fault when the underlying process knowledge--operator schemas, preconditions, and constraints--remains unrepaired. Existing self-evolving approaches ad...

#ArXiv#Machine Learning#Academic

Tool• May 19, 2026

From Prompts to Protocols: An AI Agent for Laboratory Automation

arXiv:2605.16552v1 • Abstract: Automating science laboratories enables faster, safer, more accurate, and more reproducible execution of protocols, accelerating the discovery and testing of new materials, drugs, and more. However, setting up and running autonomous labs requires coor...

#ArXiv#Machine Learning#Academic

Tool• May 19, 2026

Skim: Speculative Execution for Fast and Efficient Web Agents

arXiv:2605.16565v2 • Abstract: Skim is a speculative execution framework for web agents that exploits the predictable structure of purpose-built websites. Today's web-agent expense is not intrinsic to the tasks but a property of how agents are composed: frontier-model inference, br...

#ArXiv#Machine Learning#Academic

Tool• May 19, 2026

Scalable Uncertainty Reasoning in Knowledge Graphs

arXiv:2605.16568v1 • Abstract: Knowledge Graphs are pivotal for semantic data integration. The real-world data they model is often inherently uncertain. Within knowledge graphs, uncertainty manifests in three distinct levels: imprecise attribute values, probabilistic triple existen...

#ArXiv#Machine Learning#Academic

Tool• May 19, 2026

EpiCache: Episodic KV Cache Management for Long-Term Conversation on Resource-Constrained Environments

Modern large language models (LLMs) extend context lengths to millions of tokens, enabling coherent, personalized responses grounded in long conversational history. However, the Key-Value (KV) cache grows linearly with the extended dialogue history, causing the model’s memory footprint to quickly ex...

#Apple#On-device AI
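The linear growth described in the EpiCache entry follows from standard KV-cache arithmetic: for every token, each layer stores one key vector and one value vector per KV head. The model dimensions below are hypothetical defaults chosen only for illustration; they are not taken from the post.

```python
def kv_cache_bytes(
    num_tokens: int,
    num_layers: int = 32,     # hypothetical model depth
    num_kv_heads: int = 8,    # hypothetical number of key/value heads
    head_dim: int = 128,      # hypothetical per-head dimension
    bytes_per_elem: int = 2,  # fp16 / bf16 storage
) -> int:
    """Approximate KV-cache size in bytes for a single sequence."""
    # Factor 2: one key tensor and one value tensor per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    # Size is proportional to history length, i.e. linear in num_tokens.
    return per_token * num_tokens


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} tokens -> {kv_cache_bytes(n) / 2**30:.2f} GiB")
```

With these assumed defaults each token costs 2 × 32 × 8 × 128 × 2 = 131,072 bytes (128 KiB), so a 100,000-token conversation needs roughly 12.2 GiB of cache before any eviction or compression, which is why long dialogues quickly exceed on-device memory budgets.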

Tool• May 18, 2026

AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices

arXiv:2605.15206v1 • Abstract: Autonomous agents powered by large language models (LLMs) are increasingly used to automate complex, multi-step tasks such as coding or web-based question answering. While remote, cloud-based agents offer scalability and ease of deployment, they raise...

#ArXiv#Machine Learning#Academic

Tool• May 18, 2026

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

arXiv:2605.15207v1 • Abstract: Multi-agent LLM systems have shown promise for complex reasoning, yet recent evaluations reveal they often underperform single-model baselines. We identify a structural failure mode in sequential fine-tuning of shared-context teams: updating one agent...

#ArXiv#Machine Learning#Academic

Tool• May 18, 2026

Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels

arXiv:2605.15208v1 • Abstract: Large Language Models are routinely compressed via post-training quantization to reduce inference costs and memory footprint for cloud and edge deployment, yet the impact of this compression on model quality remains poorly understood. Existing studies...

#ArXiv#Machine Learning#Academic
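Post-training quantization, as mentioned in the entry above, snaps each trained weight onto a coarse grid without retraining. A minimal sketch of one common scheme, symmetric per-tensor absmax int8 quantization, is below; it is an assumed example of the general technique, not the specific quantizers or precision levels evaluated in the paper.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    # One scale for the whole tensor; fall back to 1.0 if all weights are zero.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    # Round to the nearest grid point and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.003, 1.0]
    q, s = quantize_int8(w)
    print(q, dequantize(q, s))
```

Each weight moves by at most half a quantization step (`scale / 2`), and small weights such as 0.003 here collapse to zero entirely. Lower precision levels widen the step, so these per-weight shifts grow, which is the kind of change whose downstream effect on model behaviour such studies examine.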

Tool• May 18, 2026

Mask-Morph Graph U-Net: A Generalisable Mesh-Based Surrogate for Crashworthiness Field Prediction under Large Geometric Variation

arXiv:2605.15231v1 • Abstract: Nonlinear finite element crash simulations are accurate but computationally expensive, limiting their use in iterative design optimisation. Machine-learning surrogate models based on graph neural networks (GNNs) offer a faster alternative. Message-pas...

#ArXiv#Machine Learning#Academic

Tool• May 18, 2026

MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion

arXiv:2605.15235v1 • Abstract: Multimodal physiological data powers clinical AI systems from intensive care units to wearable devices, but sensors routinely fail in practice. Two failure modes are common: modality missing, where an entire channel is absent, and within-modality miss...

#ArXiv#Machine Learning#Academic

Tool• May 18, 2026

DeepSlide: From Artifacts to Presentation Delivery

arXiv:2605.15202v1 • Abstract: Presentations are a primary medium for scholarly communication, yet most AI slide generators optimize the artifact (a visually plausible deck) while under-optimizing the delivery process (pacing, narrative, and presentation preparation). We present De...

#ArXiv#Machine Learning#Academic

Tool• May 18, 2026

SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch

arXiv:2605.15204v1 • Abstract: Multi-agent orchestration frameworks such as LangChain, LangGraph, and CrewAI route tasks through graph-based pipelines but do not enforce the stage constraints that govern real business processes. We present SDOF, a framework that treats multi-agent ...

#ArXiv#Machine Learning#Academic

Tool• May 18, 2026

Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations

arXiv:2605.15205v1 • Abstract: Improving the Theory of Mind (ToM) capability of Large Language Models (LLMs) is crucial for effective social interactions between these AI models and humans. However, the existing benchmarks often measure ToM capability improvement through story-read...

#ArXiv#Machine Learning#Academic

Tool• May 18, 2026

Fair outputs, Biased Internals: Causal Potency and Asymmetry of Latent Bias in LLMs for High-Stakes Decisions

arXiv:2605.15217v1 • Abstract: Instruction-tuned language models exhibit behavioural fairness in high-stakes decisions while retaining biased associations in their internal representations. However, whether these suppressed representations can affect model outputs - and whether suc...

#ArXiv#Machine Learning#Academic

Tool• May 17, 2026

Gemini for Science: AI experiments and tools for a new era of discovery

A collection of science tools and experiments to expand the scale and precision of scientific exploration.

#DeepMind#Google#AGI

Tool• May 16, 2026

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration

arXiv:2605.13848v1 • Abstract: Agentic LLM frameworks that rely on prompted orchestration, where the model itself determines workflow transitions, often suffer from hallucinated routing, infinite loops, and non-reproducible execution. We introduce GraphBit, an engine-orchestrated f...

#ArXiv#Machine Learning#Academic

Tool• May 16, 2026

Mixed Integer Goal Programming for Personalized Meal Optimization with User-Defined Serving Granularity

arXiv:2605.13849v1 • Abstract: Determining what to eat to satisfy nutritional requirements is one of the oldest optimization problems in operations research, yet existing formulations have two persistent limitations: continuous variables produce impractical fractional servings (1.7...

#ArXiv#Machine Learning#Academic
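The tension between continuous and granular servings in the meal-optimization entry can be illustrated with a toy weighted goal program, which minimizes the weighted sum of deviations from each nutrient target. The foods, nutrient values, targets, and weights are all hypothetical, exhaustive enumeration stands in for the mixed-integer solver a real formulation would use, and `step` plays the role of a user-defined serving granularity.

```python
from itertools import product

# Hypothetical foods: nutrients per serving as (protein in g, energy in kcal).
FOODS = {
    "oats": (5.0, 150.0),
    "eggs": (6.0, 70.0),
    "yogurt": (10.0, 100.0),
}
TARGETS = (30.0, 600.0)  # hypothetical protein and energy goals
WEIGHTS = (10.0, 1.0)    # hypothetical penalty per unit of deviation from each goal


def goal_deviation(servings: tuple[float, ...]) -> float:
    """Weighted sum of |achieved - target|, i.e. d+ + d- for each goal."""
    total = 0.0
    for k, (target, weight) in enumerate(zip(TARGETS, WEIGHTS)):
        achieved = sum(s * FOODS[f][k] for f, s in zip(FOODS, servings))
        total += weight * abs(achieved - target)
    return total


def best_plan(step: float = 1.0, max_servings: float = 4.0) -> tuple[dict[str, float], float]:
    """Enumerate every plan whose servings are multiples of `step`."""
    n_levels = int(max_servings / step) + 1
    levels = [i * step for i in range(n_levels)]
    best = min(product(levels, repeat=len(FOODS)), key=goal_deviation)
    return dict(zip(FOODS, best)), goal_deviation(best)


if __name__ == "__main__":
    for step in (1.0, 0.5):
        print(step, best_plan(step=step))
```

With these toy numbers an exact match requires 1.5 servings of yogurt, so the whole-serving grid cannot hit both goals while `best_plan(step=0.5)` can. Enumeration grows exponentially with the number of foods, which is why practical formulations pass the same objective to a MIP solver instead.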

Tool• May 16, 2026

A Two-Dimensional Framework for AI Agent Design Patterns: Cognitive Function and Execution Topology

arXiv:2605.13850v1 • Abstract: Existing frameworks for LLM-based agent architectures describe systems from a single perspective: industry guides (Anthropic, Google, LangChain) focus on execution topology -- how data flows -- while cognitive science surveys focus on cognitive functi...

#ArXiv#Machine Learning#Academic

Tool• May 16, 2026

Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems

arXiv:2605.13851v1 • Abstract: Multi-agent orchestration -- in which a hidden coordinator manages specialized worker agents -- is becoming the default architecture for enterprise AI deployment, yet the safety implications of orchestrator invisibility have never been empirically tes...

#ArXiv#Machine Learning#Academic

Tool• May 16, 2026

PREPING: Building Agent Memory without Tasks

arXiv:2605.13880v1 • Abstract: Agent memory is typically constructed either offline from curated demonstrations or online from post-deployment interactions. However, regardless of how it is built, an agent faces a cold-start gap when first introduced to a new environment without an...

#ArXiv#Machine Learning#Academic

Tool• May 15, 2026

Further Notes on Our Recent Research on AI Delegation and Long-Horizon Reliability

Our recent paper, “LLMs Corrupt Your Documents When You Delegate”, has generated discussion about the reliability of AI systems in delegated workflows. We appreciate the interest in this work and want to clarify several important points about what the paper does—and does not—claim. The research aims...

#Microsoft#Research

Tool• May 15, 2026

Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders

arXiv:2605.13930v1 • Abstract: EEG foundation models achieve state-of-the-art clinical performance, yet the internal computations driving their predictions remain opaque: a barrier to clinical trust. We apply TopK Sparse Autoencoders (SAEs) across three architecturally distinct EEG...

#ArXiv#Machine Learning#Academic
