AI Tools & Frameworks Directory

Discover essential tools, libraries, and frameworks to power your AI workflows.

Tool• Mar 27, 2026

Athena: Intermediate Representations for Iterative Scaffolded App Generation with an LLM

It is challenging to generate the code for a complete user interface using a Large Language Model (LLM). User interfaces are complex and their implementations often consist of multiple, inter-related files that together specify the contents of each screen, the navigation flows between the screens, a...

#Apple#On-device AI

Tool• Mar 27, 2026

To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models

State Space Models (SSMs) have become the leading alternative to Transformers for sequence modeling. Their primary advantage is efficiency in long-context and long-form generation, enabled by fixed-size memory and linear scaling of computational complexity. We begin this work by showing a simple the...

#Apple#On-device AI

Tool• Mar 26, 2026

AsgardBench: A benchmark for visually grounded interactive planning

Imagine a robot tasked with cleaning a kitchen. It needs to observe its environment, decide what to do, and adjust when things don’t go as expected, for example, when the mug it was tasked to wash is already clean, or the sink is full of other items. This is the domain of embodied AI: systems […] Th...

#Microsoft#Research

Tool• Mar 26, 2026

GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation

Vision-language models (VLMs) use images and text to plan robot actions, but they still struggle to decide what actions to take and where to take them. Most systems split these decisions into two steps: a VLM generates a plan in natural language, and a separate model translates it into executable ac...

#Microsoft#Research

Tool• Mar 26, 2026

When multi-agent AI systems fail, who takes the blame?

All AI systems can fail, but now we can trace exactly who’s responsible. Implicit Execution Tracing (IET) embeds invisible signatures in AI outputs, making multi-agent systems accountable, auditable, and tamper-proof.

#AI Accelerator Institute#AI#Research

Tool• Mar 26, 2026

Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation

arXiv:2603.23517v1. Accuracy-based evaluation cannot reliably distinguish genuine generalization from shortcuts like memorization, leakage, or brittle heuristics, especially in small-data regimes. In this position paper, we argue for mechanism-aware evaluation that combi...

#ArXiv#Machine Learning#Academic

Tool• Mar 26, 2026

Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction

arXiv:2603.23550v1. Multi-turn human-AI collaboration is fundamental to deploying interactive services such as adaptive tutoring, conversational recommendation, and professional consultation. However, optimizing these interactions via reinforcement learning is hindered b...

#ArXiv#Machine Learning#Academic

Tool• Mar 26, 2026

Upper Entropy for 2-Monotone Lower Probabilities

arXiv:2603.23558v1. Uncertainty quantification is a key aspect in many tasks such as model selection/regularization, or quantifying prediction uncertainties to perform active learning or OOD detection. Within credal approaches that consider modeling uncertainty as probab...

#ArXiv#Machine Learning#Academic

Tool• Mar 26, 2026

Synthetic Mixed Training: Scaling Parametric Knowledge Acquisition Beyond RAG

arXiv:2603.23562v1. Synthetic data augmentation helps language models learn new knowledge in data-constrained domains. However, naively scaling existing synthetic data methods by training on more synthetic tokens or using stronger generators yields diminishing returns be...

#ArXiv#Machine Learning#Academic

Tool• Mar 26, 2026

PLDR-LLMs Reason At Self-Organized Criticality

arXiv:2603.23539v1. We show that PLDR-LLMs pretrained at self-organized criticality exhibit reasoning at inference time. The characteristics of PLDR-LLM deductive outputs at criticality are similar to second-order phase transitions. At criticality, the correlation length ...

#ArXiv#Machine Learning#Academic

Tool• Mar 26, 2026

Environment Maps: Structured Environmental Representations for Long-Horizon Agents

arXiv:2603.23610v2. Although large language models (LLMs) have advanced rapidly, robust automation of complex software workflows remains an open problem. In long-horizon settings, agents frequently suffer from cascading errors and environmental stochasticity; a single mi...

#ArXiv#Machine Learning#Academic

Tool• Mar 26, 2026

Evaluating a Multi-Agent Voice-Enabled Smart Speaker for Care Homes: A Safety-Focused Framework

arXiv:2603.23625v1. Artificial intelligence (AI) is increasingly being explored in health and social care to reduce administrative workload and allow staff to spend more time on patient care. This paper evaluates a voice-enabled Care Home Smart Speaker designed to suppor...

#ArXiv#Machine Learning#Academic

Tool• Mar 26, 2026

Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

arXiv:2603.23638v1. Large language models (LLMs) have enabled agentic systems that can reason, plan, and act across complex tasks, but it remains unclear whether they can allocate resources effectively under uncertainty. Unlike short-horizon reactive decisions, allocatio...

#ArXiv#Machine Learning#Academic

Tool• Mar 26, 2026

GTO Wizard Benchmark

arXiv:2603.23660v1. We introduce GTO Wizard Benchmark, a public API and standardized evaluation framework for benchmarking algorithms in Heads-Up No-Limit Texas Hold'em (HUNL). The benchmark evaluates agents against GTO Wizard AI, a state-of-the-art superhuman poker agen...

#ArXiv#Machine Learning#Academic

Tool• Mar 26, 2026

Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training

While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pretraining loss, predicting downstream task performance has been considered unreliable. This paper challenges that view by proposing a direct framework to model the scaling of benchmark performance from th...

#Apple#On-device AI

Tool• Mar 25, 2026

Beyond Hard Constraints: Budget-Conditioned Reachability For Safe Offline Reinforcement Learning

arXiv:2603.22292v1. Sequential decision making using Markov Decision Processes underpins many real-world applications. Both model-based and model-free methods have achieved strong results in these settings. However, real-world tasks must balance reward maximization with saf...

#ArXiv#Machine Learning#Academic

Tool• Mar 25, 2026

Efficient Embedding-based Synthetic Data Generation for Complex Reasoning Tasks

arXiv:2603.22294v1. Synthetic Data Generation (SDG), leveraging Large Language Models (LLMs), has recently been recognized and broadly adopted as an effective approach to improve the performance of smaller but more resource- and compute-efficient LLMs through fine-tuning....

#ArXiv#Machine Learning#Academic

Tool• Mar 25, 2026

Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores

arXiv:2603.22299v1. Large language models (LLMs) are often confidently wrong, making reliable uncertainty estimation (UE) essential. Output-based heuristics are cheap but brittle, while probing internal representations is effective yet high-dimensional and hard to transf...

#ArXiv#Machine Learning#Academic

Tool• Mar 25, 2026

Scaling Attention via Feature Sparsity

arXiv:2603.22300v1. Scaling Transformers to ultra-long contexts is bottlenecked by the $O(n^2 d)$ cost of self-attention. Existing methods reduce this cost along the sequence axis through local windows, kernel approximations, or token-level sparsity, but these approaches...

#ArXiv#Machine Learning#Academic

Tool• Mar 25, 2026

Latent Semantic Manifolds in Large Language Models

arXiv:2603.22301v1. Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce discrete tokens, a fundamental mismatch whose geometric consequences remain poorly understood. We develop a mathematical framework that interprets LLM...

#ArXiv#Machine Learning#Academic

Tool• Mar 25, 2026

The Efficiency Attenuation Phenomenon: A Computational Challenge to the Language of Thought Hypothesis

arXiv:2603.22312v1. This paper computationally investigates whether thought requires a language-like format, as posited by the Language of Thought (LoT) hypothesis. We introduce the "AI Private Language" thought experiment: if two artificial agents develop an efficient...

#ArXiv#Machine Learning#Academic

Tool• Mar 25, 2026

Intelligence Inertia: Physical Principles and Applications

arXiv:2603.22347v1. While Landauer's principle establishes the fundamental thermodynamic floor for information erasure and Fisher Information provides a metric for local curvature in parameter space, these classical frameworks function effectively only as approximations ...

#ArXiv#Machine Learning#Academic

Tool• Mar 25, 2026

Session Risk Memory (SRM): Temporal Authorization for Deterministic Pre-Execution Safety Gates

arXiv:2603.22350v1. Deterministic pre-execution safety gates evaluate whether individual agent actions are compatible with their assigned roles. While effective at per-action authorization, these systems are structurally blind to distributed attacks that decompose harmfu...

#ArXiv#Machine Learning#Academic

Tool• Mar 25, 2026

Thinking into the Future: Latent Lookahead Training for Transformers

This paper was accepted at the Workshop on Latent & Implicit Thinking – Going Beyond CoT Reasoning 2026 at ICLR. Autoregressive language models trained with next-token prediction generate text by sampling one discrete token at a time. Although very scalable, this objective forces the model to commit...

#Apple#On-device AI
