arXiv:2602.06107v1 Announce Type: new
Abstract: Reinforcement learning (RL) for large language models (LLMs) remains expensive, largely because rollout generation is costly. Decoupling rollout generation from policy optimization (e.g., leveraging a more efficient model for rollouts) could enable sub...
arXiv:2602.06176v1 Announce Type: new
Abstract: Large Language Models (LLMs) have exhibited remarkable reasoning capabilities, achieving impressive results across a wide range of tasks. Despite these advances, significant reasoning failures persist, occurring even in seemingly simple scenarios. To ...
Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning (Extended Version)
arXiv:2602.06227v1 Announce Type: new
Abstract: In this work, we propose a novel framework for the logical specification of non-Markovian rewards in Markov Decision Processes (MDPs) with large state spaces. Our approach leverages Linear Temporal Logic Modulo Theories over finite traces (LTLfMT), a ...
Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making
arXiv:2602.06286v1 Announce Type: new
Abstract: Large language models (LLMs) are increasingly deployed as agents in high-stakes domains where optimal actions depend on both uncertainty about the world and consideration of utilities of different outcomes, yet their decision logic remains difficult t...
Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems
arXiv:2602.06319v1 Announce Type: new
Abstract: Large Reasoning Models (LRMs) have advanced rapidly; however, existing benchmarks in mathematics, code, and common-sense reasoning remain limited. They lack long-context evaluation, offer insufficient challenge, and provide answers that are difficult ...
Denoising diffusion networks for normative modeling in neuroimaging
arXiv:2602.04886v1 Announce Type: new
Abstract: Normative modeling estimates reference distributions of biological measures conditional on covariates, enabling centiles and clinically interpretable deviation scores to be derived. Most neuroimaging pipelines fit one model per imaging-derived phenoty...
A Causal Perspective for Enhancing Jailbreak Attack and Defense
arXiv:2602.04893v1 Announce Type: new
Abstract: Uncovering the mechanisms behind "jailbreaks" in large language models (LLMs) is crucial for enhancing their safety and reliability, yet these mechanisms remain poorly understood. Existing studies predominantly analyze jailbreak prompts by probing lat...
Momentum Attention: The Physics of In-Context Learning and Spectral Forensics for Mechanistic Interpretability
arXiv:2602.04902v1 Announce Type: new
Abstract: The Mechanistic Interpretability (MI) program has mapped the Transformer as a precise computational graph. We extend this graph with a conservation law and time-varying AC dynamics, viewing it as a physical circuit. We introduce Momentum Attention, a ...
Mind the Performance Gap: Capability-Behavior Trade-offs in Feature Steering
arXiv:2602.04903v1 Announce Type: new
Abstract: Feature steering has emerged as a promising approach for controlling LLM behavior through direct manipulation of internal representations, offering advantages over prompt engineering. However, its practical effectiveness in real-world applications rem...
DCER: Dual-Stage Compression and Energy-Based Reconstruction
arXiv:2602.04904v1 Announce Type: new
Abstract: Multimodal fusion faces two robustness challenges: noisy inputs degrade representation quality, and missing modalities cause prediction failures. We propose DCER, a unified framework addressing both challenges through dual-stage compression and ener...
Artificial Intelligence as Strange Intelligence: Against Linear Models of Intelligence
arXiv:2602.04986v1 Announce Type: new
Abstract: We endorse and expand upon Susan Schneider's critique of the linear model of AI progress and introduce two novel concepts: "familiar intelligence" and "strange intelligence". AI intelligence is likely to be strange intelligence, defying familiar patte...
DeepRead: Document Structure-Aware Reasoning to Enhance Agentic Search
arXiv:2602.05014v1 Announce Type: new
Abstract: With the rapid progress of tool-using and agentic large language models (LLMs), Retrieval-Augmented Generation (RAG) is evolving from one-shot, passive retrieval into multi-turn, decision-driven evidence acquisition. Despite strong results in open-dom...
Evaluating Large Language Models on Solved and Unsolved Problems in Graph Theory: Implications for Computing Education
arXiv:2602.05059v1 Announce Type: new
Abstract: Large Language Models are increasingly used by students to explore advanced material in computer science, including graph theory. As these tools become integrated into undergraduate and graduate coursework, it is important to understand how reliably t...
Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents
arXiv:2602.05073v1 Announce Type: new
Abstract: Uncertainty quantification (UQ) for large language models (LLMs) is a key building block for safety guardrails of daily LLM applications. Yet, even as LLM agents are increasingly deployed in highly complex tasks, most UQ research still centers on sing...
Rethinking imitation learning with Predictive Inverse Dynamics Models
This research looks at why Predictive Inverse Dynamics Models (PIDMs) often outperform standard Behavior Cloning in imitation learning. By using simple predictions of what happens next, PIDMs reduce ambiguity and learn from far fewer demonstrations.
Paza: Introducing automatic speech recognition benchmarks and models for low resource languages
Microsoft Research unveils Paza, a human-centered speech pipeline, and PazaBench, the first leaderboard for low-resource languages. It covers 39 African languages and 52 models and is tested with communities in real settings.
Understanding the Impact of Differentially Private Training on Memorization of Long-Tailed Data
arXiv:2602.03872v1 Announce Type: new
Abstract: Recent research shows that modern deep learning models achieve high predictive accuracy partly by memorizing individual training samples. Such memorization raises serious privacy concerns, motivating the widespread adoption of differentially private t...
Reversible Deep Learning for 13C NMR in Chemoinformatics: On Structures and Spectra
arXiv:2602.03875v1 Announce Type: new
Abstract: We introduce a reversible deep learning model for 13C NMR that uses a single conditional invertible neural network for both directions between molecular structures and spectra. The network is built from i-RevNet style bijective blocks, so the forward ...
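For context on the i-RevNet-style bijective blocks this abstract mentions: such blocks are typically built from additive coupling, which is invertible by construction regardless of the residual function used. The following is a minimal toy sketch of that coupling pattern in NumPy, not a reconstruction of the paper's actual network; the one-layer `tanh` residual function is an illustrative assumption.

```python
import numpy as np

def make_residual(rng, dim):
    """A toy one-layer residual function F; any function works here,
    since additive coupling never needs to invert F itself."""
    W = rng.standard_normal((dim, dim)) * 0.1
    return lambda h: np.tanh(h @ W)

def forward(x1, x2, F):
    # Additive coupling: y1 = x2, y2 = x1 + F(x2).
    return x2, x1 + F(x2)

def inverse(y1, y2, F):
    # Exact inverse: recover x1 = y2 - F(y1), x2 = y1.
    return y2 - F(y1), y1

rng = np.random.default_rng(0)
F = make_residual(rng, 4)
x1, x2 = rng.standard_normal(4), rng.standard_normal(4)
y1, y2 = forward(x1, x2, F)
r1, r2 = inverse(y1, y2, F)
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```

The round trip is exact (up to floating point), which is what lets a single network run in both the structure-to-spectrum and spectrum-to-structure directions.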
arXiv:2602.03876v1 Announce Type: new
Abstract: Standard reinforcement learning from human feedback (RLHF) trains a reward model on pairwise preference data and then uses it for policy optimization. However, while reward models are optimized to capture relative preferences, existing policy optimiza...
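As background for the pairwise-preference setup this abstract describes (standard RLHF practice, not this paper's contribution), the reward model is typically trained with a Bradley-Terry style loss over preferred/rejected response pairs:

```latex
\mathcal{L}(\theta) \;=\; -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\, \log \sigma\!\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right) \right]
```

where $y_w$ is the preferred response, $y_l$ the rejected one, and $\sigma$ the logistic sigmoid; the loss only constrains the *difference* in rewards, which is why reward models capture relative rather than absolute preferences.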
NeuroPareto: Calibrated Acquisition for Costly Many-Goal Search in Vast Parameter Spaces
arXiv:2602.03901v1 Announce Type: new
Abstract: The pursuit of optimal trade-offs in high-dimensional search spaces under stringent computational constraints poses a fundamental challenge for contemporary multi-objective optimization. We develop NeuroPareto, a cohesive architecture that integrates ...
GeoIB: Geometry-Aware Information Bottleneck via Statistical-Manifold Compression
arXiv:2602.03906v1 Announce Type: new
Abstract: Information Bottleneck (IB) is widely used, but in deep learning, it is usually implemented through tractable surrogates, such as variational bounds or neural mutual information (MI) estimators, rather than directly controlling the MI I(X;Z) itself. T...
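For reference, the classical IB objective that the surrogates mentioned here approximate (standard background, not the paper's GeoIB formulation) is:

```latex
\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta\, I(Z;Y)
```

i.e., compress the representation $Z$ of input $X$ (small $I(X;Z)$) while preserving information about the target $Y$ (large $I(Z;Y)$), with $\beta$ trading off the two; directly controlling $I(X;Z)$ is intractable in deep networks, hence the variational bounds and neural MI estimators the abstract refers to.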
Knowledge Model Prompting Increases LLM Performance on Planning Tasks
arXiv:2602.03900v1 Announce Type: new
Abstract: Large Language Models (LLMs) can struggle with reasoning ability and planning tasks. Many prompting techniques have been developed to assist with LLM reasoning, notably Chain-of-Thought (CoT); however, these techniques, too, have come under scrutiny as...
Enhancing Mathematical Problem Solving in LLMs through Execution-Driven Reasoning Augmentation
arXiv:2602.03950v1 Announce Type: new
Abstract: Mathematical problem solving is a fundamental benchmark for assessing the reasoning capabilities of artificial intelligence and a gateway to applications in education, science, and engineering where reliable symbolic reasoning is essential. Although r...