AI Tools & Frameworks Directory

Discover essential tools, libraries, and frameworks to power your AI workflows.

Tool• Jan 28, 2026

Neural Theorem Proving for Verification Conditions: A Real-World Benchmark

arXiv:2601.18944v1 Abstract: Theorem proving is fundamental to program verification, where the automated proof of Verification Conditions (VCs) remains a primary bottleneck. Real-world program verification frequently encounters hard VCs that existing Automated Theorem Provers (AT...

#ArXiv#Machine Learning#Academic

Tool• Jan 27, 2026

TelcoAI: Advancing 3GPP Technical Specification Search through Agentic Multi-Modal Retrieval-Augmented Generation

arXiv:2601.16984v1 Abstract: The 3rd Generation Partnership Project (3GPP) produces complex technical specifications essential to global telecommunications, yet their hierarchical structure, dense formatting, and multi-modal content make them difficult to process. While Large Lan...

#ArXiv#Machine Learning#Academic

Tool• Jan 27, 2026

Sparsity-Aware Low-Rank Representation for Efficient Fine-Tuning of Large Language Models

arXiv:2601.16991v1 Abstract: Adapting large pre-trained language models to downstream tasks often entails fine-tuning millions of parameters or deploying costly dense weight updates, which hinders their use in resource-constrained environments. Low-rank Adaptation (LoRA) reduces ...

#ArXiv#Machine Learning#Academic
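The abstract breaks off just as it introduces LoRA. The sketch below shows the standard LoRA reparameterization and where its parameter savings come from. It keeps a frozen weight W and trains only a low-rank product B A, scaled by alpha / rank. It is written in plain Python. The helper names (`lora_param_counts`, `matvec`, `lora_forward`) and the 4096 x 4096, rank-8 sizes are hypothetical and chosen only for illustration. This is not the paper's sparsity-aware method.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Trainable parameters for one weight update: (dense Delta W, LoRA B @ A)."""
    dense = d_out * d_in            # a full update has the same shape as W
    lora = rank * (d_out + d_in)    # B is d_out x rank, A is rank x d_in
    return dense, lora


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain-Python matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """h = W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Illustrative 4096 x 4096 projection with rank 8 (sizes are hypothetical).
print(lora_param_counts(4096, 4096, 8))  # (16777216, 65536)

# Tiny forward pass: identity W plus a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]          # rank x d_in = 1 x 2
B = [[1.0], [0.0]]        # d_out x rank = 2 x 1
print(lora_forward(W, A, B, [1.0, 2.0], alpha=1.0, rank=1))  # [4.0, 2.0]
```

For this illustrative layer, the trainable parameter count drops 256-fold, and the saving comes entirely from the rank bottleneck. Standard LoRA also initializes B to zero, so training begins from exactly the frozen model's outputs.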

Tool• Jan 27, 2026

MathMixup: Boosting LLM Mathematical Reasoning with Difficulty-Controllable Data Synthesis and Curriculum Learning

arXiv:2601.17006v1 Abstract: In mathematical reasoning tasks, the advancement of Large Language Models (LLMs) relies heavily on high-quality training data with clearly defined and well-graded difficulty levels. However, existing data synthesis methods often suffer from limited di...

#ArXiv#Machine Learning#Academic

Tool• Jan 27, 2026

Interpreting Agentic Systems: Beyond Model Explanations to System-Level Accountability

arXiv:2601.17168v1 Abstract: Agentic systems have transformed how Large Language Models (LLMs) can be leveraged to create autonomous systems with goal-directed behaviors, consisting of multi-step planning and the ability to interact with different environments. These systems diff...

#ArXiv#Machine Learning#Academic

Tool• Jan 27, 2026

High-Fidelity Longitudinal Patient Simulation Using Real-World Data

arXiv:2601.17310v1 Abstract: Simulation is a powerful tool for exploring uncertainty. Its potential in clinical medicine is transformative and includes personalized treatment planning and virtual clinical trials. However, simulating patient trajectories is challenging because of ...

#ArXiv#Machine Learning#Academic

Tool• Jan 27, 2026

Phase Transition for Budgeted Multi-Agent Synergy

arXiv:2601.17311v1 Abstract: Multi-agent systems can improve reliability, yet under a fixed inference budget they often help, saturate, or even collapse. We develop a minimal and calibratable theory that predicts these regimes from three binding constraints of modern agent stacks...

#ArXiv#Machine Learning#Academic

Tool• Jan 27, 2026

Principled Coarse-Grained Acceptance for Speculative Decoding in Speech

Speculative decoding accelerates autoregressive speech generation by letting a fast draft model propose tokens that a larger target model verifies. However, for speech LLMs that generate acoustic tokens, exact token matching is overly restrictive: many discrete tokens are acoustically or semanticall...

#Apple#On-device AI
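The abstract sums up the generic speculative-decoding loop in one sentence. The sketch below shows the greedy form of that loop, using deterministic toy functions in place of real draft and target networks. It makes concrete what "exact token matching" means. The names `speculative_step`, `toy_draft` and `toy_target` are hypothetical. The code shows the baseline exact-match acceptance rule, not the paper's coarse-grained acceptance method. Sampling-based variants replace the equality test with a probabilistic accept/reject step.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy
# (argmax) next token. Real speech LLMs are neural networks over acoustic tokens.
Model = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: Model, target: Model, k: int) -> List[int]:
    """One round of greedy speculative decoding with exact-match acceptance.

    Returns the tokens appended to `prefix` this round (always at least one).
    """
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model verifies each proposed position. Real systems score
    #    all k positions in one batched forward pass; this loop is for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:  # strict exact-token match
            # First mismatch: substitute the target's token and end the round.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target's next token is a free bonus.
    accepted.append(target(ctx))
    return accepted


# Toy models: the target counts upward; the draft goes wrong after token 3.
def toy_target(ctx: List[int]) -> int:
    return ctx[-1] + 1 if ctx else 0


def toy_draft(ctx: List[int]) -> int:
    if ctx and ctx[-1] == 3:
        return 99
    return toy_target(ctx)


print(speculative_step([0], toy_draft, toy_target, k=4))  # [1, 2, 3, 4]
```

In this run the target accepts three draft tokens and supplies the fourth itself, so one verification round yields four tokens. The `tok != expected` comparison is the strict exact-match test that the abstract calls overly restrictive for acoustic tokens. A draft token that is acoustically or semantically close to the target's choice is still rejected here.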

Tool• Jan 27, 2026

SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?

The common approach to communicate a large language model’s (LLM) uncertainty is to add a percentage number or a hedging word to its response. But is this all we can do? Instead of generating a single answer and then hedging it, an LLM that is fully transparent to the user needs to be able to reflec...

#Apple#On-device AI

Tool• Jan 27, 2026

Learning to Reason as Action Abstractions with Scalable Mid-Training RL

Large language models excel with reinforcement learning (RL), but fully unlocking this potential requires a mid-training stage. An effective mid-training phase should identify a compact set of useful actions and enable fast selection among them through online RL. We formalize this intuition by prese...

#Apple#On-device AI

Tool• Jan 27, 2026

VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety

Safety evaluation of multimodal foundation models often treats vision and language inputs separately, missing risks from joint interpretation where benign content becomes harmful in combination. Existing approaches also fail to distinguish clearly unsafe content from borderline cases, leading to pro...

#Apple#On-device AI

Tool• Jan 26, 2026

Analyzing Neural Network Information Flow Using Differential Geometry

arXiv:2601.16366v1 Abstract: This paper provides a fresh view of the neural network (NN) data flow problem, i.e., identifying the NN connections that are most important for the performance of the full model, through the lens of graph theory. Understanding the NN data flow provide...

#ArXiv#Machine Learning#Academic

Tool• Jan 26, 2026

When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems

arXiv:2601.16280v1 Abstract: Multi-agent systems powered by large language models (LLMs) are transforming enterprise automation, yet systematic evaluation methodologies for assessing tool-use reliability remain underdeveloped. We introduce a comprehensive diagnostic framework tha...

#ArXiv#Machine Learning#Academic

Tool• Jan 26, 2026

SemanticALLI: Caching Reasoning, Not Just Responses, in Agentic Systems

arXiv:2601.16286v1 Abstract: Agentic AI pipelines suffer from a hidden inefficiency: they frequently reconstruct identical intermediate logic, such as metric normalization or chart scaffolding, even when the user's natural language phrasing is entirely novel. Conventional boundar...

#ArXiv#Machine Learning#Academic

Tool• Jan 26, 2026

DSGym: A Holistic Framework for Evaluating and Training Data Science Agents

arXiv:2601.16344v1 Abstract: Data science agents promise to accelerate discovery and insight-generation by turning data into executable analyses and findings. Yet existing data science benchmarks fall short due to fragmented evaluation interfaces that make cross-benchmark compari...

#ArXiv#Machine Learning#Academic

Tool• Jan 26, 2026

Doc2AHP: Inferring Structured Multi-Criteria Decision Models via Semantic Trees with LLMs

arXiv:2601.16479v1 Abstract: While Large Language Models (LLMs) demonstrate remarkable proficiency in semantic understanding, they often struggle to ensure structural consistency and reasoning reliability in complex decision-making tasks that demand rigorous logic. Although class...

#ArXiv#Machine Learning#Academic

Tool• Jan 26, 2026

SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care

arXiv:2601.16529v1 Abstract: Large language models (LLMs) show promise in clinical decision support yet risk acquiescing to patient pressure for inappropriate care. We introduce SycoEval-EM, a multi-agent simulation framework evaluating LLM robustness through adversarial patient ...

#ArXiv#Machine Learning#Academic

Tool• Jan 26, 2026

AI Can’t Do Physics Well – And That’s a Roadblock to Autonomy

QuantiPhy is a new benchmark and training framework that evaluates whether AI can numerically reason about physical properties in video images. QuantiPhy reveals that today’s models struggle with basic estimates of size, speed, and distance but offers a way forward.

#Stanford#HAI#Ethics

Tool• Jan 25, 2026

Researchers tested AI against 100,000 humans on creativity

A massive new study comparing more than 100,000 people with today’s most advanced AI systems delivers a surprising result: generative AI can now beat the average human on certain creativity tests. Models like GPT-4 showed strong performance on tasks designed to measure original thinking and idea gen...

#Science Daily#AI#Research

Tool• Jan 23, 2026

Reimagining UI/UX education for AI and neuroinclusion

Design education must evolve beyond polished screens toward neuroinclusive cognition, AI-as-infrastructure, and uncertainty-aware governance.

#AI Accelerator Institute#AI#Research

Tool• Jan 23, 2026

Empowering LLMs for Structure-Based Drug Design via Exploration-Augmented Latent Inference

arXiv:2601.15333v1 Abstract: Large Language Models (LLMs) possess strong representation and reasoning capabilities, but their application to structure-based drug design (SBDD) is limited by insufficient understanding of protein structures and unpredictable molecular generation. T...

#ArXiv#Machine Learning#Academic

Tool• Jan 23, 2026

Language Models Entangle Language and Culture

arXiv:2601.15337v1 Abstract: Users should not be systemically disadvantaged by the language they use for interacting with LLMs; i.e. users across languages should get responses of similar quality irrespective of language used. In this work, we create a set of real-world open-ende...

#ArXiv#Machine Learning#Academic

Tool• Jan 23, 2026

FedUMM: A General Framework for Federated Learning with Unified Multimodal Models

arXiv:2601.15390v1 Abstract: Unified multimodal models (UMMs) are emerging as strong foundation models that can do both generation and understanding tasks in a single architecture. However, they are typically trained in centralized settings where all training and downstream datas...

#ArXiv#Machine Learning#Academic

Tool• Jan 22, 2026

Call2Instruct: Automated Pipeline for Generating Q&A Datasets from Call Center Recordings for LLM Fine-Tuning

arXiv:2601.14263v1 Abstract: The adaptation of Large-Scale Language Models (LLMs) to specific domains depends on high-quality fine-tuning datasets, particularly in instructional format (e.g., Question-Answer - Q&A). However, generating these datasets, particularly from unstructur...

#ArXiv#Machine Learning#Academic
