Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models
arXiv:2604.13206v1 Announce Type: new
Abstract: As Large Language Models (LLMs) are increasingly integrated into agentic workflows, their unpredictability stemming from numerical instability has emerged as a critical reliability issue. While recent studies have demonstrated the significant downstre...
SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications
arXiv:2604.13180v1 Announce Type: new
Abstract: Recent advances in agentic AI have enabled increasingly autonomous workflows, but existing systems still face substantial challenges in achieving reliable deployment in real-world scientific research. In this work, we present a safe, lightweight, and ...
Exploration and Exploitation Errors Are Measurable for Language Model Agents
arXiv:2604.13151v1 Announce Type: new
Abstract: Language Model (LM) agents are increasingly used in complex open-ended decision-making tasks, from AI coding to physical AI. A core requirement in these settings is the ability to both explore the problem space and exploit acquired knowledge effective...
Design Conditions for Intra-Group Learning of Sequence-Level Rewards: Token Gradient Cancellation
arXiv:2604.13088v1 Announce Type: new
Abstract: Under sparse termination rewards, intra-group comparisons have become the dominant paradigm for fine-tuning reasoning models via reinforcement learning. However, long-term training often leads to issues like ineffective update accumulation (learning tax)...
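The intra-group comparison paradigm the abstract refers to (as popularized in GRPO-style training) scores each sampled completion against the other rollouts for the same prompt. A minimal sketch, assuming scalar sequence-level rewards; this illustrates only the group-normalized advantage, not the paper's analysis:

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize sequence-level rewards within a group of rollouts.

    Each completion sampled for the same prompt gets an advantage equal
    to its reward minus the group mean, scaled by the group std. Every
    token in a completion then shares this single sequence-level signal,
    which is the setting where token-gradient cancellation can arise.
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four rollouts for one prompt; two succeeded, two failed.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the group mean is subtracted, the advantages sum to zero: successful rollouts are pushed up exactly as hard as failed ones are pushed down.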
Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments
arXiv:2604.13085v1 Announce Type: new
Abstract: Autonomous AI agents operating in dynamic environments face a persistent challenge: acquiring new capabilities without erasing prior knowledge. We present Adaptive Memory Crystallization (AMC), a memory architecture for progressive experience consolid...
The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior
arXiv:2604.13082v1 Announce Type: new
Abstract: Grokking in transformers trained on algorithmic tasks is characterized by a long delay between training-set fit and abrupt generalization, but the source of that delay remains poorly understood. In encoder-decoder arithmetic models, we argue that this...
Sparse Goodness: How Selective Measurement Transforms Forward-Forward Learning
arXiv:2604.13081v1 Announce Type: new
Abstract: The Forward-Forward (FF) algorithm is a biologically plausible alternative to backpropagation that trains neural networks layer by layer using a local goodness function to distinguish positive from negative data. Since its introduction, sum-of-squares...
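The sum-of-squares goodness the abstract mentions is, in Hinton's original FF formulation, the summed squared activity of a layer, trained locally so positive data scores above a threshold and negative data below it. An illustrative sketch (the layer activities, sizes, and threshold here are arbitrary choices, not the paper's selective-measurement variant):

```python
import numpy as np

rng = np.random.default_rng(0)

def goodness(h):
    # Sum-of-squares goodness: summed squared activations per sample.
    return (h ** 2).sum(axis=-1)

def ff_layer_loss(h_pos, h_neg, theta=2.0):
    """Local logistic loss for one FF layer: push positive goodness
    above the threshold theta and negative goodness below it."""
    logits = np.concatenate([goodness(h_pos) - theta,
                             theta - goodness(h_neg)])
    return np.log1p(np.exp(-logits)).mean()

h_pos = rng.normal(2.0, 0.1, size=(4, 8))  # high-activity "positive" batch
h_neg = rng.normal(0.0, 0.1, size=(4, 8))  # low-activity "negative" batch
loss = ff_layer_loss(h_pos, h_neg)
```

Each layer is trained on this loss independently, with no backward pass through the network; that locality is what makes the choice of goodness function (summed over all units versus a selective subset) so consequential.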
MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining
This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models (NADPFM) at ICLR 2026.
Principled domain reweighting can substantially improve sample efficiency and downstream generalization; however, data-mixture optimization for multimodal pretraining remai...
This simple change stops robot swarms from getting stuck
In crowded environments, more robots don’t always mean faster results—in fact, too many can bring everything to a standstill. Harvard researchers discovered a surprising fix: adding a bit of randomness to how robots move can actually prevent gridlock and boost efficiency. By allowing robots to “wigg...
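The mechanism described above can be rendered as a toy policy: each robot usually moves greedily toward its goal, but with some small probability takes a random step instead, which lets deadlocked robots break symmetry. A hypothetical sketch (the probability and move set are illustrative, not from the study):

```python
import random

random.seed(0)

def step_direction(greedy, epsilon=0.2):
    """Pick a movement heading: usually the greedy direction toward the
    goal, but with probability epsilon a random 'wiggle'. Injecting a
    little noise like this can let jammed agents break symmetry so the
    swarm keeps flowing."""
    if random.random() < epsilon:
        return random.choice(["N", "S", "E", "W"])  # random wiggle
    return greedy

moves = [step_direction("E") for _ in range(1000)]
```

Most steps still make progress toward the goal; the occasional random move is the price paid to avoid gridlock.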
Narrative-Driven Paper-to-Slide Generation via ArcDeck
arXiv:2604.11969v1 Announce Type: new
Abstract: We introduce ArcDeck, a multi-agent framework that formulates paper-to-slide generation as a structured narrative reconstruction task. Unlike existing methods that directly summarize raw text into slides, ArcDeck explicitly models the source paper's l...
Uncertainty Quantification in CNN Through the Bootstrap of Convex Neural Networks
arXiv:2604.11833v1 Announce Type: new
Abstract: Despite the popularity of Convolutional Neural Networks (CNN), the problem of uncertainty quantification (UQ) of CNN has been largely overlooked. Lack of efficient UQ tools severely limits the application of CNN in certain areas, such as medicine, whe...
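For context, the generic bootstrap that the abstract builds on refits a model on resampled training data and reads uncertainty off the spread of the refitted predictions. A minimal sketch with a deliberately trivial "model" (the sample mean); the paper's actual construction bootstraps convex neural networks, which is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_interval(train_fn, X, y, x_query, n_boot=200, alpha=0.1):
    """Refit the model on data resampled with replacement and take
    empirical quantiles of its predictions at x_query. Returns a
    (1 - alpha) bootstrap interval."""
    preds = []
    n = len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample with replacement
        model = train_fn(X[idx], y[idx])
        preds.append(model(x_query))
    lo, hi = np.quantile(preds, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Toy example: the "trained model" just predicts the mean of y.
X = np.zeros((50, 1))
y = rng.normal(3.0, 1.0, size=50)
lo, hi = bootstrap_interval(lambda X_, y_: (lambda xq: y_.mean()), X, y, 0.0)
```

Swapping the trivial `train_fn` for a network-training routine gives the usual (expensive) bootstrap for deep models; the cost of those repeated refits is one reason principled UQ for CNNs has lagged.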
Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning
arXiv:2604.11835v1 Announce Type: new
Abstract: Machine learning for tabular data remains constrained by poor schema generalization, a challenge rooted in the lack of semantic understanding of structured variables. This challenge is particularly acute in domains like clinical medicine, where electr...
When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation
arXiv:2604.11840v1 Announce Type: new
Abstract: Large language models are increasingly used as agents in social, economic, and policy simulations. A common assumption is that stronger reasoning should improve simulation fidelity. We argue that this assumption can fail when the objective is not to s...
Self-Monitoring Benefits from Structural Integration: Lessons from Metacognition in Continuous-Time Multi-Timescale Agents
arXiv:2604.11914v1 Announce Type: new
Abstract: Self-monitoring capabilities -- metacognition, self-prediction, and subjective duration -- are often proposed as useful additions to reinforcement learning agents. But do they actually help? We investigate this question in a continuous-time multi-time...
GoodPoint: Learning Constructive Scientific Paper Feedback from Author Responses
arXiv:2604.11924v1 Announce Type: new
Abstract: While LLMs hold significant potential to transform scientific research, we advocate for their use to augment and empower researchers rather than to automate research without human oversight. To this end, we study constructive feedback generation, the ...
Polynomial Expansion Rank Adaptation: Enhancing Low-Rank Fine-Tuning with High-Order Interactions
arXiv:2604.11841v1 Announce Type: new
Abstract: Low-rank adaptation (LoRA) is a widely used strategy for efficient fine-tuning of large language models (LLMs), but its strictly linear structure fundamentally limits expressive capacity. The bilinear formulation of weight updates captures only first-...
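The strictly linear structure the abstract criticizes is visible in the standard LoRA update, where the weight delta is the single bilinear product BA. A minimal sketch of that baseline (dimensions and scaling are illustrative; the paper's polynomial expansion terms are not shown):

```python
import numpy as np

rng = np.random.default_rng(2)
d_out, d_in, r = 16, 16, 4

W0 = rng.normal(size=(d_out, d_in))    # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection (zero init)
alpha = 8.0

def lora_forward(x):
    """Standard LoRA: y = W0 x + (alpha / r) * B A x. The delta BA is
    the first-order interaction that limits expressivity; a polynomial
    expansion would add higher-order terms on top of it."""
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x)
```

With B initialized to zero the adapter starts as an exact identity on the pretrained behavior, which is the usual LoRA convention.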
arXiv:2604.09560v1 Announce Type: new
Abstract: Transformers, diffusion-maps, and magnetic Laplacians are usually treated as separate tools; we show they are all different regimes of a single Markov geometry built from pre-softmax query-scores. We define a QK "bidivergence" whose exponentiated and ...
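The shared object the abstract starts from is standard: row-softmaxing the pre-softmax query scores QK^T / sqrt(d) yields a row-stochastic matrix, i.e. a Markov transition kernel over token positions. A minimal sketch of that common ingredient (the paper's "bidivergence" construction itself is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 4
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))

def attention_markov(Q, K):
    """Softmax attention as a Markov kernel: exponentiate the scaled
    pre-softmax scores QK^T / sqrt(d) and normalize each row to sum
    to 1 (with the usual max-subtraction for numerical stability)."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return P / P.sum(axis=-1, keepdims=True)

P = attention_markov(Q, K)
```

Diffusion maps build an analogous row-normalized kernel from symmetric affinities, which is what makes a unified Markov-geometry view of the two plausible.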
Fairboard: a quantitative framework for equity assessment of healthcare models
arXiv:2604.09656v1 Announce Type: new
Abstract: Despite there now being more than 1,000 FDA-authorised AI medical devices, formal equity assessments -- whether model performance is uniform across patient subgroups -- are rare. Here, we evaluate the equity of 18 open-source brain tumour segmentation...
Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model
arXiv:2604.09665v1 Announce Type: new
Abstract: While the wide adoption of refusal training in large language models (LLMs) has showcased improvements in model safety, recent works have highlighted shortcomings due to the shallow nature of these alignment methods. To this end, the work on Deliberat...
Human-like Working Memory Interference in Large Language Models
arXiv:2604.09670v1 Announce Type: new
Abstract: Intelligent systems must maintain and manipulate task-relevant information online to adapt to dynamic environments and changing goals. This capacity, known as working memory, is fundamental to human reasoning and intelligence. Despite having on the or...
Belief-State RWKV for Reinforcement Learning under Partial Observability
arXiv:2604.09671v1 Announce Type: new
Abstract: We propose a stronger formulation of RL on top of RWKV-style recurrent sequence models, in which the fixed-size recurrent state is explicitly interpreted as a belief state rather than an opaque hidden vector. Instead of conditioning policy and value o...
LABBench2: An Improved Benchmark for AI Systems Performing Biology Research
arXiv:2604.09554v1 Announce Type: new
Abstract: Optimism for accelerating scientific discovery with AI continues to grow. Current applications of AI in scientific research range from training dedicated foundation models on scientific data to agentic autonomous hypothesis generation systems to AI-dr...
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
arXiv:2604.09574v1 Announce Type: new
Abstract: The rise of autonomous GUI agents has triggered adversarial countermeasures from digital platforms, yet existing research prioritizes utility and robustness over the critical dimension of anti-detection. We argue that for agents to survive in human-ce...
arXiv:2604.09563v1 Announce Type: new
Abstract: AI systems produce large volumes of logs as they interact with tools and users. Analysing these logs can help understand model capabilities, propensities, and behaviours, or assess whether an evaluation worked as intended. Researchers have started dev...