
AI Tools & Frameworks Directory

Discover essential tools, libraries, and frameworks to power your AI workflows.


Tool• Mar 5, 2026

AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis

arXiv:2603.03378v1 Announce Type: new Abstract: Large language model (LLM) agents offer a promising data-driven approach to automating Site Reliability Engineering (SRE), yet their enterprise deployment is constrained by three challenges: restricted access to proprietary data, unsafe action executi...

#ArXiv#Machine Learning#Academic

Tool• Mar 5, 2026

RADAR: Learning to Route with Asymmetry-aware DistAnce Representations

arXiv:2603.03388v1 Announce Type: new Abstract: Recent neural solvers have achieved strong performance on vehicle routing problems (VRPs), yet they mainly assume symmetric Euclidean distances, restricting applicability to real-world scenarios. A core challenge is encoding the relational features in...

#ArXiv#Machine Learning#Academic

Tool• Mar 5, 2026

Knowledge Graph and Hypergraph Transformers with Repository-Attention and Journey-Based Role Transport

arXiv:2603.03304v1 Announce Type: new Abstract: We present a concise architecture for joint training on sentences and structured data while keeping knowledge and language representations separable. The model treats knowledge graphs and hypergraphs as structured instances with role slots and encodes...

#ArXiv#Machine Learning#Academic

Tool• Mar 5, 2026

AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment

arXiv:2603.03686v1 Announce Type: new Abstract: Automated design of chemical formulations is a cornerstone of materials science, yet it requires navigating a high-dimensional combinatorial space involving discrete compositional choices and continuous geometric constraints. Existing Large Language M...

#ArXiv#Machine Learning#Academic

Tool• Mar 5, 2026

MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation

arXiv:2603.03680v1 Announce Type: new Abstract: Large Language Model (LLM) agents have demonstrated remarkable proficiency in learned tasks, yet they often struggle to adapt to non-stationary environments with feedback. While In-Context Learning and external memory offer some flexibility, they fail...

#ArXiv#Machine Learning#Academic

Tool• Mar 5, 2026

Mozi: Governed Autonomy for Drug Discovery LLM Agents

arXiv:2603.03655v1 Announce Type: new Abstract: Tool-augmented large language model (LLM) agents promise to unify scientific reasoning with computation, yet their deployment in high-stakes domains like drug discovery is bottlenecked by two critical barriers: unconstrained tool-use governance and po...

#ArXiv#Machine Learning#Academic

Tool• Mar 5, 2026

Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

arXiv:2603.03565v1 Announce Type: new Abstract: Conversational shopping assistants (CSAs) represent a compelling application of agentic AI, but moving from prototype to production reveals two underexplored challenges: how to evaluate multi-turn interactions and how to optimize tightly coupled multi...

#ArXiv#Machine Learning#Academic

Tool• Mar 5, 2026

Asymmetric Goal Drift in Coding Agents Under Value Conflict

arXiv:2603.03456v1 Announce Type: new Abstract: Agentic coding agents are increasingly deployed autonomously, at scale, and over long-context horizons. Throughout an agent's lifetime, it must navigate tensions between explicit instructions, learned values, and environmental pressures, often in cont...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning

arXiv:2603.02215v1 Announce Type: new Abstract: Chemical reaction prediction is pivotal for accelerating drug discovery and synthesis planning. Despite advances in data-driven models, current approaches are hindered by an overemphasis on parameter and dataset scaling. Some methods coupled with eval...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

arXiv:2603.02216v1 Announce Type: new Abstract: Effective information seeking in multi-turn medical dialogues is critical for accurate diagnosis, especially when dealing with incomplete information. Aligning Large Language Models (LLMs) for these interactive scenarios is challenging due to the unce...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

arXiv:2603.02219v1 Announce Type: new Abstract: Large language models are increasingly deployed in streaming scenarios, rendering conventional post-hoc safeguards ineffective as they fail to interdict unsafe content in real-time. While streaming safeguards based on token-level supervised training c...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression

arXiv:2603.02217v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models scale capacity efficiently, but their massive parameter footprint creates a deployment-time memory bottleneck. We organize retraining-free MoE compression into three paradigms - Expert Pruning, Expert Editing, and Exper...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

arXiv:2603.02218v1 Announce Type: new Abstract: Large language models (LLMs) make it plausible to build systems that improve through self-evolving loops, but many existing proposals are better understood as self-play and often plateau quickly. A central failure mode is that the loop synthesises mor...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

arXiv:2603.02239v1 Announce Type: new Abstract: The Engineering Reasoning and Instruction (ERI) benchmark is a taxonomy-driven instruction dataset designed to train and evaluate engineering-capable large language models (LLMs) and agents. This dataset spans nine engineering fields (namely: civil, m...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Memory Poisoning

arXiv:2603.02240v1 Announce Type: new Abstract: We present SuperLocalMemory, a local-first memory system for multi-agent AI that defends against OWASP ASI06 memory poisoning through architectural isolation and Bayesian trust scoring, while personalizing retrieval through adaptive learning-to-rank -...

#ArXiv#Machine Learning#Academic

Tool• Mar 4, 2026

Federated Inference: Toward Privacy-Preserving Collaborative and Incentivized Model Serving

arXiv:2603.02214v1 Announce Type: new Abstract: Federated Inference (FI) studies how independently trained and privately owned models can collaborate at inference time without sharing data or model parameters. While recent work has explored secure and distributed inference from disparate perspectiv...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

Transit Network Design with Two-Level Demand Uncertainties: A Machine Learning and Contextual Stochastic Optimization Framework

arXiv:2603.00010v1 Announce Type: new Abstract: Transit Network Design is a well-studied problem in the field of transportation, typically addressed by solving optimization models under fixed demand assumptions. Considering the limitations of these assumptions, this paper proposes a new framework, ...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

arXiv:2603.00039v1 Announce Type: new Abstract: LLM-as-a-judge ensembles are the standard paradigm for scalable evaluation, but their aggregation mechanisms suffer from a fundamental flaw: they implicitly assume that judges provide independent estimates of true quality. However, in practice, LLM ju...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

Attn-QAT: 4-Bit Attention With Quantization-Aware Training

arXiv:2603.00040v1 Announce Type: new Abstract: Achieving reliable 4-bit attention is a prerequisite for end-to-end FP4 computation on emerging FP4-capable GPUs, yet attention remains the main obstacle due to FP4's tiny dynamic range and attention's heavy-tailed activations. This paper presents the...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking

arXiv:2603.00267v1 Announce Type: new Abstract: Misinformation spreading over the Internet poses a significant threat to both societies and individuals, necessitating robust and scalable fact-checking that relies on retrieving accurate and trustworthy evidence. Previous methods rely on semantic and...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

EmCoop: A Framework and Benchmark for Embodied Cooperation Among LLM Agents

arXiv:2603.00349v1 Announce Type: new Abstract: Real-world scenarios increasingly require multiple embodied agents to collaborate in dynamic environments under embodied constraints, as many tasks exceed the capabilities of any single agent. Recent advances in large language models (LLMs) enable hig...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

TraderBench: How Robust Are AI Agents in Adversarial Capital Markets?

arXiv:2603.00285v1 Announce Type: new Abstract: Evaluating AI agents in finance faces two key challenges: static benchmarks require costly expert annotation yet miss the dynamic decision-making central to real-world trading, while LLM-based judges introduce uncontrolled variance on domain-specific ...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

How Well Do Multimodal Models Reason on ECG Signals?

arXiv:2603.00312v1 Announce Type: new Abstract: While multimodal large language models offer a promising solution to the "black box" nature of health AI by generating interpretable reasoning traces, verifying the validity of these traces remains a critical challenge. Existing evaluation methods are...

#ArXiv#Machine Learning#Academic

Tool• Mar 3, 2026

DIG to Heal: Scaling General-purpose Agent Collaboration via Explainable Dynamic Decision Paths

arXiv:2603.00309v1 Announce Type: new Abstract: The increasingly popular agentic AI paradigm promises to harness the power of multiple, general-purpose large language model (LLM) agents to collaboratively complete complex tasks. While many agentic AI systems utilize predefined workflows or agent ro...

#ArXiv#Machine Learning#Academic
