AI Tools & Frameworks Directory

Discover essential tools, libraries, and frameworks to power your AI workflows.

Tool• Jul 28, 2026

Progress-conditioned Group Policy Optimization for Long-Horizon Agentic Tasks

arXiv:2607.22724v1 Announce Type: new Abstract: Group-based policy optimization has been increasingly used to train large language model (LLM) agents from sparse outcome rewards by comparing trajectories or steps within a group. However, on difficult long-horizon tasks, this comparison can suffer f...

#ArXiv#Machine Learning#Academic

Tool• Jul 28, 2026

Semalith v1.4: A Calibrated 184M Safety Classifier Achieving State-of-the-Art Prompt-Injection Detection at 44x Fewer Parameters than Llama-Guard-3-8B

arXiv:2607.22545v1 Announce Type: new Abstract: Deploying large language models in financial-services and agentic settings requires safety classifiers that simultaneously handle prompt injection, regulatory compliance, and general harm, a combination no existing open guardrail addresses in a single...

#ArXiv#Machine Learning#Academic

Tool• Jul 28, 2026

QFedPolyp: A Communication- and Inference-Efficient Federated Learning Framework for Polyp Segmentation

arXiv:2607.22743v1 Announce Type: new Abstract: Background and Objective: Automatic polyp segmentation supports computer-aided diagnosis and early colorectal cancer detection. Centralized deep learning requires hospitals to share sensitive medical data, while federated learning preserves privacy ...

#ArXiv#Machine Learning#Academic

Tool• Jul 28, 2026

CausalGate: Causal Importance Distillation for Transformer Module Pruning

arXiv:2607.22720v1 Announce Type: new Abstract: Existing adaptive inference methods for Large Language Models rely on observational heuristics, such as hidden-state similarity or activation magnitudes, to drop redundant modules. However, these correlation-based metrics often fail to capture subtle,...

#ArXiv#Machine Learning#Academic

Tool• Jul 28, 2026

CORVUS: Context Optimization and Reduction Via Underlying Synchronization for LLM Coding Agents

arXiv:2607.22711v1 Announce Type: new Abstract: LLM coding agents operate by constructing trajectories that accumulate reasoning, tool calls, and results to enable multi-step decision-making. However, the conventional append-only trajectory architecture found in practice tightly couples file-read a...

#ArXiv#Machine Learning#Academic

Tool• Jul 27, 2026

Risk Is Not the Target: A Monotonic Framework for Evaluating Wildfire Operational Risk Signals

arXiv:2607.21597v1 Announce Type: new Abstract: Evaluating wildfire risk systems using standard machine-learning metrics such as F1-score or IoU is fundamentally flawed: these metrics assess event prediction accuracy, not the operational coherence of a continuous risk signal. This work proposes a n...

#ArXiv#Machine Learning#Academic

Tool• Jul 27, 2026

FlowEvo: Self-Evolving Agents through the Co-Evolution of Workflows and Executable Skills

arXiv:2607.21596v1 Announce Type: new Abstract: Large language model agents increasingly solve complex tasks by constructing inference-time workflows that combine reasoning, tool use, and code execution. While such workflows enable flexible problem solving, the useful procedures discovered during e...

#ArXiv#Machine Learning#Academic

Tool• Jul 27, 2026

Securing Multimodal AI through Internal Information Decomposition

arXiv:2607.21600v1 Announce Type: new Abstract: Multimodal large language models introduce attack surfaces absent in unimodal systems: adversaries can distribute malicious intent across modalities to evade unimodal safeguards. This motivates using cross-modal consistency as a detection signal rathe...

#ArXiv#Machine Learning#Academic

Tool• Jul 27, 2026

Transferable Latency Prediction for Fast LLM Screening on Heterogeneous Edge Devices

arXiv:2607.21602v1 Announce Type: new Abstract: Accurate latency prediction is critical for deploying large language models (LLMs) on heterogeneous edge devices, where inference latency is affected by model architecture, prompt behavior, runtime backend, hardware utilization, dynamic voltage and fr...

#ArXiv#Machine Learning#Academic

Tool• Jul 27, 2026

Measuring the Dependency Gap: Diagnosing Inter-Column Fidelity in Tabular Generative Models

arXiv:2607.21636v1 Announce Type: new Abstract: Synthetic tabular data is valued for preserving not only each column's marginal distribution but the dependencies between columns -- structure that carries much of the discriminative signal for minority classes in imbalanced domains such as fraud and ...

#ArXiv#Machine Learning#Academic

Tool• Jul 27, 2026

Toward User-Conditioned Evaluation of Personal LLM Agents under Temporal Interventions

arXiv:2607.21635v1 Announce Type: new Abstract: Personal agents maintain memories, learned skills, tool configurations, and policy state that evolve with each user. Existing agent benchmarks often evaluate these capabilities in isolation: tool benchmarks test invocation under fixed APIs, memory ben...

#ArXiv#Machine Learning#Academic

Tool• Jul 27, 2026

On the Depth Scalability of Logic Gate Networks

arXiv:2607.21633v1 Announce Type: new Abstract: Logic Gate Networks (LGNs) implement computation through compositions of Boolean operations, yet unlike classical Boolean circuits, existing LGNs do not reliably benefit from increased depth. We identify two distinct causes: optimization collapse in d...

#ArXiv#Machine Learning#Academic

Tool• Jul 24, 2026

JAXBench: Benchmarking Autonomous TPU Kernel Optimization

arXiv:2607.20466v1 Announce Type: new Abstract: Rigorous benchmarks have driven progress in autonomous GPU kernel performance optimization by establishing a shared target to hillclimb on, but no equivalent exists for TPUs. We present JAXBench, a TPU-native benchmark suite for AI-generated kernel op...

#ArXiv#Machine Learning#Academic

Tool• Jul 24, 2026

Stochastic Sampling is Epistemically Shallow: The Dimensionality Gap Between Temperature Variation and Model Diversity in LLMs

arXiv:2607.20464v1 Announce Type: new Abstract: When a language model gives different answers on repeated runs, does that variation reveal what it does not know? Self-consistency turns the variation into a per-question uncertainty estimate via majority voting. But does the same variation reveal cro...

#ArXiv#Machine Learning#Academic

Tool• Jul 24, 2026

ClickGuard: Detecting and Spoiling Clickbait News with Informativeness Measures and Large Language Models

arXiv:2607.20463v1 Announce Type: new Abstract: This paper presents an AI-driven browser extension that identifies clickbait to help users avoid misleading Internet articles. Moving beyond traditional detection, the application employs a hybrid machine learning architecture that combines transforme...

#ArXiv#Machine Learning#Academic

Tool• Jul 24, 2026

Marking the Wrong Symptoms: Evaluating LLM Watermarks in Medical Texts

arXiv:2607.20462v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly integrated into clinical workflows, stressing the need for reliable traceability of model-generated output with watermarking. Yet, most watermarks are evaluated on general-purpose benchmarks, leaving domai...

#ArXiv#Machine Learning#Academic

Tool• Jul 24, 2026

AINTMA: Agentic AI Architecture for Autonomous Test Management with Generative Intelligence, Secure Cloud Communication and Adaptive Quality Analytics

arXiv:2607.20452v1 Announce Type: new Abstract: Modern software quality assurance demands intelligent, autonomous systems capable of adaptive decision-making across distributed cloud environments. This paper presents AINTMA (Agentic Intelligent Test Management Architecture), a multi-agent agentic A...

#ArXiv#Machine Learning#Academic

Tool• Jul 24, 2026

Scaling Closed-Loop Feature Channel Configuration with LLMs

arXiv:2607.20516v1 Announce Type: new Abstract: Promising initial results in closed-loop large-language-model-based channel-configuration search demonstrated that neural-network widths can be optimized directly through executable code generation and accuracy feedback. However, those results were ob...

#ArXiv#Machine Learning#Academic

Tool• Jul 24, 2026

DataPrep-Bench: Benchmarking LLMs as Training Data Preparators

arXiv:2607.20465v1 Announce Type: new Abstract: The quality of training data fundamentally determines the capabilities of large language models (LLMs), yet no unified benchmark exists to measure how well LLMs, agents, and data-centric workflows actually prepare training data end to end. We view LLM...

#ArXiv#Machine Learning#Academic

Tool• Jul 24, 2026

PhantomFill: When the Form Demands an Answer, Language Models Invent One

arXiv:2607.20492v1 Announce Type: new Abstract: Language models in production do not write prose. They fill forms: JSON fields, function arguments, extraction templates. We show that the form itself causes hallucination. We ask thirteen models the same question about the same input and change onl...

#ArXiv#Machine Learning#Academic

Tool• Jul 23, 2026

Native Multi-Dimensional Subquadratic Operators via Input Dependent Long Convolutions

arXiv:2607.19378v1 Announce Type: new Abstract: Subquadratic alternatives to attention require compromises when applied to multi-dimensional data: standard convolutions lack global receptive fields and input dependency, while recurrent models require rasterizing data such as images, volumes, and pa...

#ArXiv#Machine Learning#Academic

Tool• Jul 23, 2026

FormulaSPIN: Self-Play Fine-Tuning for Natural Language to Spreadsheet Formula Generation

arXiv:2607.19354v1 Announce Type: new Abstract: Spreadsheet applications are used by hundreds of millions worldwide, yet writing formulas remains a significant barrier. Existing approaches rely on static supervised data, which quickly saturates on limited annotations. In this paper, we introduce FO...

#ArXiv#Machine Learning#Academic

Tool• Jul 23, 2026

FineServe: A Fine-Grained Dataset and Characterization of Global LLM Serving Workloads

arXiv:2607.19349v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as always-on online services, making efficient LLM serving a critical systems challenge. Achieving low latency and high throughput under volatile demand requires deep understanding of real-world s...

#ArXiv#Machine Learning#Academic

Tool• Jul 23, 2026

Hybrid LSTM-Graph Neural Framework for Robust Financial Fraud Detection and Adversarial Resilience

arXiv:2607.19350v1 Announce Type: new Abstract: Financial institutions face significant challenges in detecting sophisticated money laundering patterns, such as smurfing and layering, due to extreme data imbalance (0.13% fraud rate) and evolving adversarial evasion tactics. This paper proposes Frau...

#ArXiv#Machine Learning#Academic
