LLM-Pruning Collection: A JAX-Based Repo for Structured and Unstructured LLM Compression
Zlab Princeton researchers have released LLM-Pruning Collection, a JAX-based repository that consolidates major pruning algorithms for large language models into a single, reproducible framework. It targets one concrete goal: making it easy to compare block-level, layer-level, and weight-level pruning ...
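The weight-level (unstructured) pruning mentioned above is simplest to picture as magnitude pruning, which zeroes the smallest-magnitude weights. A minimal NumPy sketch of that idea, as an illustration only and not the repository's actual API:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries until `sparsity` fraction is zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, 0.5)  # half of the 16 entries zeroed
```

Block-level and layer-level pruning follow the same principle but remove whole transformer blocks or layers rather than individual weights, which is what makes a unified comparison framework useful.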
Tencent Researchers Release Tencent HY-MT1.5: New Translation Models Featuring 1.8B and 7B Variants Designed for Seamless On-Device and Cloud Deployment
Tencent Hunyuan researchers have released HY-MT1.5, a multilingual machine translation family that targets both mobile devices and cloud systems with the same training recipe and metrics. HY-MT1.5 consists of two translation models, HY-MT1.5-1.8B and HY-MT1.5-7B, and supports mutual translation across 33 ...
The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition
arXiv:2601.00065v1 Announce Type: new
Abstract: The open-weight LLM ecosystem is increasingly defined by model composition techniques (such as weight merging, speculative decoding, and vocabulary expansion) that remix capabilities from diverse sources. A critical prerequisite for applying these met...
arXiv:2601.00084v1 Announce Type: new
Abstract: In fixed-confidence best arm identification (BAI), the objective is to quickly identify the optimal option while controlling the probability of error below a desired threshold. Despite the plethora of BAI algorithms, existing methods typically fall sh...
Reasoning in Action: MCTS-Driven Knowledge Retrieval for Large Language Models
arXiv:2601.00003v1 Announce Type: new
Abstract: Large language models (LLMs) typically enhance their performance through either the retrieval of semantically similar information or the improvement of their reasoning capabilities. However, a significant challenge remains in effectively integrating b...
arXiv:2601.00021v1 Announce Type: new
Abstract: We present a physical theory of intelligence grounded in irreversible information processing in systems constrained by conservation laws. An intelligent system is modelled as a coupled agent-environment process whose evolution transforms information i...
A multi-algorithm approach for operational human resources workload balancing in a last mile urban delivery system
arXiv:2601.00023v1 Announce Type: new
Abstract: Efficient workload assignment to the workforce is critical in last-mile package delivery systems. In this context, traditional methods of assigning package deliveries to workers based on geographical proximity can be inefficient and often lead to an...
Quantitative Rule-Based Strategy Modeling in Classic Indian Rummy: A Metric Optimization Approach
arXiv:2601.00024v1 Announce Type: new
Abstract: The 13-card variant of Classic Indian Rummy is a sequential game of incomplete information that requires probabilistic reasoning and combinatorial decision-making. This paper proposes a rule-based framework for strategic play, driven by a new hand-eva...
New research shows that AI doesn’t need endless training data to start acting more like a human brain. When researchers redesigned AI systems to better resemble biological brains, some models produced brain-like activity without any training at all. This challenges today’s data-hungry approach to AI...
DeepSeek Researchers Apply a 1967 Matrix Normalization Algorithm to Fix Instability in Hyper Connections
DeepSeek researchers are trying to solve a precise issue in large language model training. Residual connections made very deep networks trainable, hyper connections widened that residual stream, and training then became unstable at scale. The new method mHC, Manifold Constrained Hyper Connections, k...
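The 1967 algorithm referenced here is plausibly the Sinkhorn-Knopp iteration, which alternately rescales the rows and columns of a positive matrix until it is doubly stochastic (all row and column sums equal 1). A minimal sketch of that iteration, as an assumption about the method rather than DeepSeek's code:

```python
import numpy as np

def sinkhorn_knopp(M: np.ndarray, iters: int = 50) -> np.ndarray:
    """Alternately normalize rows and columns of a positive matrix
    so it converges toward a doubly stochastic matrix."""
    A = M.copy()
    for _ in range(iters):
        A = A / A.sum(axis=1, keepdims=True)  # make row sums 1
        A = A / A.sum(axis=0, keepdims=True)  # make column sums 1
    return A

A = sinkhorn_knopp(np.array([[1.0, 2.0], [3.0, 4.0]]))
```

Constraining the hyper-connection mixing weights to such a normalized manifold would keep the residual stream's overall scale bounded, which matches the stability motivation described above.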
A deep dive into data transfer bottlenecks, their identification, and their resolution with the help of NVIDIA Nsight™ Systems
The post Optimizing Data Transfer in AI/ML Workloads appeared first on Towards Data Science.
DeepSeek mHC: Stabilizing Large Language Model Training
Large AI models are scaling rapidly, with bigger architectures and longer training runs becoming the norm. As models grow, however, a fundamental training stability issue has remained unresolved. DeepSeek mHC directly addresses this problem by rethinking how residual connections behave at scale. Thi...
Recursive Language Models (RLMs): From MIT’s Blueprint to Prime Intellect’s RLMEnv for Long Horizon LLM Agents
Recursive Language Models aim to break the usual trade-off between context length, accuracy, and cost in large language models. Instead of forcing a model to read a giant prompt in one pass, RLMs treat the prompt as an external environment and let the model decide how to inspect it with code, then re...
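The pattern described above, in which the model inspects a long prompt programmatically instead of reading it whole, can be caricatured as a loop where a model call only ever sees short slices of the context. The `call_llm` function, chunking strategy, and prompts below are illustrative assumptions, not the RLMEnv API:

```python
def recursive_answer(question, context, call_llm, chunk_size=2000):
    """Sketch: summarize chunks of a long context, then answer
    from the summaries instead of the raw prompt."""
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
    notes = [call_llm(f"Summarize for answering {question!r}:\n{c}") for c in chunks]
    return call_llm(f"Answer {question!r} using notes:\n" + "\n".join(notes))

# Toy stand-in for an LLM call: echoes the last line of its prompt.
fake_llm = lambda prompt: prompt.splitlines()[-1]
out = recursive_answer("q", "x" * 5000, fake_llm)
```

A real RLM would let the model choose its own inspection code (search, slicing, recursion) rather than a fixed chunk-and-summarize plan, but the cost structure is the same: each call stays short regardless of total context length.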
A Coding Implementation to Build a Self-Testing Agentic AI System Using Strands to Red-Team Tool-Using Agents and Enforce Safety at Runtime
In this tutorial, we build an advanced red-team evaluation harness using Strands Agents to stress-test a tool-using AI system against prompt-injection and tool-misuse attacks. We treat agent safety as a first-class engineering problem by orchestrating multiple agents that generate adversarial prompt...
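The runtime-safety enforcement described above, checking each tool call against a policy before executing it, can be sketched in miniature. The `ALLOWED_TOOLS` list and the guard heuristics below are illustrative assumptions, not the Strands Agents API:

```python
ALLOWED_TOOLS = {"search", "calculator"}  # hypothetical allow-list

def guarded_dispatch(tool_name: str, payload: str, tools: dict) -> str:
    """Refuse tool calls outside the allow-list or carrying an
    obvious prompt-injection marker; otherwise execute the tool."""
    if tool_name not in ALLOWED_TOOLS:
        return f"BLOCKED: tool '{tool_name}' not permitted"
    if "ignore previous instructions" in payload.lower():
        return "BLOCKED: suspected prompt injection"
    return tools[tool_name](payload)

# Toy tool registry; eval is safe here only because input is fixed.
tools = {"calculator": lambda p: str(eval(p))}
ok = guarded_dispatch("calculator", "2+3", tools)
blocked = guarded_dispatch("shell", "rm -rf /", tools)
```

A red-team harness then amounts to generating adversarial `(tool_name, payload)` pairs and asserting that every one of them comes back blocked.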
How AI is reshaping work and who gets to do it, according to Mercor’s CEO
Three-year-old startup Mercor has become a $10 billion middleman in AI’s data gold rush. The company connects AI labs like OpenAI and Anthropic with former employees of Goldman Sachs, McKinsey, and white-shoe law firms, paying them up to $200 an hour to share their industry expertise and train the A...
AI Quantum Intelligence & Pic of the Week (2026/01/02)
What do you get when you prompt Google Gemini (Nano Banana) to generate a creative image with no limitations: no scope, no style, no purpose, no requirements at all? This "AI Pic of the Week" was the answer to that question. To be determined if the same request will produce the same result in the f...
Drift Detection in Robust Machine Learning Systems
A prerequisite for long-term success of machine learning systems
The post Drift Detection in Robust Machine Learning Systems appeared first on Towards Data Science.
In 2026, here's what you can expect from the AI industry: new architectures, smaller models, world models, reliable agents, physical AI, and products designed for real-world use.
Power BI is a powerful tool for shaping raw data into informative visuals and reports. With a user-friendly interface and strong functionality, Power BI is a practical platform for refining skills through hands-on projects. By engaging in Power BI practice projects, begin...
Liquid Foundation Models (LFM 2) define a new class of small language models designed to deliver strong reasoning and instruction-following capabilities directly on edge devices. Unlike large cloud-centric LLMs, LFM 2 focuses on efficiency, low latency, and memory awareness while still maintaining c...
Applications are now open for OpenAI Grove Cohort 2, a 5-week founder program designed for individuals at any stage, from pre-idea to product. Participants receive $50K in API credits, early access to AI tools, and hands-on mentorship from the OpenAI team.
Spectral Capital Signs Agreement to Acquire Telvantis Voice Services, Inc.
Advancing Path Toward Profitable Scale and Anticipated $450 Million in 2026 Revenue
Spectral Capital Corporation (“Spectral” or the “Company”), a digital infrastructure and AI-forward platform company, today announced that it has signed a Definitive Stock Purchase Agreement to acquire Telvantis Voic...