Google AI Releases TranslateGemma: A New Family of Open Translation Models Built on Gemma 3 with Support for 55 Languages
Google AI has released TranslateGemma, a suite of open machine translation models built on Gemma 3 and targeted at 55 languages. The family comes in 4B, 12B and 27B parameter sizes. It is designed to run across devices from mobile and edge hardware to laptops and a single H100 GPU or TPU instance in...
Social Determinants of Health Prediction for ICD-9 Code with Reasoning Models
arXiv:2601.09709v1 Announce Type: new
Abstract: Social Determinants of Health correlate with patient outcomes but are rarely captured in structured data. Recent attention has been given to automatically extracting these markers from clinical text to supplement diagnostic systems with knowledge of p...
The Geometry of Thought: Disclosing the Transformer as a Tropical Polynomial Circuit
arXiv:2601.09775v1 Announce Type: new
Abstract: We prove that the Transformer self-attention mechanism in the high-confidence regime ($\beta \to \infty$, where $\beta$ is an inverse temperature) operates in the tropical semiring (max-plus algebra). In particular, we show that taking the tropical li...
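The tropical (max-plus) limit described in the abstract can be seen numerically: as the inverse temperature $\beta$ grows, softmax attention weights collapse to a hard argmax, and $(1/\beta)\,\mathrm{logsumexp}(\beta x)$ converges to $\max_i x_i$. A minimal sketch of that limit (not the paper's code; the scores are made-up numbers):

```python
import numpy as np

def attention_weights(scores, beta):
    # Softmax over attention scores at inverse temperature beta
    z = beta * scores
    z = z - z.max()          # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

scores = np.array([1.0, 3.0, 2.5])

for beta in (1.0, 10.0, 100.0):
    w = attention_weights(scores, beta)
    # (1/beta) * logsumexp(beta * x) -> max(x) as beta -> infinity
    lse = np.log(np.exp(beta * scores - (beta * scores).max()).sum()) / beta + scores.max()
    print(f"beta={beta:>5}: weights={w.round(4)}  (1/beta)*LSE={lse:.4f}")
```

At large $\beta$ the weight vector is effectively one-hot on the largest score, which is exactly the max-plus (tropical) selection the paper studies.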
TimeSAE: Sparse Decoding for Faithful Explanations of Black-Box Time Series Models
arXiv:2601.09776v1 Announce Type: new
Abstract: As black box models and pretrained models gain traction in time series applications, understanding and explaining their predictions becomes increasingly vital, especially in high-stakes domains where interpretability and trust are essential. However, ...
arXiv:2601.09809v1 Announce Type: new
Abstract: Organizations and enterprises across domains such as healthcare, finance, and scientific research are increasingly required to extract collective intelligence from distributed, siloed datasets while adhering to strict privacy, regulatory, and sovereig...
arXiv:2601.09825v1 Announce Type: new
Abstract: We establish a lower bound on the eluder dimension of generalised linear model classes, showing that standard eluder dimension-based analysis cannot lead to first-order regret bounds. To address this, we introduce a localisation method for the eluder ...
AI Survival Stories: a Taxonomic Analysis of AI Existential Risk
arXiv:2601.09765v1 Announce Type: new
Abstract: Since the release of ChatGPT, there has been a lot of debate about whether AI systems pose an existential risk to humanity. This paper develops a general framework for thinking about the existential risk of AI systems. We analyze a two premise argumen...
GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents
arXiv:2601.09770v1 Announce Type: new
Abstract: Recent advances in vision-language models (VLMs) and reinforcement learning (RL) have driven progress in GUI automation. However, most existing methods rely on static, one-shot visual inputs and passive perception, lacking the ability to adaptively de...
PCN-Rec: Agentic Proof-Carrying Negotiation for Reliable Governance-Constrained Recommendation
arXiv:2601.09771v1 Announce Type: new
Abstract: Modern LLM-based recommenders can generate compelling ranked lists, but they struggle to reliably satisfy governance constraints such as minimum long-tail exposure or diversity requirements. We present PCN-Rec, a proof-carrying negotiation pipeline th...
Antisocial behavior towards large language model users: experimental evidence
arXiv:2601.09772v1 Announce Type: new
Abstract: The rapid spread of large language models (LLMs) has raised concerns about the social reactions they provoke. Prior research documents negative attitudes toward AI users, but it remains unclear whether such disapproval translates into costly action. W...
Improving Chain-of-Thought for Logical Reasoning via Attention-Aware Intervention
arXiv:2601.09805v1 Announce Type: new
Abstract: Modern logical reasoning with LLMs primarily relies on employing complex interactive frameworks that decompose the reasoning process into subtasks solved through carefully designed prompts or requiring external resources (e.g., symbolic solvers) to ex...
ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models
Recurrent Neural Networks (RNNs) laid the foundation for sequence modeling, but their intrinsic sequential nature restricts parallel computation, creating a fundamental barrier to scaling. This has led to the dominance of parallelizable architectures like Transformers and, more recently, State Space...
The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining
Large-scale models are pretrained on massive web-crawled datasets containing documents of mixed quality, making data filtering essential. A popular method is Classifier-based Quality Filtering (CQF), which trains a binary classifier to distinguish between pretraining data and a small, high-quality s...
NVIDIA AI Open-Sourced KVzap: A SOTA KV Cache Pruning Method that Delivers near-Lossless 2x-4x Compression
As context lengths move into tens and hundreds of thousands of tokens, the key-value cache in transformer decoders becomes a primary deployment bottleneck. The cache stores keys and values for every layer and head with shape (2, L, H, T, D). For a vanilla transformer such as Llama1-65B, the cache re...
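The quoted (2, L, H, T, D) shape makes the memory footprint easy to estimate. A back-of-the-envelope sketch, where the model configuration and sequence length are illustrative assumptions rather than figures from the article:

```python
def kv_cache_bytes(layers, kv_heads, seq_len, head_dim, dtype_bytes=2):
    # Shape (2, L, H, T, D): keys and values for every layer, head, and position
    return 2 * layers * kv_heads * seq_len * head_dim * dtype_bytes

# Assumed config roughly matching a 65B-class dense transformer:
# 80 layers, 64 KV heads, head_dim 128, fp16 (2 bytes), 128k-token context.
gb = kv_cache_bytes(layers=80, kv_heads=64, seq_len=128_000,
                    head_dim=128, dtype_bytes=2) / 1e9
print(f"{gb:.1f} GB per sequence")  # hundreds of GB before any pruning
```

Numbers at this scale are why 2x-4x cache compression translates directly into longer contexts or larger batch sizes on the same hardware.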
Google Antigravity: AI-First Development with This New IDE
Google Antigravity marks the beginning of the "agent-first" era. It isn't just a Copilot; it's a platform where you stop being the typist and start being the architect.
US senators demand answers from X, Meta, Alphabet on sexualized deepfakes
In a letter to the leaders of X, Meta, Alphabet, Snap, Reddit and TikTok, several U.S. senators are demanding the companies provide proof that they have "robust protections and policies" in place, and how they plan to curb the rise of sexualized deepfakes on their platforms.
OptiMind: A small language model with optimization expertise
OptiMind is a small language model that converts business operation challenges, described in natural language, into mathematical formulations that optimization software can solve. It reduces formulation time and errors, and enables fast, privacy-preserving local use.
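To make "mathematical formulations that optimization software can solve" concrete, here is a hypothetical example of the kind of translation such a model targets: a plain-language resource question turned into a small linear program. The problem data, variable names, and the use of SciPy are all illustrative assumptions, not OptiMind's actual output or interface:

```python
from scipy.optimize import linprog

# Plain-language problem (invented): "Maximize profit of 3 per unit of A and
# 5 per unit of B, given A + 2B <= 14 machine-hours, at least 3 units of A
# per unit of B shortfall (3A >= B), and A can exceed B by at most 2."
#
# As a linear program (linprog minimizes, so negate the objective):
res = linprog(
    c=[-3, -5],                          # maximize 3A + 5B
    A_ub=[[1, 2],                        #  A + 2B <= 14
          [-3, 1],                       # -3A + B <= 0   (i.e. 3A >= B)
          [1, -1]],                      #  A -  B <= 2
    b_ub=[14, 0, 2],
    bounds=[(0, None), (0, None)],       # A, B >= 0
)
print("optimal plan:", res.x, "profit:", -res.fun)
```

The point of a model like OptiMind is to produce the `c`, `A_ub`, `b_ub` structure above directly from the prose description, so a standard solver can do the rest.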
The 2026 Goal Tracker: How I Built a Data-Driven Vision Board Using Python, Streamlit, and Neon
Designing a centralized system to track daily habits and long-term goals
The post The 2026 Goal Tracker: How I Built a Data-Driven Vision Board Using Python, Streamlit, and Neon appeared first on Towards Data Science.
This list focuses on tools that streamline real workflows across data, operations, and content, not flashy demos or brittle bots. Each one earns its place by reducing manual effort while keeping humans in the loop where it actually matters.
After Italy, WhatsApp excludes Brazil from rival chatbot ban
WhatsApp is allowing AI providers to continue offering their chatbots to users in Brazil, days after the country's competition agency ordered the company to suspend its new policy that bars third-party, general-purpose chatbots from the app.
DeepSeek Engram: The Future of Memory-Augmented Language Models
If you are up to date with recent developments in AI and LLMs, you have probably noticed that much of the progress still comes from building larger models or better computation routing. But what if there were one more route? Along came Engram, a revolutionary method from DeepSee...
Data science hiring in 2026 looks nothing like it did three years ago. Where data science roles once mostly required analysing spreadsheets at length, by 2026 companies are looking for builders of machine learning systems, GenAI pipelines, and production-grade models that have a re...