Getting Started with XGBoost: A Beginner-Friendly Tutorial
Among a data scientist's tools, few have earned a reputation for effectiveness and reliability like XGBoost. It has featured in winning solutions to machine learning competitions on sites such as Kaggle, which you have probably visited. This ...
How to Integrate Universal Commerce Protocol (UCP) with AI Agents?
Agentic browsing is quickly becoming mainstream. People don’t just want AI agents to research products anymore. They want agents to actually buy things for them: compare options, place orders, handle payments, and complete the entire transaction. That’s where things start to break. Today’s commerc...
We Tried 5 Missing Data Imputation Methods: The Simplest Method Won (Sort Of)
We tested five imputation methods with proper cross-validation and statistical testing. Mean imputation won for prediction but destroyed feature relationships.
This tutorial will guide you through the complete process of self-hosting n8n on Docker in just 5 simple steps, with detailed explanations and code samples, regardless of your technical background.
What is Model Collapse? Examples, Causes and Fixes
AI systems feel smarter than ever. They answer quickly, confidently, and with polish. But beneath that surface, something subtle is going wrong. Outputs are getting safer. Ideas are getting narrower. Surprise is disappearing, and with it the awe. This matters because AI is increasingly involved in how we ...
Embeddings — vector-based numerical representations of typically unstructured data like text — have been primarily popularized in the field of natural language processing (NLP).
How to Leverage Slash Commands to Code Effectively
Learn how I utilize slash commands to be a more efficient engineer
The post How to Leverage Slash Commands to Code Effectively appeared first on Towards Data Science.
AI is taking over the world. If you don’t agree with this, you need to have a look at the latest technologies presented at one of the biggest annual technology events: CES 2026. The Consumer Electronics Show, which takes place in Las Vegas, US, every year, brings forward the best of technologies bein...
Data Science Spotlight: Selected Problems from Advent of Code 2025
Hands-on walkthroughs of problems and solution approaches that power real‑world data science use cases
Mastering Non-Linear Data: A Guide to Scikit-Learn’s SplineTransformer
Forget stiff lines and wild polynomials. Discover why Splines are the "Goldilocks" of feature engineering, offering the perfect balance of flexibility and discipline for non-linear data using Scikit-Learn’s SplineTransformer.
TDS Newsletter: December Must-Reads on GraphRAG, Data Contracts, and More
Don't miss our most popular articles of the previous month
Most RAG demos stop at “upload a PDF and ask a question.” That proves the pipeline works. It doesn’t prove you understand it. These projects are designed to break in interesting ways. They surface bias, contradictions, forgotten context, and overconfident answers. That’s where real RAG learning star...
How to Improve the Performance of Visual Anomaly Detection Models
Apply the best methods from academia to get the most out of practical applications
HNSW at Scale: Why Your RAG System Gets Worse as the Vector Database Grows
How approximate vector search silently degrades recall, and what to do about it
Win 2026! 9 AI Prompts to Enter Beast Mode This New Year
The beginning of a new year brings about a new sense of energy in most. One may argue that it is all psychological, as nothing changes other than the date. Agreed, to a point. Though it is psychological, the change is not just based on the onset of a “new year.” A deep-rooted logical reasoning […]
Why Supply Chain is the Best Domain for Data Scientists in 2026 (And How to Learn It)
My take after 10 years in Supply Chain on why this can be an excellent playground for data scientists who want to see their skills valued.
I get the AI scare, and if I am being honest here, you should take it seriously too. The AI age is unfolding fast, and we are seeing automation enter just about every sector. Once it does, there is absolutely no argument that the entire dynamics of human roles will change. So, for most of […]
Dummy Variable Trap in Machine Learning Explained Simply
In machine learning with categorical data, it is common to represent the categories as dummy variables (sometimes called one-hot encoding) so that they become numerical values. This is a significant step, since many algorithms operate only on numbers, like lin...
In this article, we look back at what I would consider the ten most consequential, broadly impactful AI storylines of 2025, and gain insight into where the field is going in 2026.
Part 2: Avoiding burnout, learning strategies and the superpower of solitude
The post The Best Data Scientists Are Always Learning appeared first on Towards Data Science.