
Blog

Jun 10
Implementing Vector Search from Scratch: A Step-by-Step Tutorial

There’s no doubt that search is one of the most fundamental problems in computing.
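
As a taste of what the tutorial builds toward, the heart of vector search is a brute-force nearest-neighbor lookup, which fits in a few lines of NumPy. This sketch (function and variable names are my own, not the tutorial's) ranks a corpus by cosine similarity:

```python
import numpy as np

def cosine_search(query, vectors, k=3):
    """Return the indices of the k vectors most similar to the query."""
    # Normalize rows so a dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    # Highest scores first
    return np.argsort(scores)[::-1][:k]

# Toy corpus of five random 8-dimensional embeddings
rng = np.random.default_rng(0)
corpus = rng.normal(size=(5, 8))
print(cosine_search(corpus[2], corpus, k=2))  # index 2 ranks itself first
```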

Jun 09
How to Optimize Language Model Size for Deployment

The rise of language models, and more specifically large language models (LLMs), has been so sweeping that they now permeate every aspect of modern AI applications, from chatbots and search engines to enterprise automation and coding assistants.

Jun 06
Dealing with Missing Data Strategically: Advanced Imputation Techniques in Pandas and Scikit-learn

Missing values appear more often than not in real-world datasets.
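
A flavor of what "strategic" imputation looks like: scikit-learn's KNNImputer fills each gap from the most similar complete rows, rather than using a blind column mean. A minimal sketch (the toy columns are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical dataset with missing entries
df = pd.DataFrame({
    "age":    [25.0, np.nan, 47.0, 35.0],
    "income": [40_000.0, 52_000.0, np.nan, 61_000.0],
})

# Each NaN is replaced by the mean of its 2 nearest neighbors,
# measured on the features that are present
imputer = KNNImputer(n_neighbors=2)
df_filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(df_filled)
```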

Jun 05
Loss Functions Explained: Understand the Maths in Just 2 Minutes Each

I must say, with the ongoing hype around machine learning, a lot of people jump straight to the application side without really understanding how things work behind the scenes.
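
To show how little maths is actually involved, here are two of the most common losses written out directly in NumPy (a sketch for intuition, not code from the article):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average squared distance to the target
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Penalizes confident wrong probabilities; eps guards against log(0)
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.7])
print(mse(y, p))                   # ~0.047
print(binary_cross_entropy(y, p))  # ~0.228
```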

Jun 05
10 MLOps Tools for Machine Learning Practitioners to Know

Machine learning is not just about building models.

Jun 03
10 Python One-Liners That Will Simplify Feature Engineering

Feature engineering is a key process in most data analysis workflows, especially when constructing machine learning models.
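
In the same spirit, here are two representative one-liners in pandas (illustrative, not necessarily among the article's ten):

```python
import pandas as pd

df = pd.DataFrame({"city": ["NY", "LA", "NY"], "price": [100, 250, 80]})

# One-hot encode a categorical column in a single expression
df = pd.get_dummies(df, columns=["city"])

# Bin a numeric column into labeled quantile tiers
df["price_tier"] = pd.qcut(df["price"], q=3, labels=["low", "mid", "high"])
print(df)
```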

Jun 02
Word Embeddings in Language Models

This post is divided into five parts; they are:

• Understanding Word Embeddings
• Using Pretrained Word Embeddings
• Training Word2Vec with Gensim
• Training Word2Vec with PyTorch
• Embeddings in Transformer Models

Word embeddings represent words as dense vectors in a continuous space, where semantically similar words are positioned close to each other.
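
As a taste of the Gensim part, training a small Word2Vec model takes only a few lines (the toy corpus below is invented for illustration):

```python
from gensim.models import Word2Vec

# Tiny toy corpus: each sentence is a list of tokens
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# Train 50-dimensional skip-gram embeddings (sg=1)
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, seed=42)

print(model.wv["cat"].shape)         # (50,)
print(model.wv.most_similar("cat"))  # nearest neighbors in the vector space
```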

May 30
A Gentle Introduction to SHAP for Tree-Based Models

Machine learning models have become increasingly sophisticated, but this complexity often comes at the cost of interpretability.
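
In practice, getting SHAP values out of a tree ensemble is short; a minimal sketch with scikit-learn's California housing data (assuming the shap package is installed):

```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# TreeExplainer exploits tree structure to compute SHAP values efficiently
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])

# One row per sample, one column per feature; together with the expected
# value, each row sums to the model's prediction for that sample
print(shap_values.shape)  # (100, 8)
```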

May 29
Using Quantized Models with Ollama for Application Development

Quantization is a common strategy for making production machine learning models, particularly large and complex ones, more lightweight: it reduces the numerical precision of the model’s parameters (weights), usually from 32-bit floating point to lower-precision representations such as 8-bit integers.
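
The core idea is easy to demonstrate. Below is a minimal sketch of symmetric int8 quantization in NumPy, for intuition only; the schemes used by Ollama-served models (e.g. GGUF quantization formats) are more sophisticated:

```python
import numpy as np

def quantize_int8(weights):
    # Map float32 weights to int8 with one scale for the whole tensor
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
print(q.nbytes / w.nbytes)                     # 0.25: four times smaller
print(np.abs(w - dequantize(q, scale)).max())  # small rounding error
```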

May 28
Tokenizers in Language Models

This post is divided into five parts; they are:

• Naive Tokenization
• Stemming and Lemmatization
• Byte-Pair Encoding (BPE)
• WordPiece
• SentencePiece and Unigram

The simplest form of tokenization splits text into tokens based on whitespace.
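
That naive approach is a one-liner in Python; a sketch contrasting it with a slightly less naive variant (for intuition, ahead of the subword methods listed above):

```python
import re

text = "Tokenizers split text; subword methods handle rare words."

# Naive tokenization: split on whitespace only
print(text.split())
# ['Tokenizers', 'split', 'text;', ...]  note punctuation sticks to words

# Slightly better: treat runs of letters/digits as tokens
print(re.findall(r"\w+", text.lower()))
# ['tokenizers', 'split', 'text', 'subword', ...]
```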