Working with Large Language Models

Large language models are experiencing their Cambrian explosion. They may not be the path to AGI, but at least they give a taste of what it could be. The current mainstream approach of relying on scaling may not be the whole answer: training data is running out and models are plateauing on benchmarks. Then again, as the bitter lesson taught us, we may very well see emergent behaviors that surprise us. In this article, I first review the Transformer model, then summarize useful workflows with LLMs, a list I intend to keep updating.

SIMD Programming and Vector Optimizations

Essentially all modern processors can apply a single instruction to a whole vector of values instead of operating on one scalar at a time. Compiler developers work hard to exploit this hardware by automatically vectorizing scalar programs. Another approach is to use SIMD (Single Instruction, Multiple Data) intrinsics directly, which all major C/C++ compilers support: SSE (Streaming SIMD Extensions), AVX (Advanced Vector Extensions), and others on x86, and NEON on ARM.
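As a minimal illustration (not code from the post), here is element-wise float addition using SSE intrinsics from `<immintrin.h>`, which processes four floats per instruction. The function name `add_floats` is hypothetical. The sketch falls back to a plain scalar loop when SSE is not available, for example on ARM, where NEON intrinsics from `<arm_neon.h>` would play the same role.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <immintrin.h>
#define HAVE_SSE 1
#endif

/* Hypothetical helper: c[i] = a[i] + b[i] for i in [0, n). */
void add_floats(const float *a, const float *b, float *c, size_t n) {
    assert(n == 0 || (a && b && c)); /* precondition: valid pointers */
    size_t i = 0;
#ifdef HAVE_SSE
    /* Vector loop: one 128-bit register holds 4 floats. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);          /* unaligned load of a[i..i+3] */
        __m128 vb = _mm_loadu_ps(b + i);          /* unaligned load of b[i..i+3] */
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb)); /* 4 additions, 1 instruction */
    }
#endif
    /* Scalar tail for leftover elements (or everything, without SSE). */
    for (; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

The unaligned `loadu`/`storeu` variants avoid requiring 16-byte-aligned arrays; the scalar tail handles lengths that are not a multiple of four.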

Tiled Matrix Multiplication

Tiled algorithms, in general, divide a problem into smaller tiles that fit into fast but limited-size memory, be it cache, shared memory, or registers, to improve memory access patterns. Many problems, often those involving matrices, can be broken down into tiles this way. Reusing data within each small subproblem while it is still in fast memory improves cache utilization and raises the cache hit rate. In this post we implement tiled matrix multiplication.
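A minimal sketch in C, assuming square, row-major `n × n` matrices of doubles and a hypothetical tile size `TILE` of 32 (a real implementation would tune it to the target machine's cache sizes). The function name `matmul_tiled` is illustrative, not taken from the post. The three outer loops walk over tiles; the inner loops multiply one pair of tiles, with bounds clamped so `n` need not be a multiple of `TILE`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; ideally three TILE x TILE blocks fit in cache. */
#define TILE 32

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* C = A * B for n x n row-major matrices (illustrative sketch). */
void matmul_tiled(const double *A, const double *B, double *C, size_t n) {
    assert(A && B && C);
    for (size_t i = 0; i < n * n; ++i)
        C[i] = 0.0; /* accumulate into a zeroed result */

    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                /* Clamp tile bounds at the matrix edge. */
                size_t i_end = min_sz(ii + TILE, n);
                size_t k_end = min_sz(kk + TILE, n);
                size_t j_end = min_sz(jj + TILE, n);
                /* Multiply tile A[ii.., kk..] by tile B[kk.., jj..]. */
                for (size_t i = ii; i < i_end; ++i)
                    for (size_t k = kk; k < k_end; ++k) {
                        double a_ik = A[i * n + k]; /* reused across the j loop */
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a_ik * B[k * n + j];
                    }
            }
}
```

The i-k-j order inside each tile keeps the innermost loop streaming contiguously through rows of `B` and `C`, while the tiling keeps those rows' active portions small enough to be reused from cache.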

World Foundation Models

At CES 2025, NVIDIA announced its first world foundation model, Cosmos, among other interesting things, including a "single chip" made out of 72 Blackwell GPUs with 1.5 ExaFLOPS of performance. DeepMind is working on a similar model, and this is also one of xAI's core missions. I suspect that these models are among the next frontiers in scientific and AI research, following AlphaFold's breakthroughs in protein structure prediction.

Spack Package Manager

Recently I gave a talk/mini-workshop on the Spack package manager at SKAO, for which I dug into Spack's internals. Spack bills itself as the package manager for HPC, and it really is. The idea is straightforward: use Python to streamline what we normally do when compiling and installing software manually on HPC systems. It's a simple idea, but not at all easy to implement.