Large language models are experiencing their Cambrian explosion. They may not be the path to AGI, but at least they give a taste of what it could be. The current mainstream approach of relying on scaling may not be the sole solution, as training data runs out and models plateau on benchmarks, though we may very well see emergent behaviors that surprise us, as the bitter lesson taught us. In this article, I first review the Transformer model, then summarize useful workflows with LLMs, which I intend to keep updating.
Essentially all modern processors can apply a single instruction to a whole vector of values at once instead of operating on one scalar at a time. Language designers and compiler developers have been working hard to leverage these hardware capabilities by compiling scalar programs into vector instructions. One way to use them directly is through SIMD (Single Instruction, Multiple Data) intrinsics, supported by all modern C/C++ compilers: SSE (Streaming SIMD Extensions), AVX (Advanced Vector Extensions), and others for x86 architectures, and NEON for ARM.
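To make the idea concrete, here is a minimal sketch of element-wise array addition with SSE intrinsics. It is an illustration under assumptions, not code from the post: `add_arrays` and `HAVE_SSE` are hypothetical names, and the scalar fallback is there only so the example still compiles on non-x86 targets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Element-wise sum of two float arrays (hypothetical helper for illustration).
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> out(n);
    std::size_t i = 0;
#if HAVE_SSE
    // Each iteration adds four floats with one 128-bit vector instruction.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a.data() + i);  // unaligned load
        const __m128 vb = _mm_loadu_ps(b.data() + i);
        _mm_storeu_ps(out.data() + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop handles the leftover tail, or everything without SSE.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

The unaligned load and store variants are used so the sketch makes no assumption about how the vectors are allocated; the same loop structure would apply to wider AVX registers, eight floats at a time.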
Tiled algorithms, in general, divide a problem into smaller, manageable tiles that fit into faster but limited-size memory, be it cache, shared memory, or registers, to improve memory access patterns. Many problems, often those involving matrices, can be broken down into tiles. Reusing data within these smaller subproblems improves cache utilization and raises the cache hit rate. In this post we implement tiled matrix multiplication.
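As a rough illustration of the technique (a sketch, not necessarily the implementation the post develops), here is a CPU version for square row-major matrices. The name `tiled_matmul` and the default tile size of 32 are assumptions; a good tile size depends on the cache sizes of the target machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B for square n x n row-major matrices, computed tile by tile.
// Hypothetical helper; `tile` is the edge length of each square tile.
std::vector<double> tiled_matmul(const std::vector<double>& A,
                                 const std::vector<double>& B,
                                 std::size_t n, std::size_t tile = 32) {
    assert(A.size() == n * n && B.size() == n * n && tile > 0);
    std::vector<double> C(n * n, 0.0);
    // Outer loops walk over tiles; std::min clips tiles at the matrix edge.
    for (std::size_t ii = 0; ii < n; ii += tile) {
        const std::size_t i_end = std::min(ii + tile, n);
        for (std::size_t kk = 0; kk < n; kk += tile) {
            const std::size_t k_end = std::min(kk + tile, n);
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t j_end = std::min(jj + tile, n);
                // Inner loops touch only one tile each of A, B, and C,
                // so the data is reused while it is still in cache.
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return C;
}
```

The innermost loop runs over `j` so that both `B` and `C` are accessed along rows, which keeps accesses contiguous in row-major storage.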
At CES 2025, NVIDIA announced, among other interesting things (including a single chip made out of 72 Blackwell GPUs with 1.5 ExaFLOPS of performance), its first world foundation model, Cosmos. DeepMind is working on a similar model, and this is also one of xAI’s core missions. I suspect that these models are among the next frontier in scientific and AI research, following AlphaFold’s breakthroughs in protein folding prediction.
Recently I gave a talk/mini workshop on the Spack package manager at SKAO, for which I dug into the core of Spack. Spack bills itself as the package manager for HPC, and it really is. The idea is straightforward: use Python to streamline what we normally do when compiling and installing software manually on HPC systems. It’s a simple idea but not at all easy to implement.