12 Startups in 12 Months

20-04-2025 — Saeid

A technical report on building 12 startups in 12 months (or less). I’ll regularly update this page with changelogs and progress notes. Last Update: 22-04-2025

Working with Large Language Models

15-02-2025 — Saeid

Large language models are experiencing their Cambrian explosion. They may not be the path to AGI, but at least they give a taste of what it could be. The current mainstream approach of relying on scaling may not be the sole solution, as the data is running out and models are plateauing on benchmarks, though we may well see emergent behaviors that surprise us, as the bitter lesson taught us. In this article, I first review the Transformer model, then summarize useful workflows with LLMs, which I intend to keep updating.

SIMD Programming and Vector Optimizations

03-02-2025 — Saeid

Essentially all modern processors can apply an instruction to a whole vector in one processing unit cycle instead of operating on a single scalar. Language designers and compiler developers have been trying hard to leverage these hardware capabilities by compiling scalar programs into vector instructions. One possible approach is using SIMD (Single Instruction, Multiple Data) intrinsics, supported by all modern C/C++ compilers, through SSE (Streaming SIMD Extensions), AVX (Advanced Vector Extensions), and others for x86 architectures, and NEON for ARM.
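
As a rough illustration of what intrinsics look like in practice, here is a minimal sketch in C that adds two float arrays four elements at a time with SSE. The function name `add_f32` is hypothetical and not taken from the post; the `__SSE__` guard and scalar fallback are assumptions made so the sketch also compiles on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE__)
    /* Vector part: each iteration handles 4 floats in one 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i); /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, or the whole loop when SSE is not available. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Unaligned loads are used here to keep the sketch safe for arbitrary pointers; aligned variants may be faster when the data layout is under your control.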

Tiled Matrix Multiplication

25-01-2025 — Saeid

Tiled algorithms, in general, divide a problem into smaller, manageable tiles that fit into faster but limited-size memory, be it cache, shared memory, or registers, to improve memory access patterns. Many problems, often those involving matrices, can be broken down into tiles. Reusing data within these smaller subproblems improves cache utilization and raises the cache hit rate. In this post we implement tiled matrix multiplication.
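
To make the idea concrete, here is a minimal sketch of a tiled matrix multiplication in C for square, row-major matrices. It is not necessarily the implementation from the post: the name `matmul_tiled`, the `tile` parameter, and the loop order are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes, used to clamp tile bounds at the matrix edge. */
static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Hypothetical helper: C += A * B for row-major n x n matrices,
 * computed one tile x tile block at a time. */
void matmul_tiled(const float *A, const float *B, float *C,
                  size_t n, size_t tile) {
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile) {
        for (size_t kk = 0; kk < n; kk += tile) {
            for (size_t jj = 0; jj < n; jj += tile) {
                size_t i_end = min_size(ii + tile, n);
                size_t k_end = min_size(kk + tile, n);
                size_t j_end = min_size(jj + tile, n);
                /* Work within one block so its data stays in fast memory. */
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        float a = A[i * n + k]; /* reused across the j loop */
                        for (size_t j = jj; j < j_end; ++j) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The caller is expected to zero `C` first if a plain product is wanted. A reasonable `tile` depends on the cache sizes of the target machine, so it is left as a parameter rather than fixed.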

World Foundation Models

19-01-2025 — Saeid

At CES 2025, NVIDIA announced, among other interesting things (including a single chip made out of 72 Blackwell GPUs with 1.5 ExaFLOPS of performance), its first world foundation model, Cosmos. DeepMind is working on a similar model, and this is also one of xAI’s core missions. I suspect that these models are among the next frontiers in scientific and AI research, following AlphaFold’s breakthroughs in protein folding prediction.
