Spack Package Manager
Recently I gave a talk/mini workshop on the Spack package manager
at SKAO (available here), for which I dug into the core of Spack.
Spack bills itself as the package manager for HPC, and it lives up to the claim. The idea is straightforward:
use Python to streamline what we normally do when compiling and installing software manually on HPC systems.
It's a simple idea but not at all easy to implement. Compiling software is not an easy task,
especially scientific software designed to run at scale on HPC systems.
There are often many subtleties, and how a package is compiled can significantly affect its performance.
At the core of Spack there is the concretizer, which builds the DAG (Directed Acyclic Graph) for the dependencies.
The way it achieves this is interesting: package recipes are written in Python, and they subclass a specific build system class, for example CMake.
Package dependencies and their constraints are declared with depends_on
directives,
which can be made conditional with a when
argument. For example,
a package may depend on the Intel Math Kernel Library (MKL) as its BLAS backend only when the user
specifically asks for it with a +mkl
specifier on the command line; this is written as:
depends_on("intel-onepi-mkl@2023.0.0:", when="+mkl")
Spack uses Clingo to do the concretization. Clingo, developed at the University of Potsdam, is the most widely used ASP (Answer Set Programming) solver; it builds on SAT (Boolean Satisfiability) solving techniques. ASP is a declarative programming paradigm used for search and optimization problems. It's particularly good at knowledge representation, constraint satisfaction, and problems with multiple solutions (answer sets). Clingo first uses gringo to ground the ASP program, replacing variables with concrete terms to obtain a propositional program; this is then translated into SAT clauses, and SAT techniques search for an assignment of Boolean variables that makes the propositional formula true.

Once the concretizer has determined the concrete specifications, it builds the DAG (Directed Acyclic Graph), where nodes represent packages and edges represent dependencies. Each node carries exactly the information needed to build it, and the structure of the graph ensures the proper build order (a toy sketch of this idea follows the solver output below).

The stages where the solver spends most of its time, and the optimization priority criteria, can be viewed:
# assume a fresh environment, don't reuse
$ spack solve --fresh --timers fftw
setup 3.160s
load 0.214s
ground 1.252s
solve 1.173s
construct_specs 0.524s
total 6.488s
==> Best of 4 considered solutions.
==> Optimization Criteria:
Priority Criterion Installed ToBuild
1 requirement weight - 0
2 number of packages to build (vs. reuse) - 0
3 number of nodes from the same package - 0
4 deprecated versions used 0 0
5 version badness (roots) 0 0
6 number of non-default variants (roots) 0 0
7 preferred providers for roots 0 0
8 default values of variants not being used (roots) 0 0
9 number of non-default variants (non-roots) 0 0
10 preferred providers (non-roots) 0 0
11 compiler mismatches that are not required 0 0
12 compiler mismatches that are required 0 0
13 non-preferred OS's 0 0
14 version badness (non roots) 0 1
15 default values of variants not being used (non-roots) 0 1
16 non-preferred compilers 0 0
17 target mismatches 0 0
18 non-preferred targets 0 0
19 compiler mismatches (runtimes) 0 0
20 version badness (runtimes) 0 0
21 non-preferred targets (runtimes) 0 0
22 edge wiring 1 0
- fftw@3.3.10%apple-clang@16.0.0+mpi~openmp~pfft_patches+shared build_system=autotools patches=872cff9 precision=double,float arch=darwin-sequoia-m1
- ...
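As promised above, here is a toy illustration of how a dependency DAG fixes the build order. This is not Spack's internals, and the edges are made up; it just shows that a topological sort of such a graph yields dependencies before their dependents:

# Toy illustration, not Spack code: map each package to its
# dependencies and let a topological sort produce a valid build order.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

deps = {
    "fftw": {"mpich", "gmake"},  # hypothetical edges
    "mpich": {"hwloc"},
    "hwloc": set(),
    "gmake": set(),
}

# Dependencies are yielded before anything that depends on them.
print(list(TopologicalSorter(deps).static_order()))
# e.g. ['hwloc', 'gmake', 'mpich', 'fftw']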
Interestingly, the generated ASP program can also be viewed with the --show=asp
flag (it's long, some 30,000 or so facts!).
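What clingo does with those facts can be demonstrated at toy scale with its Python API (pip install clingo). The tiny program below is purely illustrative and has nothing to do with Spack's actual encoding; it just walks through the ground-then-solve pipeline:

# Minimal clingo round trip: ground a tiny ASP program, then solve it.
import clingo

ctl = clingo.Control()
# Choose exactly one version, then forbid the deprecated one.
ctl.add("base", [], """
1 { version(v1); version(v2) } 1.
:- version(v1).  % v1 is deprecated
""")
ctl.ground([("base", [])])  # grounding step (gringo)
ctl.solve(on_model=lambda m: print("Answer set:", m))  # prints: version(v2)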
I think Spack solves a hard problem, one that has existed since the dawn of computers, in an interesting brute-force way.
P.S. Some examples of Spack packages are available in the JupyterLab instance.