SIG Repos: How should they work?

Jeff · May 8, 2024, 10:47pm

First, I’m so glad we’re all on board with reducing recursive dependencies.

I think experimentation will be needed.

And towards that, I have automated nix code refactoring before (I created a nix bundler similar to a JS bundler). Let me say, there’s some really easy stuff we can do from the beginning, even at the formatter level, that will allow us to make massive refactors overnight if we realize we need a completely different structure. Stuff like: always accessing packages through attributes (ex: auxpkg.name instead of with auxpkg; name, which is also horribly slow at runtime), being consistent with how top-level attributes are set rather than sometimes dynamically generating top-level attr names (which again is also slow), having a folder name match the top-level attribute name, etc.
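As a rough illustration of what a formatter-level check could look like, here is a minimal sketch that flags `with <name>;` scopes in nix source. The function name `find_with_scopes` and the regex are assumptions made for this example, not an existing tool. It is a line-based heuristic (it does not parse nix and can miss cases, for example when a `#` appears inside a string), so treat it as a starting point only.

```python
import re

# Hypothetical heuristic: matches `with some.attr.path;` on a single line.
WITH_EXPR = re.compile(r"\bwith\s+([A-Za-z_][A-Za-z0-9_.'-]*)\s*;")


def find_with_scopes(nix_source):
    """Return (line_number, scope_name) for every `with <name>;` found."""
    hits = []
    for lineno, line in enumerate(nix_source.splitlines(), start=1):
        # Crude comment stripping: ignore everything after the first '#'.
        code = line.split("#", 1)[0]
        for match in WITH_EXPR.finditer(code):
            hits.append((lineno, match.group(1)))
    return hits


example = "with auxpkg; [ name ]\nauxpkg.name\n"
print(find_with_scopes(example))
```

Only the first line of the example is reported, since the second line already uses explicit attribute access.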

I can confirm this is not possible. Python’s R module has the entire R language (and R modules) as a dependency. Other languages have python as a dependency.

A Design Draft v1

I have spent 3 years trying to automate refactoring of nixpkgs. So basically I’ve been training for my whole (nix) life for this (not that I have everything figured out).

Here is a structure that I think could get put-to-work this week and later.

Note: in the list below, dependencies go up (each repo only depends on repos listed above it).
E.g. lib depends on nothing, core depends on lib, etc.

Repos:

lib - a pure nix lib, no system/stdenv (this already exists as a flake so we can use that)
core - a reorganized attrset of minimum-packages needed to build the nix cli (which I know how to create thanks to nixpkgs-skeleton)
sig_sources - manually edited/maintained repo. Who is this for? ecosystem/SIG maintainers
registry - automated. Who/what is this for? think of this like IR (intermediate representation). It exists to untangle the spaghetti of package dependencies/recursion, which helps cross-ecosystem maintainers, core maintainers, improves runtime performance, and makes package-indexing/security-notifications/testing and other stuff automatable
ecosystems - manually edited/designed. Who is it for? end users
aux - the aux namespace. Not backwards compatible with nixpkgs. Curated (starts off pretty empty). aux = { auxpkgs = registry; ecosystems = ecosystems; lib = lib; }
polypkgs - polyfilled pkgs, aka nixpkgs with an overlay of auxpkgs (i.e. mostly backwards compatible, temporary)

(I don’t care about the names, like polypkgs could definitely have a better name, that’s not the point)

Design of each

Lib
- Should probably be split into stdlib (aka just lib, always forward-compatible) and internalSupport (aka stuff the aux-monorepo(s) might need but the rest of the world might not need, and also might break forward compatibility)
Core
- to create core, copy all nixpkgs files into a new repo, disable the cache, nix build -vvvvv --substituters '' '.#nix', do it on Linux x86, Linux Armv7.1, Mac Apple Silicon, and Mac x86
- if a file path is logged on at least one of those^ builds, then keep it. Delete all other files
- using the log output and some brute-force trial-and-error, we should be able to detect which of the top-level attributes were evaluated
  - note which attributes existed on all systems, vs which were system-specific
- top-level.nix is going to contain a ton of junk still, with attributes to packages that don’t even have files anymore.
  - While we should eventually clean it, the 1-week fix is to not clean it
  - Make a minimal-legacy-core.nix that imports the top-level attr set
  - Make the attrset of minimal-legacy-core inherit (from top-level) only the attributes that we know are used
- create core.nix which imports minimal-legacy-core, but re-structures it:
  - if an attribute is not a derivation, then put it directly on core
  - if an attribute is a derivation, and builds regardless of OS put it under core.pkgs
  - if an attribute is a derivation, but only exists on a certain OS, put it under core.os.${osName}
- v1 done
  - Later we can work on cleaning this repo.
  - Updates can be semi-automated by looking at the nixpkgs git history and checking for changes to the relevant subset of files
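The two mechanical steps above (keeping any file that shows up in at least one build log, then sorting attributes into core, core.pkgs, and core.os.${osName}) can be sketched roughly as follows. This is a sketch under stated assumptions: the function names, the input shapes, and the example attribute names (`exampleHelper`, `exampleTool`, `exampleDarwinOnly`) are all hypothetical, and the real inputs would have to come from parsing the `nix build -vvvvv` logs.

```python
def files_to_keep(build_logs):
    """Union of the file paths logged by any of the per-system builds.

    build_logs maps a system label to the list of paths seen in its log.
    """
    keep = set()
    for paths in build_logs.values():
        keep |= set(paths)
    return keep


def restructure_core(attrs, all_os):
    """Sort attributes into the three buckets described for core.nix.

    attrs maps attribute name -> (is_derivation, set of OS names where it exists).
    """
    core = {"direct": [], "pkgs": [], "os": {}}
    for name in sorted(attrs):
        is_derivation, os_names = attrs[name]
        if not is_derivation:
            core["direct"].append(name)       # goes directly on core
        elif set(os_names) == set(all_os):
            core["pkgs"].append(name)         # goes under core.pkgs
        else:
            for os_name in sorted(os_names):  # goes under core.os.<osName>
                core["os"].setdefault(os_name, []).append(name)
    return core


# Hypothetical attribute names, for illustration only.
example_attrs = {
    "exampleHelper": (False, {"linux", "darwin"}),
    "exampleTool": (True, {"linux", "darwin"}),
    "exampleDarwinOnly": (True, {"darwin"}),
}
print(restructure_core(example_attrs, {"linux", "darwin"}))
```

Keeping the classification as data (rather than hand-editing core.nix) would also make it easier to regenerate the layout when nixpkgs changes.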
Registry (which will become auxpkgs)
- This is the key to what I call, THE GREAT UNTANGLING.
  - Which I think is the most important change to nixpkgs, and is the cause of inter-ecosystems trouble we are hitting right now.
- A flat repository
- The “this week” solution is to have registry start as an empty attrset, but I’ll continue to describe how it fits into the bigger picture.
- Two kinds of packages
  - base packages (normal nix func that produces 1 derivation as output, ex: python)
  - extension packages (these “modify” an existing derivation. For example, have numpy take python as an argument and return both a new python (that now has numpy), and also just a standalone numpy derivation (for stuff like numpy header files or venv))
  - Both extension-packages and base-packages are stored in the same flat attrset
- Note: we are forced to have both base and extensions in registry because some base packages (like VS Code) need base+extension packages (like nodePackages) to build themselves. So it’s not possible to fully separate all base packages from all extension packages.
- The great untangling/ordering
  - To fix the recursion issues we need the attrs in the top-level of registry to be in a particular order. This can be done, and scaled up without issue if we automate the generation of the top-level.nix file.
  - For an example of the attr order, if a package depends on merely core and/or lib then it is considered to have 0 dependencies. It goes at the top. However, something like npm would need to appear BELOW pythonMinimal because npm depends on pythonMinimal. You might be thinking “But Jeff, some packags–” I know, we’ll get there. Every built package has a dependency tree (strictly speaking, a directed acyclic graph (DAG) rather than a tree). Conceptually, the order of attrs in the registry is the breadth-first search (BFS) iteration on the combined dependency tree of all packages. Conceptually. The main reason this post is so frickin long is because nixpkgs pretends the dependency tree has loops, even though, in reality, if packages are to ever be built in finite time, the dependency tree cannot have loops.
  - In practice we can achieve a total ordering of packages, with the following logic:
    - If [pkg] only uses core/lib, put in alphabetical order at the top
    - If all of pkg’s dependencies are already in the registry list; easy, just put the package as high as possible, while still being below all of its dependencies
    - Those two rules alone handle a massive amount of packages, but not everything. Let me introduce the “problem children”
    - 1. If a package has dynamic/optional dependencies we first try to assume that it uses all of them, even if that is somehow impossible (ex: for a package using gcc on linux and clang on mac, we pretend it uses both gcc and clang at the same time). If, with that assumption, all of the pkg’s dependencies are on the list, then we’re good. If not, then using tree search and some assumptions we can detect the issue and fall back on the next approach.
    - 2. We will need to semi-automatedly/manually break up some packages. There are kinda three cases for this: definitelyMinimal+maybeFull, branching groups, and multi-base-case recursive dynamic dependencies.
      - definitelyMinimal+maybeFull: For dynamic non-recursive dependencies, such as pytorch maybe needing cuda, we can often break them up into a “minimal” package and a “full” package. The reason I say definitelyMinimal is that the minimal case cannot have any optional arguments. It needs to be the bare-bones and nothing else. On the flip side, some packages like ffmpeg and opencv have tons of options and some options are incompatible. We can’t actually make a ffmpegFull. So instead we have an ffmpegMaybeFull where every option is available, and we ensure ffmpegMaybeFull is below all dependencies for all options. This minimal+full technique also works for trivial recursion. Every trivially recursive package has one base case (by definition). That base case gets put in its own derivation as minimal, then the recursive case becomes the full version.
      - Branching groups: Not all dynamic dependencies work under the minimal+full method. For example, evaluating a package on MacOS might cause it to have a different tree-order – an order that is incompatible with the same package evaluated on Linux. Theoretically this can happen even without OS differences. Solving this is actually pretty straightforward: the package is broken up into branches (different groups of dependencies) such as package_linux and package_macos. Each of those will have their own spot in the ordered list. Then one-below the lowest one (aka the one with the most dependencies), we create a combined package. The combined package depends on all the earlier ones, and contains the “if … then package_linux else if … then package_macos” logic.
      - Dynamic recursive dependencies: Unfortunately I can confirm there are packages that are deeply, painfully, multi-base-case recursive with dynamic dependencies.
        
        Let’s start with the easiest example. Let’s say registry.autoconf depends on perl. Well registry.perl (ex: perl 6.x) might depend on perl & autoconf. And now we’ve got a multi-recursive problem; autoconf needs perl and perl needs autoconf (and perl!). It’s the dependency tree with loops.
        
        Except in reality we start with core.perl, then build autoconf::(built with core.perl), then build registry.perl::(built with core.perl and autoconf::(built with core.perl)), and then build autoconf::(built with registry.perl::(built with core.perl and autoconf::(built with core.perl))). It quickly becomes a lot to mentally process … and that’s the simple case!
        
        Nixpkgs does stuff exactly like that behind the scenes, at runtime. Thing is, we don’t have to do it at runtime. We can be way more clear about what is going on by adding stages.
        
        registry.autoconf_stage1, statically depends on core.perl.
        
        registry.perl_stage1, statically depends on registry.autoconf_stage1
        
        registry.autoconf_stage2 statically depends on registry.perl_stage1
        
        All other registry packages use registry.autoconf_stage2 instead of just “some version of autoconf”.
        
        While still complicated, making these stages explicit is, I think, the only way to make this stuff even barely manageable. Just imagine the difference between “Error: autoconf_stage2 failed to build” compared to “Error: autoconf (one of multiple generated at runtime) failed to build”.
        
        While this does require skilled manual labor, there’s not too many packages like this.
        
        Well … except for one category. Cross compilation.
        
        While I think we should have cross compilation in mind from the beginning, I don’t think we should immediately (or any time soon) jump into trying to handle cross compiled packages.
        
        The normal (not-cross-compiled) version of a package is going to have fewer dependencies, and be higher up on the dependency tree. We should focus on those first since they’re the foundation.
        
        That said, I want to recognize what will eventually need to be done for the true deepest most nasty hairball of spaghetti-code in all of nixpkgs: cross compiling of major tools like VS Code, using QEMU virtualization. Not only is it an explosion of dependencies, it’s possible to depend on the same version of the same package twice, once for the host architecture and again for the target architecture. If we can eventually tackle that, I don’t think it gets any worse.
        
        I know it might feel unclean (give me a chance to talk about SIG sources), but in order to detangle cross compilation, some registry packages will need to have system postfix names like gcc_linux_x86_64, just FYI.
- Last note on the registry, we can use a _ prefix to indicate when a package attr is “just a helper” rather than a derivation that we want to be user-facing. For example _autoconf_stage1, _autoconf_stage2, and then we would have autoconf (e.g. stage2 renamed and ready for public use)
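The first two ordering rules above (core/lib-only packages at the top in alphabetical order, everything else as high as possible while staying below its dependencies) could be sketched like this. It is a minimal sketch, not the actual registry generator: `order_registry` and its input shape are assumptions, dependencies on core/lib are simply omitted from the input, and optional dependencies are assumed to be listed already (the "pretend it uses all of them" rule). If a loop remains, the sketch only reports it, since breaking it up is the manual minimal/full, branching, or staging work described above.

```python
def order_registry(deps):
    """Return registry attr names so that every package follows its dependencies.

    deps maps package name -> iterable of registry dependency names.
    Packages with an empty list depend on core/lib only.
    """
    remaining = {name: set(d) for name, d in deps.items()}
    unknown = {d for ds in remaining.values() for d in ds} - set(remaining)
    if unknown:
        raise ValueError(f"dependencies missing from the registry: {sorted(unknown)}")

    ordered, placed = [], set()
    while remaining:
        # Everything whose dependencies are all placed goes in next, alphabetically.
        ready = sorted(n for n, ds in remaining.items() if ds <= placed)
        if not ready:
            raise ValueError(f"loop detected, needs manual splitting: {sorted(remaining)}")
        ordered.extend(ready)
        placed.update(ready)
        for name in ready:
            del remaining[name]
    return ordered


# Staged names from the autoconf/perl example, plus npm and pythonMinimal.
example = {
    "autoconf_stage1": [],                # statically depends on core.perl only
    "perl_stage1": ["autoconf_stage1"],
    "autoconf_stage2": ["perl_stage1"],
    "pythonMinimal": [],
    "npm": ["pythonMinimal"],
}
print(order_registry(example))
# ['autoconf_stage1', 'pythonMinimal', 'npm', 'perl_stage1', 'autoconf_stage2']
```

Note that the unstaged version, where autoconf depends on perl and perl depends on autoconf, would hit the "loop detected" error. Making the stages explicit is what lets the ordering exist at all.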
SIG sources
- While the registry can make detangling the recursion possible, it doesn’t necessarily make things perfectly easy to maintain. At a practical level, we can’t just have one package file for each registry package, because stuff like python (python2, pythonMinimal, CPython, Jython, Cython, pythonFull, etc) is going to have a bunch of overlap in terms of nix-code, even if the variants belong at different levels of the dependency tree.
- SIG sources can let us have our untangled cake and eat (maintain) it too, but there is a big risk!
- Each SIG could have a directory inside of the sig_sources repo. For example, let’s say there’s a maintenance group for python. Every sig directory would be designed in a way that a script in the registry-repo could scan the SIG folder, see exported packages, see a static list of dependencies for each exported package, then compute the correct order for all of them, and have each attr import code from the sig directory.
- The danger is that we accidentally recreate the same nixpkgs mess. For example, a giant python/default.nix file that handles every variant of python, packed to the brim with stuff like if isPythonFull then ... if isCython ... if isJython. In that case, we are right back to a recursive mess, because cython needs pythonMinimal, and both pythonMinimal and cython are generated by the same monolithic python/default.nix. We have only added indirection. The registry makes de-tangling possible, it doesn’t guarantee it.
- How can we solve this without subjective “code-feel” guidelines? Two rules.
  - 1. Evaluation at different points of the tree (e.g. pythonMinimal vs pythonFull) doesn’t always matter. For example, the aux lib functions wouldn’t care at all since they don’t use derivations. So when does it matter? Well, let’s say we had a helper like escapePythonString. If that helper is implemented without the registry, then it’s like lib, it doesn’t really care “where” in the dependency tree it’s evaluated. However, if that same tool, escapePythonString, for some reason, needed registry.sed, then it becomes a risk of being tree-order dependent. Let’s say we have another helper, buildWheel, which depends on pythonMinimal but is used inside of pythonFull. While not too common, when helpers depend on registry packages, we can break them up into groups. For example, utils_pre_python.nix could contain escapePythonString, and indicate at the top of the file that there is a dependency on registry.sed. Because buildWheel has different registry dependencies, we would need to make a different utils file, like utils_post_python_minimal.nix, to house the buildWheel function. While this handles the tree-ordering issues, it doesn’t necessarily fully stop spaghetti code.
  - 2. This one is hard to explain, but once it “clicks” it’s easy to have an intuition for. Going back to escapePythonString, let’s say it, and all of the helpers, are pure-nix. We use escapePythonString across python2, python3Minimal, python38Full, etc. Everything is great. Then one day someone invents Wython (fictional) and the string-escaping of Wython is just a bit different than python. So we face a choice. Either
    - A. We create an independent escapeWythonString
    - or B. we make escapePythonString a bit more complicated by adding a { isWython ? false, ... } parameter
  - You might think “whatever, those options are merely personal preference” but that’s not entirely accurate. There is a slight runtime performance difference in terms of tree-shaking, and we can detect the difference objectively via code coverage. Additionally there’s an argument to be made that option B creates a spaghetti control flow. Quick disclaimer, I’m not a 100% coverage kinda guy – I don’t care if a project has 50% coverage – code-coverage is just a tool.
    - Let’s talk about tree-shaking, and look at option B. If we run python2, python3 or any individual build, the code coverage of escapePythonString will be more than 0 but not 100%. All of them miss the if isWython branch inside of escapePythonString. That means the engine is always wasting at least a bit of time on code that will never be evaluated while building python3.
    - In contrast, under option A, building any individual package causes each helper function to either be 100% or 0% (e.g.100%=escapePythonString, 0%=escapeWythonString)
    - I’m not saying it needs to always be 100% or 0%, but rather:
      - If a single build calls both escapePythonString { option1 = true; }, and escapePythonString { option1 = false; } then there’s no issue, escapePythonString doesn’t need to be broken up (regardless of how other builds use it).
      - For example escapePythonString { singleQuote = true; }, and escapePythonString { singleQuote = false; }
      - But, if Wython only uses escapePythonString { option1 = true; } and all other builds ONLY use escapePythonString { option1 = false; } then there is a problem.
      - For example escapePythonString { isWython = true; }, and escapePythonString { isWython = false; }
  - For the “this week” implementation, these rules can just be eyeball-enforced.
    - It’ll be good enough to prevent the monolithic recursive dependency spaghetti problem.
    - With a tiny bit of practice it’s not that hard to follow the rules manually
    - If there is a debate it won’t become personal-preference war because there is an objective way of determining the answer
    - If a small case is missed, it’s not a big deal to find/fix it later
    - Later this can be automated by recording the code coverage of each registry-package in a SIG source. For all nix functions that were evaluated during the build, if the function was defined in a file within the SIG folder, and no individual build got 100% coverage of the function, then it’s flagged. If there is a different combination of arguments that causes a build to get 100% then it passes the flag, otherwise it needs to be broken up.
- There’s other technical details of SIG sources to discuss, like having inter-SIG dependencies go through the registry instead of being direct imports, and having all SIG sources provide one file per registry-entry, and each registry dependency be a function argument rather than an import, but I’m trying to not turn this post into my dissertation (despite how it might look)
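The automated version of the second rule (flag a SIG helper when no individual build reaches 100% coverage of it) could be sketched as below. This is a sketch under assumptions: it presumes some tool already records, per build, which branches of each helper were evaluated. `flag_helpers`, the branch labels, and the `quoteString` helper are hypothetical names chosen for the example.

```python
def flag_helpers(all_branches, builds):
    """Return helpers that were used, but never fully covered by a single build.

    all_branches: helper name -> set of every branch id in that helper.
    builds: build name -> {helper name -> set of branch ids that build evaluated}.
    """
    flagged = []
    for helper in sorted(all_branches):
        # Coverage of this helper in each build that actually evaluated it.
        hits = [cov[helper] for cov in builds.values() if cov.get(helper)]
        fully_covered = any(hit >= all_branches[helper] for hit in hits)
        if hits and not fully_covered:
            flagged.append(helper)
    return flagged


# Option B from above: escapePythonString grew an isWython branch.
all_branches = {
    "escapePythonString": {"isWython=true", "isWython=false"},
    "quoteString": {"singleQuote=true", "singleQuote=false"},  # hypothetical helper
}
builds = {
    "python3": {
        "escapePythonString": {"isWython=false"},
        "quoteString": {"singleQuote=true", "singleQuote=false"},
    },
    "wython": {
        "escapePythonString": {"isWython=true"},
        "quoteString": {"singleQuote=true"},
    },
}
print(flag_helpers(all_branches, builds))
# ['escapePythonString']
```

`quoteString` passes because the python3 build alone exercises both of its branches, while `escapePythonString` is flagged because each build only ever takes one side of the isWython split. A flag is only a prompt to check whether some other argument combination gets a build to 100%, as described above.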
Ecosystems
- Goal: be as ergonomic as possible for users
- Ecosystems shouldn’t depend on other ecosystems directly: either import derivations from the registry, or import nix-functions from a sig source
- SIG sources != ecosystem
  - For example, JavaScript might be a SIG group (someone who knows JS has relevant skills for maintaining both bun and nodejs), but in contrast nodejs might be an ecosystem, and bun might be a different ecosystem.
  - SIG sources might need to have messy tooling for bootstrapping like pythonMinimal_stage1. The ecosystem interface should hide all that and just present the final product.
  - If it helps generate packages in the registry, or if a registry needs a tool → then it goes in a SIG source
  - Else → Ecosystem
  - Home manager probably would live in the ecosystem space
- Enable stuff like the dev-shell mixin experience (ex: ecosystems.aux.tools.mkDev [ ecosystems.nodejs.envTooling.typescript ecosystems.python.envTooling.poetry ecosystems.latex.envTooling.basics ])
- While registry needs to be rigorously consistent in order to be automated, ecosystems only need to be consistent to help with ergonomics.
  - Like a common interface of
    - ecosystems.${name}.tools for nix functions
    - ecosystems.${name}.variant for minimal/full builds (ex: mruby, jruby, or jython or pythonMinimal)
    - ecosystems.${name}.main for the base tool (e.g. rustc/ruby/python/node)
    - ecosystems.${name}.pkgs. They can deviate on a per-ecosystem basis as needed.
    - ecosystems.${name}."v${version}".main
    - ecosystems.${name}."v${version}".pkgs
    - ecosystems.${name}.envTooling
    - etc
  - But they are allowed to be different when it makes sense, like ecosystems.${name}.tools.mkNodePackage, or ecosystems.${name}.tools.pythonWith
Aux
- Having one layer before getting into packages is important for future expansion, for example aux.pureAuxPkgs or aux.distributedPkgs, etc
Polypkgs
- Its own repo so that tarball-urls are easy drop-in replacements for nixpkgs tarball urls
- If nixpkgs gets a commit, we generate a new flake.lock
- We have git tags equivalent to nixpkgs git tags
- Temporary
- Big special note: I know this goes against what I said at the top (“dependencies go up”), but out of practicality, and because this repo is temporary, sig_sources can use/refer to polypkgs.
  - Yes, this is a recursion issue (sig_sources uses polypkgs, which gets overlayed by auxpkgs, which links back to sig_sources) but it is necessary. For example, let’s say python is NOT in the aux registry yet. Let’s also say nixpkgs.openssl is broken from a gcc update.
    - Cowsay can’t use nixpkgs.python (built with nixpkgs.openssl) because nixpkgs.openssl is broken
    - But cowsay can use polypkgs.python (built with the polypkgs.openssl which works because polypkgs is overlayed with registry.openssl)
    - E.g. cowsay doesn’t directly depend on registry.openssl.
    - The registry ordering script pretends cowsay has no dependencies (polypkgs is “invisible”)
    - BUT, as soon as we have a registry.python (which would end up as polypkgs.python), we need to “collapse” the difference: mark cowsay as depending on registry.python (instead of depending on nothing), in which case the registry generator will put cowsay below python instead of having it at the top level.
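The "invisible polypkgs" rule and the later collapse can be sketched as a small filter that runs before the registry ordering script. This is an illustrative sketch only: `registry_dependencies` and the dotted "namespace.name" strings are assumptions made for this example, not an existing interface.

```python
def registry_dependencies(declared, registry_names):
    """Return the dependencies the registry ordering script should see.

    declared: dependency strings such as "registry.openssl" or "polypkgs.python".
    registry_names: names that currently exist in the registry.
    """
    visible = set()
    for dep in declared:
        namespace, name = dep.split(".", 1)
        if namespace == "registry":
            visible.add(name)
        elif namespace == "polypkgs" and name in registry_names:
            # Collapse: polypkgs.<name> is now really registry.<name>.
            visible.add(name)
        # Anything else (core, lib, polypkgs-only packages) counts as no dependency.
    return sorted(visible)


cowsay_deps = ["polypkgs.python"]
print(registry_dependencies(cowsay_deps, {"openssl"}))            # python not in registry yet
print(registry_dependencies(cowsay_deps, {"openssl", "python"}))  # after registry.python lands
```

Before registry.python exists, cowsay appears to have no dependencies and sits at the top level. Once it exists, the same declaration resolves to a dependency on python, so the generator places cowsay below it.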
