Cleaning Up Core

For context, this is the current dependency tree of nix-cli for darwin (every node is a derivation):

There is so much stuff – from imagemagick to cairo (2D graphics) – that should have absolutely nothing to do with the nix-cli. And there is even more stuff (ncurses, lsof, nixfmt, cargo, many python3 packages, many haskell packages) that very likely is not needed. There are at least 7 distinct LLVM derivations, and ~30 bash 5.2 derivations (all bash 5.2, but each slightly different somehow).
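One way to put numbers like these on firmer footing is to group the store paths in a closure by derivation name. Below is a minimal Python sketch. It assumes the input is a file with one store path per line, such as the output of `nix-store --query --requisites` run on the nix-cli .drv. `count_variants` is a hypothetical helper, not part of any existing tool.

```python
import re
import sys
from collections import Counter

# A store path is /nix/store/<32-char hash>-<name>; derivation files end in ".drv".
STORE_PATH = re.compile(r"^/nix/store/[0-9a-z]{32}-(?P<name>.+?)(?:\.drv)?$")


def count_variants(paths):
    """Count how many distinct store paths share each derivation name."""
    counts = Counter()
    for path in set(p.strip() for p in paths):
        match = STORE_PATH.match(path)
        if match:
            counts[match.group("name")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # sys.argv[1]: a file with one store path per line, e.g. the output of
    #   nix-store --query --requisites /nix/store/lkdnccvvwj3k0svsi292l8svnn3z6l8a-nix-2.18.2.drv
    with open(sys.argv[1]) as f:
        for name, n in count_variants(f).most_common():
            if n > 1:  # only show names that exist as several different derivations
                print(f"{n:4d}  {name}")
```

Names with a count above 1 are the "same package, slightly different derivation" cases. Variants whose names differ (for example, different patch levels) still have to be grouped by eye.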

I think understanding, reducing, and cleaning up core is going to be necessary before we can go back and think about how to structure repos/code for the SIGs.

I was already able to get rid of some heavyweights like numpy just by manually disabling optional documentation in some nested packages. There are 754 default.nix files but well over 1000 nodes, meaning there is a lot of recursion. I’m going to try to reduce the total number of packages, files, commented-out blocks, and recursive builds while still getting nix-cli to build and pass all tests.

I’m realizing that nixpkgs kind of defaults to maximizing dependencies because of how fixed-point recursion works. E.g. maybe stage2 LLVM would be good enough for compiling nix-cli, but instead nixpkgs goes all the way to stage4 LLVM before trying to compile nix-cli.

It’s looking more and more to me like we are going to need to rewrite core from scratch :confused:, starting at the bottom of that dependency tree image and going up.

I might try working on that. Since the “bottom” changes depending on the OS, I’ll have a folder for each OS triple and a “shared” folder. The shared folder will have one file per derivation/graph node, with no ordering. Each OS triple will somehow link the shared folder items into a total ordering.
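As a rough illustration of how the per-OS ordering could be derived rather than maintained by hand: if each shared node records only its direct dependencies, a topological sort produces a valid total order for whichever nodes one OS triple selects. Below is a minimal Python sketch using the standard-library `graphlib` (Python 3.9+). The node names and the `SHARED` table are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical contents of the "shared" folder: node name -> direct dependencies.
# Nothing here says what order the nodes come in.
SHARED = {
    "lib": set(),
    "macBootTools": {"lib"},
    "bootTools": {"lib", "macBootTools"},
    "stage0": {"lib", "bootTools"},
}


def total_order(selected, deps=SHARED):
    """Order the nodes one OS triple uses so every node comes after its dependencies."""
    chosen = set(selected)
    for name in chosen:
        missing = deps[name] - chosen
        if missing:
            raise ValueError(f"{name} needs nodes this OS did not select: {sorted(missing)}")
    # static_order() raises graphlib.CycleError if the dependencies contain a loop.
    return list(TopologicalSorter({name: deps[name] for name in chosen}).static_order())


if __name__ == "__main__":
    # Number the nodes like folder prefixes (0_lib, 1_..., ...).
    for index, name in enumerate(total_order(["lib", "macBootTools", "bootTools", "stage0"])):
        print(f"{index}_{name}")
```

A loop between nodes would surface as a `graphlib.CycleError`, so dependency cycles are caught when the ordering is generated rather than at build time.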

Not surprised by this at all - I had a feeling Nixpkgs’ stdenv wasn’t going to be suitable for us for very long. Good to get something going, but there’s evidently a lot of baggage in there that’s difficult to untangle.

I think CppNix/Lix is a good “first target” for Core. I feel like setting up Core to be able to build that, and nothing else, is a good first step. What do we all think?

FWIW, this is what I’d settled on as a next step for my binary cache / CI / Hydra experiments. It seems like the obvious next ‘bootstrap’ target beyond stdenv.

I came up with a format and got bootTools working, which is one of the fundamental derivations.

I’m trying out a format like this:

  • nodes/
    • 0_lib/
      • (submodule to lib)
      • default.nix
      • static/
        • setup.json # stuff like maintainers, licenses, and a list of OS triples. Things other tools might want to parse/use
        • meta.json # empty obj
    • 1_func_macBootTools/
      • default.nix
      • static/
        • setup.json # has fetch URLs and sha hashes, making them easy to auto-update without generating nix code
        • meta.json # says what script command generates the setup.json
    • 2_drv_bootTools_darwin-aarch-64/
      • default.nix
      • static/
        • setup.json # drv name
        • meta.json

The idea is that each node is either a func (a Nix helper function), a lib (a Nix helper that is not a function), or a drv (a derivation).
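If the directory names keep the `<index>_<kind>_<name>` shape shown above, tooling can recover a node's position and kind from the name alone. Below is a small Python sketch. It assumes that naming convention holds and that `0_lib` is the one node whose kind and name coincide. The `Node` type and `parse_node_dir` helper are hypothetical.

```python
from typing import NamedTuple

KINDS = {"lib", "func", "drv"}


class Node(NamedTuple):
    index: int  # position in the total ordering
    kind: str   # "lib", "func" or "drv"
    name: str


def parse_node_dir(dirname):
    """Split e.g. '2_drv_bootTools_darwin-aarch-64' into (2, 'drv', 'bootTools_darwin-aarch-64')."""
    parts = dirname.split("_", 2)  # the name itself may contain underscores
    if len(parts) == 2 and parts[1] == "lib":
        # '0_lib': the lib node's kind and name are the same word
        return Node(int(parts[0]), "lib", "lib")
    if len(parts) != 3 or parts[1] not in KINDS:
        raise ValueError(f"unrecognised node directory name: {dirname!r}")
    return Node(int(parts[0]), parts[1], parts[2])
```

Because the index is parsed as an integer, sorting parsed nodes keeps `10_...` after `2_...`, which a plain alphabetical directory listing would not.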

Importantly, to be performant and to ensure there are no dependency loops, the default.nix for a drv is not typical. It’s not a function that takes lib and pkgs; it’s not a function at all. Instead, it’s just an expression that directly imports all of its dependencies at the top. For example:

let # for static dependency analysis this "let" intentionally only contains direct imports
  lib = import ../0_lib;
  macBootTools = import ../1_func_macBootTools;
in
  # real code starts here
  let
    static = lib.loadStatic ./.;
  in
    derivation {
        name = static.setup.name;
        # ...
    }

Right now, using nix-visualize and nixpkgs, we have to build all the derivations before we can visualize the dependency tree, and the tree is system/OS specific. This new format would allow us to statically analyze the dependency tree, including dependence on Nix code, not just dependence on derivations. We could even generate a dependencies.json for each node to make it really easy for other tools to parse.
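To make the static-analysis idea concrete: because the outer `let` only contains direct imports, the dependencies can be read without evaluating any Nix. Below is a minimal Python sketch. It assumes every drv's default.nix starts with a bare `let` whose bindings all look like `name = import ../path;`, and that only `#` comments appear in that block. The output location `static/dependencies.json` and the helper names are hypothetical, not an existing tool.

```python
import json
import re
from pathlib import Path

# A binding such as:  lib = import ../0_lib;
IMPORT_LINE = re.compile(r"^(?P<attr>[A-Za-z_][\w'-]*)\s*=\s*import\s+(?P<path>\.\./[^\s;]+)\s*;$")


def direct_dependencies(nix_source):
    """Return {attribute: relative path} for the imports in the file's leading let-block."""
    deps = {}
    inside = False
    for line in nix_source.splitlines():
        code = line.split("#", 1)[0].strip()  # drop '#' comments and indentation
        if not code:
            continue
        if not inside:
            if code != "let":
                raise ValueError("expected the file to start with a bare 'let'")
            inside = True
        elif code == "in":
            return deps
        else:
            match = IMPORT_LINE.match(code)
            if match is None:
                raise ValueError(f"non-import line in the dependency block: {line!r}")
            deps[match.group("attr")] = match.group("path")
    raise ValueError("the dependency block is never closed by 'in'")


def write_dependencies_json(node_dir):
    """Write <node_dir>/static/dependencies.json (hypothetical location) and return the deps."""
    node_dir = Path(node_dir)
    deps = direct_dependencies((node_dir / "default.nix").read_text())
    out = node_dir / "static" / "dependencies.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(deps, indent=2, sort_keys=True) + "\n")
    return deps
```

Rejecting anything that isn't a plain import is deliberate. It keeps the "only direct imports" rule enforceable, so a CI check could fail as soon as someone adds real logic to that block.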

The format might run into problems later, but it’s what I’m trying for now.

However, I am running into a pretty hard roadblock. After bootTools, the next derivation in the tree is, roughly, the stage0 derivation. The problem is that stage0 is dynamically generated. There isn’t one clear derivation call; instead, it’s part of a recursive stageFun.

Because printing and serializing stuff is difficult, it’s hard for me to pull out the arguments that are passed to the stage0 derivation.

If anyone has ideas, like somehow replacing the derivation call with a function that writes all of its arguments to a JSON file and logs that file’s name before calling builtins.derivation, that would be a huge help for detangling these staging steps.

Edit (note to future me): it might be good to try nix build with --debugger, then intentionally edit the code to throw if the derivation name matches the one shown by nix-visualize. The REPL should then allow inspecting all of the arguments that were given to the derivation, and builtins.unsafeGetAttrPos might be able to reveal where some of them came from.

The following patch prints the args in json format with the trace function:

From c0285dc74f63d541aad386523c2603a8e2ef6710 Mon Sep 17 00:00:00 2001
From: Florian Warzecha <liketechnik@disroot.org>
Date: Wed, 22 May 2024 16:57:43 +0200
Subject: [PATCH] trace stdenv args as json

Signed-off-by: Florian Warzecha <liketechnik@disroot.org>
---
 pkgs/stdenv/generic/default.nix | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/pkgs/stdenv/generic/default.nix b/pkgs/stdenv/generic/default.nix
index 2cda43d5632f..dce783b85ee2 100644
--- a/pkgs/stdenv/generic/default.nix
+++ b/pkgs/stdenv/generic/default.nix
@@ -77,9 +77,11 @@ let

   stdenv = (stdenv-overridable argsStdenv);

+  printArgs = { ... }@args: lib.warn "args: ${builtins.toJSON args}" (derivation args);
+
   # The stdenv that we are producing.
   in
-    derivation (
+    printArgs (
     lib.optionalAttrs (allowedRequisites != null) {
       allowedRequisites = allowedRequisites
         ++ defaultNativeBuildInputs ++ defaultBuildInputs;
--
2.44.0

yielding something like this (I’ve modified the actual output to pretty-print the JSON with jq .):

trace: warning: args: {
  "args": [
    "-e",
    "/nix/store/ckzrg0f0bdyx8rf703nc61r3hz5yys9q-builder.sh"
  ],
  "builder": "/nix/store/4w85zw8hd3j2y89fm1j40wgh4kpjgxy7-bootstrap-tools/bin/bash",
  "defaultBuildInputs": [],
  "defaultNativeBuildInputs": [
    "/nix/store/h9lc1dpi14z7is86ffhl3ld569138595-audit-tmpdir.sh",
    "/nix/store/m54bmrhj6fqz8nds5zcj97w9s9bckc9v-compress-man-pages.sh",
    "/nix/store/wgrbkkaldkrlrni33ccvm3b6vbxzb656-make-symlinks-relative.sh",
    "/nix/store/5yzw0vhkyszf2d179m0qfkgxmp5wjjx4-move-docs.sh",
    "/nix/store/fyaryjvghbkpfnsyw97hb3lyb37s1pd6-move-lib64.sh",
    "/nix/store/kd4xwxjpjxi71jkm6ka0np72if9rm3y0-move-sbin.sh",
    "/nix/store/pag6l61paj1dc9sv15l7bm5c17xn5kyk-move-systemd-user-units.sh",
    "/nix/store/jivxp510zxakaaic7qkrb7v1dd2rdbw9-multiple-outputs.sh",
    "/nix/store/ilaf1w22bxi6jsi45alhmvvdgy4ly3zs-patch-shebangs.sh",
    "/nix/store/cickvswrvann041nqxb0rxilc46svw1n-prune-libtool-files.sh",
    "/nix/store/xyff06pkhki3qy1ls77w10s0v79c9il0-reproducible-builds.sh",
    "/nix/store/ngg1cv31c8c7bcm2n8ww4g06nq7s4zhm-set-source-date-epoch-to-latest.sh",
    "/nix/store/wmknncrif06fqxa16hpdldhixk95nds0-strip.sh"
  ],
  "disallowedRequisites": [],
  "initialPath": [
    "/nix/store/4w85zw8hd3j2y89fm1j40wgh4kpjgxy7-bootstrap-tools"
  ],
  "name": "bootstrap-stage0-stdenv-linux",
  "preHook": "# Dont patch #!/interpreter because it leads to retained\n# dependencies on the bootstrapTools in the final stdenv.\ndontPatchShebangs=1\nexport NIX_ENFORCE_PURITY=\"${NIX_ENFORCE_PURITY-1}\"\nexport NIX_ENFORCE_NO_NATIVE=\"${NIX_ENFORCE_NO_NATIVE-1}\"\n\n",
  "setup": "/nix/store/x1h4ccwki2c9jqlnjw507bz66hxbyaq8-setup.sh",
  "shell": "/nix/store/4w85zw8hd3j2y89fm1j40wgh4kpjgxy7-bootstrap-tools/bin/bash",
  "system": "x86_64-linux"
}

(I’ve omitted the rest of the output, but it does print this for all stages, up to the final stdenv)
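If the goal is one JSON file per stage (as asked earlier in the thread), the raw trace output can be split up with a short script. Below is a minimal Python sketch. It assumes the log was captured without the `jq .` pretty-printing, so each `builtins.toJSON` result sits on a single line, and it allows for the trace text being wrapped in ANSI color codes. The function names and the `<name>.json` file naming are hypothetical.

```python
import json
import re
from pathlib import Path

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")  # terminal color codes around trace text
MARKER = "warning: args: "


def extract_stage_args(log_text):
    """Yield each argument set printed by the patched stdenv, in log order."""
    decoder = json.JSONDecoder()
    for line in log_text.splitlines():
        clean = ANSI_ESCAPE.sub("", line)
        start = clean.find(MARKER)
        if start != -1:
            # raw_decode stops at the end of the JSON object, ignoring anything after it
            args, _end = decoder.raw_decode(clean, start + len(MARKER))
            yield args


def write_stage_files(log_text, out_dir):
    """Write each stage's arguments to <out_dir>/<name>.json; return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for args in extract_stage_args(log_text):
        path = out_dir / f"{args['name']}.json"
        path.write_text(json.dumps(args, indent=2, sort_keys=True) + "\n")
        written.append(path)
    return written
```

Stages that share a `name` would overwrite each other's file, so a fuller version might add an index to the filename.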

Can you throw this in a branch so I can take a look at what’s happening here? I think I’m following the structure, but the code would help me wrap my head around it a little more.

Yep! I’ll push my WIP branch once I’m back at my PC

Excellent work, Jeff. I think this is probably an excellent example of technical debt, and I’m guessing it happened because this seemed like the right way to do it, at the time. The question is, is it possible to improve on it, at the moment? Depending on resources, it can’t hurt to look at the options.

Thanks,

Chris.

Can you share the command (and settings, if you are using nix-visualize) you used to generate the graph?

First I ran these:

nix -vvvvv build '.#nix' &> log_nix.txt
nix -vvvvv build '.#auxPackages.aarch64-darwin.stdenv' &> log_stdenv.txt

Then, after looking through those logs, I ran these:

mkdir -p result-visuals

# note some of these took a full 1.5min to run
nix-visualize '/nix/store/lkdnccvvwj3k0svsi292l8svnn3z6l8a-nix-2.18.2.drv'                            -o result-visuals/nix_cli.png        -c config.cfg  -s dbs
nix-visualize '/nix/store/5hwbb88if5krsd4frmv8mgbg5svgdadp-stdenv-darwin.drv'                         -o result-visuals/stdenv.png         -c config.cfg  -s dbs
nix-visualize '/nix/store/xl02cmm64r76z5yb343g0hgv48bqnc51-llvm-binutils-16.0.6.drv'                  -o result-visuals/llvm_binutils.png  -c config.cfg  -s dbs
nix-visualize '/nix/store/gg6grcxf525wf7x8k420rabd11l27mij-bootstrap-stage0-llvm.drv'                 -o result-visuals/stage0_llvm.png    -c config.cfg  -s nix
nix-visualize '/nix/store/2asw60w0g8ylpl5h9i282lcb0hdxymha-bootstrap-stage1-clang-wrapper-boot.drv'   -o result-visuals/stage1_clang.png   -c config.cfg  -s dbs
nix-visualize '/nix/store/gf7ci12ps92d4fylxr1f2mlz9d1rbfvm-bootstrap-stage2-CF-stdenv-darwin.drv'     -o result-visuals/stage2_stdenv.png  -c config.cfg  -s dbs
nix-visualize '/nix/store/0bd3xy62p7wikkz2f93ivx8sikplcxq3-bootstrap-stage4-clang-wrapper-16.0.6.drv' -o result-visuals/stage4_clang.png   -c config.cfg  -s dbs
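Picking the right .drv paths out of the -vvvvv logs by hand is tedious. Below is a minimal Python sketch that scans a log for store paths ending in `.drv` and filters them by a name fragment. It assumes nothing about the log's line format beyond the paths appearing verbatim. `find_drvs` is a hypothetical helper.

```python
import re
import sys
from pathlib import Path

# A store path ending in .drv, e.g. /nix/store/<hash>-nix-2.18.2.drv
DRV_PATH = re.compile(r"/nix/store/[0-9a-z]{32}-[^\s'\"`]+?\.drv\b")


def find_drvs(log_text, name_fragment):
    """Return distinct .drv paths whose name contains name_fragment, in first-seen order."""
    found = []
    for path in DRV_PATH.findall(log_text):
        name = path.split("-", 1)[1]  # strip the "/nix/store/<hash>-" prefix
        if name_fragment in name and path not in found:
            found.append(path)
    return found


if __name__ == "__main__" and len(sys.argv) == 3:
    # python find_drvs.py log_stdenv.txt bootstrap-stage0
    log_file, fragment = sys.argv[1], sys.argv[2]
    for path in find_drvs(Path(log_file).read_text(errors="replace"), fragment):
        print(path)
```

Each printed path can then be passed straight to nix-visualize as in the commands above.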

This is the config I was using. There’s definitely a better config, since the nodes in the middle are unreadable:

config.cfg

#------------------------------------------------------------------------------
# Example configurations for nix-visualize
#------------------------------------------------------------------------------
# n.b. All parameters are defined in the README

[nix]
# The settings used to generate the Nix dependency tree in the README
aspect_ratio: 1
font_scale: 0.6
font_color: #000000
img_y_height_inches: 6
dpi: 300
color_map: autumn
min_node_size: 75
max_node_size_over_min_node_size: 5.0
add_size_per_out_link: 50

[dbs]
# The settings used to generate the SQLAlchemy and knex dependency tree
# in the README
aspect_ratio: 2
font_scale: 0.2
img_y_height_inches: 20
dpi: 300
color_map: Accent
color_scatter: 0.0
top_level_spacing: 180
min_node_size: 30
max_node_size_over_min_node_size: 3.0
add_size_per_out_link: 20
max_displacement: 25.5
num_iterations: 2500
tmax: 800.0
y_sublevels: 6
y_sublevel_spacing: 0.12
repulsive_force_normalization: 8.0

[git]
# The settings used to generate the Git dependency tree in the README
color_map: summer_r
aspect_ratio: 3
min_node_size: 1000
max_node_size_over_min_node_size: 3.0
add_size_per_out_link: 50
tmax: 600
edge_width_scale: 3.0
font_scale: 2.0
y_sublevel_spacing: 0
num_iterations: 20000
attractive_force_normalization: 3.0
repulsive_force_normalization: 3.0

Alright, I put it on the contributors/jeff branch.

A few things to note:

  • no formatting yet (for me personally, formatting gets in the way of my understanding, so I usually just format right before a PR)
  • static/setup is like a config of all constant inputs to a derivation
  • static/meta is a summary/stats AFTER a derivation has been built
    (ex: what’s in $out, how long it took to build, when it was last tested, etc.)
  • To test the branch, I’ve been using two things: the nodes/1_lib/scripts/test script, and a plain nix repl with:
    > bootTools = import ./nodes/3_drv_bootTools_darwin-aarch-64
    > bootTools.PATH
    > lib = import ./nodes/1_lib
    > lib.foldr
    > lib.teams  # etc
    
  • I started detangling lib, since it’s got a lot of unnecessary recursion and is a great small example of what we have to do for nixpkgs overall.
  • We’ve got a backwards-compatibility maintenance problem that we should probably open a new issue about
    • For example, I added back lib.maintainers from nix, because without them, existing stuff like toJSON python.meta is going to break
    • There’s a lot of deprecated stuff in lib
    • There are a lot of dumb designs, like lib.platforms being very different from lib.systems.platforms
    • I think we are going to need a legacyLib for backwards compatibility that contains stuff like lib.maintainers, plus a stdlib that is a clean slate for aux moving forward.
  • I switched the static stuff from JSON to TOML to allow for comments. I prefer JSON, but we can auto-generate JSON from the TOML as a CI step, so I figured we should start with TOML.

It decided to compile LLVM :smiling_face_with_tear:
Hopefully I’ll get my cores back before tomorrow

Here are the results I got for stdenv-linux and nix:


Dang… That’s so many fewer dependencies. That’s great.

Whoa, that’s a huge improvement! Well done!

Edit: Reading comprehension fail on my part :laughing:

To clarify, the tree I posted was for macOS, and the one Sigma posted was for Linux. Idk if Sigma reduced the dependencies or if that was the default.

It’s the default; I didn’t touch anything :sweat_smile:

As an update, I’m still working on this! It’s slow going. I’ll probably only work on it once every few months, but I just wanted to be clear that this thread isn’t dead, even if there’s lots of silence.
