There was a discussion about a P2P binary cache for Nix. From what I understand, there are two steps when searching for a binary package:
- a request to a centralised server to get the hash of the package, using the .drv derivation hash
- a content-addressed request to the P2P service, either in-process or via an external binary
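To make those two steps concrete, here is roughly how step one works against today's cache.nixos.org protocol (with the caveat that today's lookup keys on the output path's hash part rather than the .drv hash): the central server maps a hash to a .narinfo file, and the NAR URL inside it is already named after the NAR's own content hash, which is exactly the kind of content address a P2P service could resolve instead of an HTTP GET. A minimal Python sketch; the hash part below is a hypothetical placeholder:

```python
import urllib.request

CACHE = "https://cache.nixos.org"

def narinfo(hash_part: str) -> dict:
    """Step 1 (centralised): fetch and parse the .narinfo for a store path's hash part."""
    with urllib.request.urlopen(f"{CACHE}/{hash_part}.narinfo") as resp:
        text = resp.read().decode()
    return dict(line.split(": ", 1) for line in text.splitlines() if ": " in line)

# Replace with a real hash part (the 32 chars after /nix/store/ in a store path).
info = narinfo("0cl8y2jbmrwxa0xmf9hkfcyy7hb4zyfk")
print(info["URL"])       # e.g. "nar/<hash-of-compressed-nar>.nar.xz": already content-addressed
print(info["FileHash"])  # the content hash a p2p fetcher could resolve for step 2
```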
In order to have a P2P binary cache, one would thus need:
- build server(s) to
  - build derivations
  - hash their outputs
- a substituter to store the mapping between derivation inputs and derivation outputs (see the sketch after this list)
- optionally, storage to serve the derivation outputs
- a nix binary that knows how to communicate with a P2P binary cache
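For the substituter piece, here is a minimal sketch of the mapping it would have to store, assuming it is essentially a key-value service from drv hash to the built output and its content address. All names, paths, and hashes are hypothetical placeholders:

```python
from dataclasses import dataclass

@dataclass
class OutputRecord:
    store_path: str   # the /nix/store path of the built output
    nar_hash: str     # content hash of the NAR: the key used on the p2p side

# The centralised piece: drv hash -> what was built from it.
substituter: dict[str, OutputRecord] = {}

def publish(drv_hash: str, store_path: str, nar_hash: str) -> None:
    """Called by a build server after building a derivation and hashing its output."""
    substituter[drv_hash] = OutputRecord(store_path, nar_hash)

def resolve(drv_hash: str) -> OutputRecord | None:
    """Called by a client: trade a drv hash for a content address to fetch over p2p."""
    return substituter.get(drv_hash)

publish("q3wx...drv-hash", "/nix/store/abc...-hello-2.12", "sha256:9f8e...")
print(resolve("q3wx...drv-hash"))
```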
A possible collaboration with Lix was brought up; with their help, this could land in Lix faster than anybody could ever get it merged into Nix.
Since it’s P2P, I think there are two options for the build servers:
- centralised build servers, aka one source of trust →
  - the first copy of a public derivation output is served by central storage
  - copies of the derivation output are then served by replicators (people willing to serve the binaries)
- centralised build orchestrator →
  - derivations are built by trusted, participating nodes
  - the .drv output hash is sent back to the orchestrator and stored in the substituter once some consensus algorithm agrees (basically relying on reproducible builds; see the sketch after this list)
  - participating nodes can also act as initial storage nodes or delegate to other storage nodes
  - clients can immediately pull from distributed storage
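To illustrate the consensus step, here is a minimal sketch of a majority vote over the output hashes reported by participating builders. The quorum rule, builder names, and hashes are illustrative assumptions, not a worked-out protocol:

```python
from collections import Counter

def agreed_hash(reports: dict[str, str], quorum: int) -> str | None:
    """reports maps builder id -> the output hash that builder observed.
    Returns the majority hash if at least `quorum` builders agree, else None."""
    (top_hash, votes), = Counter(reports.values()).most_common(1)
    return top_hash if votes >= quorum else None

reports = {
    "builder-a": "sha256:1111...",
    "builder-b": "sha256:1111...",
    "builder-c": "sha256:2222...",  # a non-reproducible or malicious build
}
print(agreed_hash(reports, quorum=2))  # -> "sha256:1111..."
```

Only once the hash clears the quorum would the orchestrator publish the drv-to-output mapping in the substituter, so a single misbehaving node cannot poison the cache.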
The centralised build servers are probably the most expensive option, but the “easiest” to implement. The centralised build orchestrator might be new territory, but could yield the highest possible reduction in cost and allow relying on 10s, 100s, or even 1000s of community build nodes + storage nodes. This is the rosy future I dream of - no more direct AWS, GCP, or Azure: JaBOCN - Just a Bunch Of Community Nodes.