Following on from my earlier post, I wanted to jot down a few notes and ideas about binary caching before I forget!
Roadmap
Long-term, some sort of p2p-based system seems ideal to me, but before then I think a smaller, simpler system for use by a more trusted group of devs and tinkerers might be helpful and more achievable.
(Ultimately there is a trust/security issue with a larger-scale p2p system: one needs to be sure that a downloaded binary is a correct build of the requested package. That would mean working on fully reproducible builds, and then perhaps some sort of database where multiple builders could report their hash of the build, with the client choosing to place trust once a configured quorum had been reached.)
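To make the quorum idea concrete, here's a tiny sketch of what the client-side check could look like. None of this corresponds to any existing service - the report structure, names, and threshold are all made up for illustration:

```python
from collections import Counter

# Hypothetical data: each independent builder reports the hash it got
# when building the same derivation. Nothing here maps to a real
# service - it's just the quorum idea written down.
REQUIRED_QUORUM = 3

def trusted_hash(reports: dict[str, str], quorum: int = REQUIRED_QUORUM) -> str | None:
    """Return a build hash once at least `quorum` independent builders agree.

    `reports` maps a builder's identity to the hash it observed.
    """
    if not reports:
        return None
    nar_hash, votes = Counter(reports.values()).most_common(1)[0]
    return nar_hash if votes >= quorum else None

# Example: three of four builders agree, so that hash would be trusted.
reports = {
    "builder-a": "sha256:1111...",
    "builder-b": "sha256:1111...",
    "builder-c": "sha256:2222...",  # a non-reproducible (or malicious) build
    "builder-d": "sha256:1111...",
}
assert trusted_hash(reports) == "sha256:1111..."
```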
Frontend - attic?
For the front-end, so far I've looked at and experimented a bit with attic - it seems to be aimed at smaller groups, but that might be OK for a while. It can also filter out objects that are present in a designated upstream cache, which might allow us to piggy-back on the NixOS cache and only store packages where they have changed in Aux.
My mini server uses an SMR drive with access cached through an SSD with bcache, so while performance will not be great it shouldn't be totally terrible either. Unfortunately, with the default chunk sizes and sqlite backend, attic's usage patterns seem to cause a huge number of fsync calls, which my system hates. A quick test with the postgres backend suggests that will be quite a lot better even on the same storage stack.
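For reference, these knobs live in atticd's server.toml. A sketch below - the database URL and storage path are placeholders, and the chunking values are what I believe are the documented defaults:

```toml
[database]
# sqlite is the default; postgres was much kinder to my storage stack
url = "postgres://atticd@localhost/attic"

[storage]
type = "local"
path = "/var/lib/atticd/storage"

[chunking]
# minimum NAR size to trigger chunking; setting this to 0 apparently
# disables chunking entirely (relevant to the Tahoe-LAFS note below)
nar-size-threshold = 65536  # 64 KiB
min-size = 16384            # 16 KiB
avg-size = 65536            # 64 KiB
max-size = 262144           # 256 KiB
```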
Backend / storage
For smaller-scale use by a group of devs I was thinking we could probably get away with a single front-end machine initially (though that would be a SPOF), but we would probably need distributed storage given how quickly Nix caches tend to explode in size. I'm not sure how much anyone hosting a storage node would need to trust the other nodes or any central controller; I guess a solution with a simple API would be a benefit here.
Storj
Storj is all a bit blockchain-y, but it seems it is actually self-hostable, though there don't seem to be huge amounts of docs around that. Given the likely complexity, I've not looked at it in much detail - this might be something for further down the line when we want to move to a wider-access public cache?
Tahoe-LAFS
@corbin on fedi warned me off this based on significant past experience, but low required trust between clients and storage nodes seems to be part of its design, so I was curious whether it would be of any use for a small-scale cache, even a personal one.
I have got a PoC up and running locally using a rather wobbly stack of attic → rclone serve s3 → sftp → tahoe-lafs. Unfortunately rclone's S3 server dumps all the blobs in a single directory, which Tahoe seems to hate: a small upload eventually grinds to a halt with Tahoe using 100% CPU.
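For anyone wanting to reproduce the wobble, the moving parts were roughly as follows - remote names, ports, and keys are placeholders, and note that rclone's S3 server is still flagged as experimental:

```sh
# The Tahoe node exposes its grid over SFTP (enabled via the [sftpd]
# section of tahoe.cfg), and "tahoe:" below is an rclone sftp remote
# pointing at that. rclone then re-exports it as a local S3 endpoint:
rclone serve s3 tahoe:attic-store \
    --auth-key ACCESS_KEY,SECRET_KEY \
    --addr 127.0.0.1:9000

# atticd is then pointed at that endpoint via its server.toml:
#   [storage]
#   type = "s3"
#   region = "us-east-1"   # placeholder value
#   bucket = "attic-store"
#   endpoint = "http://127.0.0.1:9000"
```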
It does have a simple REST API, though, so if there were good reasons to use it for anything, it might be possible to write something that skips some layers of the above stack and stores objects more sanely.
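As a taste of how simple that API is, here's a minimal sketch of storing and fetching a blob via Tahoe's web API, assuming a local node on the default web port (the payload is obviously a placeholder):

```python
# Minimal sketch of talking to Tahoe's web API directly, skipping the
# s3/sftp layers of the stack above.
import requests

TAHOE = "http://127.0.0.1:3456"

def put_blob(data: bytes) -> str:
    """Upload an immutable blob; Tahoe returns a capability string,
    which is the only handle needed to read it back."""
    resp = requests.put(f"{TAHOE}/uri", data=data)
    resp.raise_for_status()
    return resp.text.strip()

def get_blob(cap: str) -> bytes:
    resp = requests.get(f"{TAHOE}/uri/{cap}")
    resp.raise_for_status()
    return resp.content

cap = put_blob(b"hello, grid")
assert get_blob(cap) == b"hello, grid"
```

A real cache layer would of course still need its own index mapping object names to the capabilities Tahoe hands back, since caps are content-addressed handles rather than names.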
Edit: sadly the performance problems seem to be inherent; disabling attic's chunking helps a bit, but almost certainly not enough to make Tahoe useful, IMO.
Other options - distributed S3 stores
These might require a bit more trust between operators, but that might be OK for small-scale cache sharing between devs etc. I've not looked at any of them in any detail yet.
Other options - "sharded" generic S3 stores
Not sure how viable this is, but it looks like there might be one or two options for presenting a unified view of multiple generic S3 stores (obviously one would also need to be able to store data, not just read it). This would allow using any S3 implementation whether it supports distributed operation or not; there's a rough sketch of the idea below.
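To make that concrete, here's a very rough sketch of deterministic sharding by key hash across several independent S3 endpoints - boto3, the endpoints, and bucket names are all just placeholders for whatever a real unifying layer would use:

```python
# Rough sketch of "sharding" objects across multiple independent S3
# stores by hashing the object key. A real unifying layer would also
# need replication, health checks, rebalancing, etc.
import hashlib
import boto3

SHARDS = [
    {"endpoint": "https://s3.node-a.example", "bucket": "cache"},
    {"endpoint": "https://s3.node-b.example", "bucket": "cache"},
    {"endpoint": "https://s3.node-c.example", "bucket": "cache"},
]

def shard_for(key: str) -> dict:
    """Deterministically pick a shard so every client agrees where a key lives."""
    digest = hashlib.sha256(key.encode()).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

def put_object(key: str, body: bytes) -> None:
    shard = shard_for(key)
    s3 = boto3.client("s3", endpoint_url=shard["endpoint"])
    s3.put_object(Bucket=shard["bucket"], Key=key, Body=body)
```

(A real implementation would presumably want consistent hashing rather than simple modulo, so that adding or removing a node doesn't reshuffle every key.)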
Do feel free to mention other ideas and options in the comments - I can edit this post, or it can be turned into a wiki or something. BTW, this is very much not something I am experienced in - I fell out of commercial IT well before the advent of cloud computing! So anyone with more experience is welcome to correct me on anything, or to take over these ideas and run with them if they are interested. I will continue my own small-scale experiments as time/energy/health permit, and I'm happy to spin up small containers, VMs, etc. here if it helps to test someone else's proof-of-concept or whatever.