Call for feedback - Aux Secrets

@coded, @minion and I (@dfh) have been brainstorming different ways to do secret management in Aux, given the rather bad state Nix secrets management is currently in.

The problems we are trying to avoid with the new solution:

  • Secrets committed to git
  • Secrets can’t change without redeployment/being part of system configuration
  • The need for an unencrypted secret on disc, typically SSH key
  • Secret versioning, rotation, attestation and auditing are quite hard

A big aspect of the project is to use declarative methods for secret generation where possible. The gokey utility is an inspiration for this idea, but it does not resolve the issue for secrets that are pre-shared in nature (think API tokens, WiFi passwords, etc.).

We wanna hear your thoughts and how you would like the solution to look and feel.

If you’re into this topic, please help us out. You can join the Matrix room or DM us on the Discourse.

8 Likes

Some thoughts/questions about this:

  • Secrets committed to git

Encrypted and plaintext or just plaintext?

  • Secrets can’t change without redeployment/being part of system configuration

This makes it no longer reproducible, as the secrets are versioned separately. I don’t think it’s a problem for e.g. API tokens, as they are inherently not reproducible (you can’t roll back to an old configuration with a revoked API token), but if you mess up your password you might want to roll back to a previous config.

  • The need for an unencrypted secret on disc, typically SSH key

This is less of a problem in personal deployments (e.g. PC or home server, where physical access is not much of a threat) and it is much simpler, so I’d like to still have this as an option for simpler/lower security stuff.

  • Secret versioning, rotation, attestation and auditing are quite hard

Agree.

My conclusion is that you recommend versioning them separately which makes sense (you’d want to use newer passwords/API tokens where possible) and this would allow us to address most other problems (plain text secret on disk, secrets on git, etc.), but it might interfere with rollbacks.

I think agenix/sops is fine for home deployments and easier to manage, but for more professional deployments, having something like a TPM backed password store would be nice.
What appears to fulfill your requirements would be something like pass that would use the system’s TPM instead (systemd-creds might work?), then deployment would go like:

  1. decrypt secrets on build host
  2. encrypt them with target’s ssh/age pubkey
  3. transfer them to target
  4. target re-encrypts them with the TPM key and stores them in a new store.
  5. on test/switch, the TPM decrypts the secrets to /run/secrets and the services can access them (or through systemd-creds).

The secret store on the build host would be managed/versioned separately, but it should support push to a remote host through e.g. ssh so that the secrets can be updated on the fly. It might need to run a script that updates the secrets/reloads the services running to apply the updates though.


I shared the above text last night in the Matrix server; I modified it slightly before posting it here to fix some mistakes and better represent my opinion.

2 Likes

After further discussions I believe we share a similar opinion and are leaning towards a specific design:

We would like to build a “vault” API where the secrets are stored in order to be shared among many machines/versions of machines.
Each “vault” would provide its own script that allows retrieving secrets from the “vault” itself (we lean towards defaulting to a gokey wrapper with some extra functionality).
The secrets would be retrieved and stored in a local store (backed by, e.g., systemd-creds). The store would optionally (but not by default) use the system’s TPM device, and a local secret (e.g. the system’s host SSH key).
The local store would then provide the secrets to the system services.
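
To make the split between “vault” and local store a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Vault` protocol, `StaticVault`, `LocalStore` and the secret names are hypothetical, and the store is a plain in-memory dict rather than anything backed by systemd-creds or a TPM.

```python
from typing import Dict, Iterable, Protocol


class Vault(Protocol):
    """Anything that can hand out a secret by name (hypothetical interface)."""

    def retrieve(self, name: str) -> bytes: ...


class StaticVault:
    """Toy vault backed by a dict; a real one might wrap gokey instead."""

    def __init__(self, secrets: Dict[str, bytes]) -> None:
        self._secrets = dict(secrets)

    def retrieve(self, name: str) -> bytes:
        return self._secrets[name]


class LocalStore:
    """In-memory stand-in for the per-machine local store."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def provision(self, vault: Vault, names: Iterable[str]) -> None:
        # Rebuild the whole store so secrets that are no longer requested
        # do not linger from an earlier provisioning run.
        self._entries = {name: vault.retrieve(name) for name in names}

    def get(self, name: str) -> bytes:
        # Services would read their secrets through this call.
        return self._entries[name]


vault = StaticVault({"wifi-psk": b"hunter2", "api-token": b"abc123"})
store = LocalStore()
store.provision(vault, ["wifi-psk"])
```

Only the secrets a machine actually asks for end up in its local store; the vault itself can hold secrets for many machines.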

Example workflows

Initial deployment

  1. The vault is copied over alongside the system configuration
  2. The vault is used to provision the local store
  3. The local store is used to provide the secrets to the services
  4. The system is all set up

Configuration changes not affecting secrets

e.g. disable existing service that does not rely on secrets

  1. Redeploy the system
  2. Nothing to do, neither the local store, nor the vault changed

Configuration changes affecting secrets

e.g. enabling/disabling TPM support in the local store

  1. Copy over new configuration
  2. Re-provision the local store from the vault
  3. Restart affected services & switch to configuration

Vault changes

e.g. API key rollover, password change, etc.

  1. Copy over the new vault
  2. Re-provision local store
  3. Restart affected services
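
The workflows above boil down to a small decision table. The following is only a sketch of that table, assuming three hypothetical inputs (did the configuration change, did the local store settings change, did the vault change); `plan_steps` and the step names are made up for illustration and do not correspond to any existing tool.

```python
from typing import List


def plan_steps(
    config_changed: bool,
    store_settings_changed: bool,
    vault_changed: bool,
) -> List[str]:
    """Return the deployment steps implied by what changed."""
    steps: List[str] = []
    if vault_changed:
        steps.append("copy vault")
    if config_changed:
        steps.append("copy configuration")
    # The local store only needs attention when its inputs changed:
    # either the vault contents or the way the store itself is set up.
    if vault_changed or store_settings_changed:
        steps.append("re-provision local store")
        steps.append("restart affected services")
    if config_changed:
        steps.append("switch to configuration")
    return steps


# Vault change only, e.g. an API key rollover.
rollover = plan_steps(False, False, True)
# Configuration change that does not touch secrets.
plain_redeploy = plan_steps(True, False, False)
```

An initial deployment is simply the case where all three inputs are true.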

Scripts/extensions

Work that needs to be done to support this solution.

Extend gokey to store fixed secrets

Similarly to pass, create a store folder that contains secrets that cannot be generated based on the gokey seed (e.g. API keys). Keeping it simple, we’ll use the filename to derive a symmetric encryption key and retrieve the secret inside the file.
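
A rough sketch of the key derivation part, assuming the per-file key depends on both the gokey seed and the filename, and using HMAC-SHA256 from Python’s standard library as a stand-in (gokey’s real derivation may differ). `file_key` and the filenames are hypothetical; the actual encryption of the file contents would be done with an authenticated cipher from a crypto library and is not shown here.

```python
import hashlib
import hmac


def file_key(seed: bytes, filename: str) -> bytes:
    """Derive a 32-byte symmetric key for one store file."""
    # The fixed prefix keeps these keys separate from other uses of the seed.
    message = b"store-file:" + filename.encode("utf-8")
    return hmac.new(seed, message, hashlib.sha256).digest()


seed = b"example seed, not a real one"
key_a = file_key(seed, "github-api-token")
key_b = file_key(seed, "wifi-home")
```

Because the key is recomputed from the seed and the filename, nothing besides the seed has to be kept secret, and renaming a file means re-encrypting it.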

This will mean the folder will need to be copied over along with the gokey seed file, but it can be versioned separately from the config (e.g. through git). Or along with the config if your setup doesn’t mind that.

Extend nixos-rebuild to copy over the vault/provision the local store

Each vault should define a copy script. This script should either directly copy the vault to the target system, or provide a list of (source, destination) path pairs that nixos-rebuild should copy.

Extend the test/switch script to provision the local store

When running nixos-rebuild test/switch/install the local store needs to be provisioned based on the vault’s interface.

Extend modules to accept secrets from the local store

Finally, integrate the local store with the NixOS module system.

7 Likes

y’all might well know this already, but reading this i remembered that hashicorp did a tool literally named Vault. since their license change it was forked, but in other words, for what it’s worth, there is stuff out there that interfaces with various existing services for secrets.

to what extent that could be useful to interact with in this context i’m not sure. i think nix restricted network access at certain stages to reduce impurity, tho iirc Vault did in fact work in terms of unlock → use → relock. so maybe bridging with the likes of that could at least help offload logic on interacting with other systems, insofar as that might become desirable here.

1 Like

We’ve talked about Vault, but the pain point is that it’s extremely involved to set up. I’ve never used it myself, but according to @jakehamilton:

vault’s systems for creating secrets, setting policies, and managing engines are far too tedious

1 Like

I really like the idea of extensibility through an API/scripts. Storing the secrets on the machine in plain text will likely be compatible with most consumption patterns.

1 Like

I’m having success using sops and sops-nix for secret management on NixOS and nix-darwin (with home-manager).

It lacks (or perhaps just my usage of it lacks) systemd-creds (and thus TPM support) at the moment, but is otherwise very robust.

Likely everyone involved has already audited this option; if so, ignore my post. Otherwise, if you’re curious about the full workflow, reply and I’ll go into more detail :slight_smile:

3 Likes

The biggest issue with sops-nix/agenix is that they tie your secrets configuration to your system configuration. This means that if you roll back the configuration, you also roll back the secrets, reverting any changes to API keys, passwords, etc.

Our belief is that passwords and secrets should be stored separately from the system configuration.

5 Likes

Very good point. Thank you, I hadn’t picked up that nuance.

1 Like

@jalil Apologies, I’ve been owing you a response on this for quite a while. Thanks a ton for the extensive write-up - it gave me a chance to think some things through and gain more clarity.

Personally I’m against both. git’s purpose is to document the different versions of a file for eternity. Encryption algorithms age, and new attacks become available that weaken the security of the stored values.
Some people use a separate, access-controlled secrets repo, which IMO makes the pattern acceptable.

But given we reached the conclusion that, from an operational perspective, secrets are state that needs to be maintained independently from config, committing and versioning the secrets with the config might be the best solution in specific cases, but not in the general case.

This issue exists in both directions. A rollback might also break your credentials.
From what I see it comes back to what I said above: is your credential versioning attached to your config versioning or not?

Agreed. This is where the scope of our previous description was lacking. This solution is meant to co-exist with existing ones, not replace them. IMO we’re in the current predicament because of a lack of flexibility in how we manage passwords. If you like what exists right now and it works well enough for you, I would highly recommend sticking with it :wink:

The declarative+generative approach becomes more beneficial the larger the number of systems one has to manage. Thinking about it recently, I realized I might want to use the term “operational security”, aka the management of the secret lifecycle over time and space (aka number of devices).

Yes and no. It’s not versioning them in the traditional sense. As the declared secrets are (simplified) KEY_DERIVATION($SEED, $STRING), with $STRING also called a “realm”, one can do cheeky things like including a timestamp (e.g. 2024-07 for a monthly rotated secret) in the string. Explicitly versioning/archiving them is no longer necessary, as we can now “time travel” through the secrets’ history by adjusting that date string.
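
As a minimal sketch of the realm trick, assuming PBKDF2-HMAC-SHA256 from Python’s standard library as a stand-in for KEY_DERIVATION (not necessarily what gokey does internally); `derive_secret`, `monthly_realm` and the secret name are hypothetical:

```python
import hashlib
from datetime import date


def derive_secret(seed: bytes, realm: str, length: int = 32) -> str:
    """Deterministically derive a hex-encoded secret from seed and realm."""
    # The realm plays the role of the salt: same seed and realm,
    # same secret, every time and on every machine.
    raw = hashlib.pbkdf2_hmac(
        "sha256", seed, realm.encode("utf-8"), 100_000, dklen=length
    )
    return raw.hex()


def monthly_realm(name: str, day: date) -> str:
    """Build a realm string that changes once per calendar month."""
    return f"{name}:{day:%Y-%m}"


seed = b"example seed, not a real one"
july = derive_secret(seed, monthly_realm("db-password", date(2024, 7, 15)))
august = derive_secret(seed, monthly_realm("db-password", date(2024, 8, 1)))
```

Rotation then happens simply because the calendar moves on, and last month’s secret can be regenerated at any time by passing an older date.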

I’m not sure about the ease part. What makes secret management hard? It’s typically a human perception. Maybe for some it’s the number of secrets stored (in which case this project even makes sense with a small number of machines); maybe others struggle with rotating the secrets.

Maybe it would be best to have a set of answers on what people are struggling with or find annoying when using tools like sops-nix/agenix. :thinking:

Way too complex for my taste :wink:

One aspect I recently realized and tried to express in the Matrix channel is that currently the only pattern for managing secrets is to lock them in some form of vault (encrypted git, hashicorp vault, systemd-creds) and then try to solve the sync and decryption problem.

The declarative+generative approach makes the same tradeoff that vaults do: securing a long list of secrets with a selected list of credentials.
But instead of credentials to access secrets, it’s seeds to generate new secrets. One side effect of this new pattern is that we actually don’t need to store and sync secrets, but rather reproducibly generate them at time of use.

I believe this is a different mental model, and it took me a moment to realize it and start thinking in it. Much is inspired by the cloudflare/gokey utility, and its primary difference is the “vaultless” (not stateless) aspect.

That said, the vaultless pattern is not the best for all situations; API tokens generated by someone else are a great example. So to make the end-user experience the best, I believe the vaultless and vault patterns both need to be able to co-exist.

No, this is not the goal of this project.

The goal is to provide a vaultless alternative to managing secrets.

I’ve found at least 3 projects that query secrets from vault to make them available in nixos. While some have some interesting solutions, I don’t believe it makes sense to replicate any of their work into another new project.
I personally wanna avoid xkcd 927 :wink:

Plus I personally dislike (and do my best to avoid) a pattern for my infrastructure in which access to secrets relies on another “heavy service” that needs maintenance. It’s too easy to end up in a deadlock during disaster recovery, or needing to buy a SaaS product to solve the availability issue.

4 Likes

hello all :wave: great to see this topic being spoken about with such careful consideration. secrets management in nix has been a long standing itch that i’ve never quite found an adequate scratch for.

since it’s something i’ve spent quite a bit of time on - both in nix and secrets-management more generally during my career - i figured i’d weigh in some of my thoughts.

as has been discussed already, while hashicorp vault is an excellent tool for secrets management across systems of varying shapes & sizes, i also agree it wouldn’t fit in nicely with aux. given the complexity, necessitated by its security posture as a secrets management product/tool, it demands a great deal of knowledge, time, and energy to manage & maintain properly, that i don’t think would be a reasonable barrier to anyone wanting to do “basic” secrets management for their aux systems.

one other approach i’d like to call attention to is the wonderfully simplistic mechanism that lollypops uses for secrets-management. i’ve been using it as my deployment tool for some time and have enjoyed the simplicity & flexibility it grants. since its interface is simply templating out commands (pass by default), it benefits from BYO tooling. for instance, i do indeed use pass and, because my store is encrypted using my gpg key which is only present on my yubikey, that flow naturally carries to the secrets for my nix systems.

i’m not necessarily saying aux should directly copy/mimic this, but perhaps there are some nice features or learnings to be garnered :slight_smile:

a major shortcoming which is worth mentioning is that, if a system with a secret persisted in a temporary location (such as a tmpfs mount) is rebooted, any service relying on the presence of that secret of course is not happy. this is where networked secrets-managers like vault shine, because there’s usually some retrieval process prior to the service starting. not sure what that looks like in nix/aux world; but certainly food-for-thought.

3 Likes

Thanks a ton. I looked at lollypops quite some time ago and, in the context of this project, didn’t realize they do secret management.

From a quick scan of the code it looks like lollypops uses SSH to copy secrets out-of-band, right!?

I do like the option to define the password backend by providing a binary/ cmd that issues them, (and the flexibility that comes with it) but it seems the mental model behind this solution is another variant of the “vault” approach.

I personally dislike the idea of needing a separate out-of-band copy process after reboot (or any form of querying passwords over the network [e.g. hashicorp vault]) or the need for the one unencrypted key on disc (sops-nix/ agenix).

The use-case of “self-unsealing secrets” with TPM support is one I am looking forward to implementing with the help of SecureBoot/Measured Boot.

1 Like

I like how helmfile/vals handles secret reference strings. Might be worth giving them a primitive akin to paths, but inherently tied to stateful data.

Existing secret implementations in Nix are clunky because they do very little to handle bootstrapping. Would be really nice if the API for secrets allowed us to specify generation commands, which could be run on activation, possibly with a step to incorporate them into the repo also.

clan-core seems to be working on an implementation of secrets that will have a mechanism for generation.

Integration with systemd-creds would also be nice for TPM unlocking and passing secret data to units.

From a quick glance at vals this looks very similar to our current idea which involves a FUSE read-only filesystem that exposes the generated secrets as files.
After all it’s one of the basic building blocks of the UNIX/Linux philosophy…

Agreed. I have quite a rant about offloading the secrets management issue to host SSH keys. The bootstrapping gets really cumbersome - my best idea so far is to involve terraform/ terranix to actually get a key into place which feels like a rather huge detour.

Can you give an example of what use-case you would love to see?

I’ve been loosely following their development; my understanding so far is that they are using sops, no?
Are there any details to their solution available?

I dislike this solution; it’s vault-based, per-machine encryption, which is what this project aims to build an alternative to.

TPM-based unlocking (as well as other methods such as tang/clevis, remote SSH or a combination of them) should IMO totally be possible. But instead of unlocking specific secrets, it would unlock the random seed that initializes the secrets generator.
I hope that by making the “seed unlocking process” a modular thing, we can have a variety and combination of currently existing methods and maybe even something new.
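
One way to picture a modular “seed unlocking process” is a list of interchangeable unlock methods that are tried in order. This is only a sketch under that assumption: `Unlocker`, `unlock_seed` and the two example methods are hypothetical stand-ins and do not talk to a real TPM, tang server or SSH session. It also only shows fallback between methods; requiring several methods at once would need some way to combine their outputs, which is not shown.

```python
from typing import Callable, Optional, Sequence

# An unlock method returns the seed, or None if it cannot unlock it here.
Unlocker = Callable[[], Optional[bytes]]


def unlock_seed(unlockers: Sequence[Unlocker]) -> bytes:
    """Try each unlock method in order and return the first seed found."""
    for unlock in unlockers:
        seed = unlock()
        if seed is not None:
            return seed
    raise RuntimeError("no unlock method produced a seed")


def tpm_unavailable() -> Optional[bytes]:
    # Stand-in for a TPM-backed method on a machine without a usable TPM.
    return None


def fallback_method() -> Optional[bytes]:
    # Stand-in for any other method (tang/clevis, remote SSH, ...).
    return b"seed released by fallback method"


seed = unlock_seed([tpm_unavailable, fallback_method])
```

The secrets generator only ever sees the resulting seed, so new unlock methods can be added without touching anything downstream.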

1 Like

I think we need the possibility to derive non-secret data from secrets and have it automatically added to a lock file in the configuration.

Such a feature would allow us to define a bootstrap secret that is stored directly on the host filesystem / TPM.

This bootstrap secret could then be used by an agenix-like tool, or to fetch the secrets from a vault.

1 Like

Thanks for joining the conversation :tada:

I’m not sure I understand, can you provide more details and/or an exact use-case?

The main example I had in mind were ssh host keys.

With sops-nix / agenix, we need to manually generate and copy the ssh host keys before installation in order to encrypt secrets. A lock file would reduce manual steps during installation.

Edit: I realized that what I was thinking of was pretty similar to a simplified version of a terraform data source and might not be in the scope of aux secrets.

I’m not sure I understand why a lock file would resolve the situation. Can you please elaborate?

From my perspective, with a generative secrets approach one can define a schema for generating unique SSH machine keys that can be derived before the machine is even installed.

E.g. one could do gen_ecdsa_key_pair($seed, "v1:" + $fqdn) (pseudo-code) to generate a versioned SSH key pair that is based on a “secret seed” and the FQDN of the machine to be deployed with the machine during first installation.

1 Like