I am using unattended-upgrades across multiple servers. I would like package updates to be rolled out gradually, either randomly or to a subset of test/staging machines first. Is there a way to do that for APT on Ubuntu?
An obvious option is to set some machines to update on Monday and the others on Wednesday, but that gives me only weekly updates...
The goal, of course, is to avoid a CrowdStrike-like situation on my Ubuntu machines.
Edit: for example, an updated openssh-server comes out. One fifth of the machines update that day, another fifth update the next day, and the rest update 3 days later.
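To make the schedule above concrete, here is a minimal sketch of how each machine could decide for itself which wave it belongs to, by hashing its hostname into one of five buckets. This is not a feature of unattended-upgrades; the names `bucket_for`, `may_upgrade`, and `DELAY_DAYS` are hypothetical, and the sketch assumes you already have some way to learn the date a package update was released.

```python
import hashlib
import socket
from datetime import date, timedelta

# Hypothetical schedule taken from the example: bucket -> days to wait.
# Buckets 0 and 1 are one fifth of the fleet each; buckets 2-4 (the
# remaining three fifths) wait 3 days.
DELAY_DAYS = {0: 0, 1: 1, 2: 3, 3: 3, 4: 3}


def bucket_for(hostname: str, buckets: int = 5) -> int:
    """Map a hostname to a stable bucket number in [0, buckets)."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def may_upgrade(hostname: str, released: date, today: date) -> bool:
    """Return True once this host's waiting period has elapsed."""
    delay = DELAY_DAYS[bucket_for(hostname)]
    return today >= released + timedelta(days=delay)


if __name__ == "__main__":
    host = socket.gethostname()
    # Assumption: the release date is supplied by you; here it is a
    # made-up example value.
    released = date(2024, 1, 1)
    print(host, bucket_for(host), may_upgrade(host, released, date.today()))
```

Because the bucket depends only on the hostname, every machine gets the same answer on every run without any central coordinator. The split is only roughly one fifth per bucket, and it gets more uneven with a small fleet.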
Maybe you could switch to an image-based distro, which is easy to roll back and won't boot into a broken image.
Which distro is image-based and has the staggered rollout feature I'm after?
You don't need the staggered rollout, since it won't boot into a broken image and you can easily boot into an old one if you don't like the new one. E.g. Fedora Atomic.
I'm not up to date on Vanilla OS in the Debian world, so I don't know whether it is on par with Fedora.
I am not worried about upgrades so bad that they literally don't boot. I am worried about all the possible problems that might break my service.
You can also roll back package versions. I'm not sure what problems could arise.
I can roll back with APT too, my question is how to do the staggered rollout.
You have to reboot for an image update. Hence, you can update the computers at different times and days.
This doesn't seem to improve my workflow at all. It seems I would now have to reboot, and I would still need to find a separate tool to coordinate/stagger updates, like I do now. Or did I miss something?
If the OS always works (atomic image-based distro), the Docker containers work, and both can be rolled back easily, what else could go wrong?
Don't overthink it :)
I am not sure what you are talking about. My question is about APT.
No, OP absolutely still needs a staggered rollout. An immutable distro is a self-contained blue-green deployment. Yet all the instances can still upgrade and switch at the same time, and all of them can break together. OP still needs an external rollout strategy to prevent the whole service from being brought down.
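As a rough illustration of what such an external rollout strategy might look like, here is a minimal sketch that upgrades hosts wave by wave and stops as soon as a wave fails its health check. The function `rollout_in_waves` and the `upgrade` and `healthy` callables are hypothetical placeholders; how you actually trigger an upgrade or probe your service is up to you.

```python
from typing import Callable, List, Sequence


def rollout_in_waves(
    waves: Sequence[Sequence[str]],
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade hosts one wave at a time.

    After each wave, every host in it is health-checked. If any host is
    unhealthy, the rollout stops and later waves are left untouched.
    Returns the list of hosts that were upgraded.
    """
    upgraded: List[str] = []
    for wave in waves:
        for host in wave:
            upgrade(host)  # hypothetical: e.g. trigger the update remotely
            upgraded.append(host)
        if not all(healthy(host) for host in wave):
            break  # halt so the remaining waves keep the old version
    return upgraded
```

With waves such as a staging group first and production groups after it, a bad update breaks at most the wave that exposed it, and the rest of the fleet keeps running the old version.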