That's not how LTS is supposed to work. You should be able to uograde effortless...

sofixa · on Dec 4, 2023

> That's not how LTS is supposed to work. You should be able to uograde effortlessly with minimum risk.

How do you do that on something as complex and with as many moving parts as Kubernetes? And how do you as an operator update that many things without checking there's no breaking changes in the patch?

Rantenki · on Dec 4, 2023

We upgrade our distros pretty much fearlessly, all the time. While I have had breakage from Kernel upgrades, they've been very rare (and generally related to third party closed drivers). Kubernetes is _not_ more complicated than the Linux kernel, but it is much more dangerous to upgrade in place.

eddythompson80 · on Dec 4, 2023

> Kubernetes is _not_ more complicated than the Linux kernel, but it is much more dangerous to upgrade in place.

eh, the kernel is an incredibly mature project with 1 machine scope. The kernel also has decades of operating systems research and literature to build on. Kubernetes in comparison is new, distributed and exploring uncharted territory in terms of feature set and implementation. Sometimes bad decisions are made, and it's fair to not want to live with them forever.

The kernel project looks very different today than it did in 1999.

There is a happy medium though, especially that Kubernetes is kinda far from it.

x86x87 · on Dec 5, 2023

Spoiler alert: k8s is not in uncharted territory.

Erlang and its runtime discovered and solved most of the problems in the 80s. We are slowly rediscovering this the same way react discovered the event loop that Windows had discovered in the 90s.

eddythompson80 · on Dec 5, 2023

Erlang solved the problem by making a custom VM that abstracts the network away for the most part and is pretty opinionated about how you do that. Kubernetes is not that. I don't see how Erlang is relevant here. You can run Erlang applications on kuberntes, not the other way around.

jen20 · on Dec 4, 2023

My answer is simple: don't. Use something far simpler and with fewer moving parts than Kubernetes, and something where crucial parts of the ecosystem required to make things even basically work are not outsourced to third party projects.

Nomad is a good solution.

x86x87 · on Dec 4, 2023

bingo! how do you do it? and do you want that kind of complexity to begin with?

sofixa · on Dec 4, 2023

Don't ask me, I'm firmly in the HashiCorp Nomad camp: https://atodorov.me/2021/02/27/why-you-should-take-a-look-at... (Note: quite old, some things are no longer true most notably around Nomad downsides)

rigrassm · on Dec 4, 2023

I'm with you, Nomad is highly underrated!

freedomben · on Dec 4, 2023

I don't see anywhere that GP said an LTS patch would take effort. They said the upgrade path to the next LTS would.

If you are talking about upgrade from LTS to LTS, can you give an example project where that is effortless? And if so, how do they manage to innovate and modernize without ever breaking backwards compatibility?

x86x87 · on Dec 4, 2023

Here: "it's easier to go from 1.20.1 to 1.20.5 than to 1.21, because there's less chance of breakage and less things that will change, but the process is pretty much the same"

LTS to LTS is another story. But the point is that L=LongTerm so in theory you're only going to do this exercise twice in a decade.

> manage to innovate and modernize without ever breaking backwards yeah. fuck backwards compatibility. that is for suckers. how about stopping the madness for a second and thinking about what you are building when you build it?

pixl97 · on Dec 4, 2023

> in theory you're only going to do this exercise twice in a decade.

So I've seen things like this in corporations many times and it typically works like this...

Well trained team sets up environment. Over time team members leave and only less senior members remain. They are capable of patching the system and keeping it running. Eventually the number of staff even capable of patching the system diminishes. System reaches end of life and vendor demands upgrading. System falls out of security compliance and everything around it is an organizational exception in one way or another. Eventually at massive cost from outside contractors the system gets upgraded and the cycle begins all over again.

Not being able to upgrade these systems is about the lack of and loss of capable internal staff.

x86x87 · on Dec 5, 2023

What is the cost of keeping it "up to date" vs doing this exercise once every say 7 years? Are most software systems even around for 7 years?

pixl97 · on Dec 5, 2023

Fossilization and security risk is the cost. I'm dealing with one of these systems that's been around like 5 and a half years. It no longer gets security updates so has risk exceptions in the organization. But the damn thing is like a spider and woven into dozens of different systems and to migrate to a newer version is going to take, I'm estimating hundreds to thousands of hours of work on updating those integrations alone. Then you have the primary application and dealing with the multitude of customizations that would have been a stepped upgrade changing a little bit of functionality, now having to have massive rewrites.

The cost either way was likely millions and millions of dollars. But now they are having to do it all at once and risk breaking workflows for tens of thousands of people in a multitude of different ways.

p_l · on Dec 5, 2023

Just upgrading the kernel on one of those "LTS" systems so that developers could start being ready for a kernel that wasn't 3.10 (and it turned out that core component of our app crashed due to... memory layout bug that accidentally worked on old kernels)...

I had to start by figuring all bits necessary to build not just kernel, but also external modules and attendant tools, using separate backported compiler because then-current LTS kernel wouldn't compile using distro-supplied GCC.

p_l · on Dec 5, 2023

I've recently worked in a high profile company where it took them long and painful to move from CentOS 6 to 7 (over a year long effort, IIRC, finished for prod in 2021? but with some crucial corp infra still on 6 in 2022).

In 2022 they had to start a new huge effort do deal with migration off CentOS7, and the problems were so painful it was considered reasonable to build a Linux distro from scratch and remove all traces of distro dependency from the product (SaaS)

pas · on Dec 5, 2023

that sounds really interesting, can you elaborate on the challenges? why was it so important for them to move off CentOS7, and why didn't they move to RHEL or Alma or Rocky or whatever similar?

p_l · on Dec 5, 2023

US Government woke up to the fact that allowing vendors waivers on requirements for upgrades ends up with nothing ever happening. CentOS7 is EOL'd next year. Additionally, there was fun of FIPS-140 and OpenSSL older than 3.0.

Alma and Rocky were considered, but that would still involve (possibly similarly painful) migration as with CentOS 6 -> 7.

Have you seen pricing for RHEL? We're talking hundreds thousands of systems. I never seen raw stats, but I would have been totally unsurprised to see them hit million instances across all clouds used, at least occassionally.

Decoupling software from distro dependencies was seen as a way to future proof deployment story and avoid situations like we had with CentOS 7, where they really, really would have liked upgrading some stuff for newer APIs, but couldn't due to mess with OS-provided dependencies.

pas · on Dec 5, 2023

decoupling meant something like using "distroless" or static builds (musl?) or simply shipping everything on an alpine/ubuntu/debian/whatever image? (and previously there was no containerization, but now there is)