13 Years of Building Infrastructure Control Planes in Ruby

itsthecourier · on Aug 26, 2024

We have been using Ruby extensively throughout our org for years.

We tried Java, Go, Node, Elixir, Python.

It pretty much comes down to Ruby on Rails having a very opinionated way of doing things.

We mix some DDD and recently High Performance PostgreSQL for Ruby on Rails https://www.amazon.com/dp/B0CX876RLY

It is slow. How much? According to TechEmpower https://www.techempower.com/benchmarks/#hw=ph&test=query&sec... something like 6 times slower than java-vertx-postgres. But we have seen slow and shitty apps coded in Go and Java too; most of the time, slow apps come from people gluing libraries together without profiling to see what each one costs.

We moved to Rails and Flutter to get a quick hot-reloading coding loop.

And we determined it was easier for all our devs to master one toolset instead of jumping between frameworks and DBs (we use Rails, Kamal, Postgres, Redis, ClickHouse and Datadog).

The result has been awesome. Our web pages pass Google PageSpeed in green, we serve millions of requests a day without issues, Datadog APM helps a lot, and our devs move between projects easily. We have hotspots like query autocompletion on one of our sites, which we do in memory with a trie library that runs a C library underneath, for under 100ms response times.
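To illustrate the in-memory trie idea, here is a minimal pure-Ruby sketch of prefix autocompletion. It is not the C-backed library mentioned above; the `Trie` class, its method names, and the sample words are all hypothetical and chosen only for illustration.

```ruby
# Minimal in-memory prefix trie for autocompletion (illustrative sketch only;
# a production setup would more likely use a C-backed library).
class Trie
  # One node per character; `terminal` marks the end of a stored word.
  class Node
    attr_reader :children
    attr_accessor :terminal

    def initialize
      @children = {}
      @terminal = false
    end
  end

  def initialize
    @root = Node.new
  end

  # Insert a word one character at a time, creating nodes as needed.
  def insert(word)
    node = @root
    word.each_char { |ch| node = (node.children[ch] ||= Node.new) }
    node.terminal = true
    self
  end

  # Return up to `limit` stored words starting with `prefix`, in sorted order.
  def complete(prefix, limit: 10)
    node = @root
    prefix.each_char do |ch|
      node = node.children[ch]
      return [] unless node
    end
    results = []
    collect(node, prefix, results, limit)
    results
  end

  private

  # Depth-first walk below `node`, visiting children in sorted key order.
  def collect(node, acc, results, limit)
    return if results.size >= limit
    results << acc if node.terminal
    node.children.keys.sort.each do |ch|
      collect(node.children[ch], acc + ch, results, limit)
    end
  end
end

trie = Trie.new
%w[rails rack rake ruby redis].each { |w| trie.insert(w) }
suggestions = trie.complete("ra")
```

Lookup cost depends on the prefix length and the number of results returned, not on the total number of stored words, which is why this shape suits a latency-sensitive hotspot.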

If your org is competing in a really, really competitive field and you need to reduce costs to zero (as WhatsApp had to do with FreeBSD and Erlang), go ahead and make your tradeoffs. But if you don't, Rails is the one-man-army framework. Even more so with Hotwire/Solid Queue. Even more so considering my go-to calculation is: "infrastructure should be 10% of revenue in SaaS"; with Rails that still holds true.

set5think · on Aug 26, 2024

Great article and really great pragmatism. Heroku was a pioneer in excellent software design and development, more than they may ever get credit for. It’s not a surprise to me that the author worked at Heroku and thinks like a pro.

I love Ruby. I love rails. I love gems. I love bundler. I’m not sure why the Ruby community is so much stronger at simple software design than any other I’ve been a part of.

I spend most of my time in Python right now, and it’s fine, but I don’t love it like I loved Ruby. The only reason I have to be in Python is because the world of data engineering chose Python as their language. That basically always puts me in the unfortunate situation of having to choose between getting a project/task done using a Python library, or having to write it more by hand in Ruby, and you know which one wins :(.

As far as I can tell, there’s no reason Ruby couldn’t be the language of choice for data engineering, it’s just that no one has spent the time to make it that yet. I wish I could commit my life to that, I would, if nothing else mattered.

Tl;dr, author’s point of view is great, I bet their control plane is great; Ruby is fantastic; and someone should go make a pandas for Ruby and name it something even better like raccoons!

ozgune · on Aug 26, 2024

(Ozgun from Ubicloud)

Thank you for the kind words!

Daniel has a few gems in this blog post and we tried to italicize some of them. My favorite one is around "There is no code without a theory of testing."

"If you have 100% branch coverage, it doesn't mean you've covered all the cases. But it does mean that, whenever an obscure fault is understood in production, or even merely observed in development, there is an incremental path to add it to the base of knowledge in the tests: there are no spans of code with no test model."
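A small sketch of what that quote describes, under stated assumptions: the `naive` and `fixed` lambdas and the case table are hypothetical, not taken from the blog post. Two cases cover both branches of `naive`, yet an input observed later still exposes a fault; because a test model already exists, recording it is a one-line addition.

```ruby
# Hypothetical helper: mean of latency samples, or nil when there are none.
naive = ->(samples) do
  return nil if samples.empty?        # branch 1
  samples.sum / samples.size          # branch 2: Integer division truncates
end

# Same two branches, but divides as a Float.
fixed = ->(samples) do
  return nil if samples.empty?
  samples.sum.fdiv(samples.size)
end

# Table of [input, expected]. These two rows give 100% branch coverage.
cases = [
  [[], nil],
  [[10, 20], 15],
]

# Returns the rows for which the implementation disagrees with the table.
failing = ->(impl, table) do
  table.reject { |input, expected| impl.call(input) == expected }
end

before = failing.call(naive, cases)        # full branch coverage, no failures

# A fault is later observed for [1, 2]; add it to the existing table.
cases << [[1, 2], 1.5]

after_naive = failing.call(naive, cases)   # the new row catches the fault
after_fixed = failing.call(fixed, cases)   # the fix satisfies every row
```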

set5think · on Aug 26, 2024

Also not surprised to find out that Ozgun from Citus is behind these strong principles! :)

Yea I mean that can basically be assessed as a paradox: how can you possibly write a line of code whose correctness you don’t know how to validate!?

jfyasldfjwoy · on Aug 26, 2024

I'm still not convinced Ruby is a good choice (over jvm, go, rust, or BEAM).

The REPL point is interesting and I wouldn't mind more elaboration/exploration there.

fdr · on Aug 29, 2024

Sure, it reduces costs quite dramatically to be able to do stuff like this:

    upgrade_check_ssh = ->(vmh) do
      p [vmh.ubid, vmh.created_at, vmh.sshable.host]
      vmh.sshable.cmd(<<BASH)
    set -xeuo pipefail
    sudo apt-get update -qq && sudo apt -qq -y satisfy 'openssh-server (>= 1:8.9p1-3ubuntu0.10)' && sudo systemctl restart ssh.service
    BASH
      vmh.sshable.cmd(<<VERIFY)
    set -xeuo pipefail
    dpkg-query --showformat='${Version}\n' --show openssh-server
    ssh_pid="$(systemctl show -p MainPID ssh.service | cut -d= -f2)"
    (set +e && sudo grep -F deleted "/proc/$ssh_pid/maps" ; [ $? -eq 1 ])
    VERIFY
    end
    
    cohort_draining = VmHost.where(allocation_state: 'draining').order_by(:created_at)
    
    cohort_draining.map { upgrade_check_ssh.call(_1).tap { sleep 3 } }

This is me upgrading OpenSSH on July 1st to account for the RCEs reported at that time on some low impact servers.

I then wrote many minor variants, to change the cohort (eventually targeting all servers), as well as a verification pass. The methodology and output are recorded, along with the time, in Slack. That's how I'm able to roll the tape for you now with precision, almost two months later, in late August. This kind of precision in recall and methodology is important for efficient operations...especially when things go wrong. A common thing we do, upon seeing, say, a broken VM Host, is paste its identifier into Slack, to see if it's something of a troublemaker. From people's other code-and-output pastes, we can see what they ascertained, and how, and what was done.
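As a rough, self-contained sketch of the "change the cohort, then run a verification pass" pattern described above: everything here is an assumption made for illustration. `FakeHost`, `run_cohort`, the host ids, and the exact-match version check are hypothetical stand-ins, with no SSH and no Sequel involved, and none of it is the code actually used.

```ruby
# Hypothetical stand-in for a host record (the snippet above uses real models).
FakeHost = Struct.new(:id, :state, :ssh_version)

hosts = [
  FakeHost.new("vmh-a", "draining",  "1:8.9p1-3ubuntu0.10"),
  FakeHost.new("vmh-b", "accepting", "1:8.9p1-3ubuntu0.6"),
  FakeHost.new("vmh-c", "draining",  "1:8.9p1-3ubuntu0.6"),
]

# Apply `action` to each host in turn, pausing between hosts, and return
# [id, result] pairs that are easy to paste somewhere as a record.
def run_cohort(cohort, action, pause: 0)
  cohort.map do |host|
    result = begin
      action.call(host)
    rescue StandardError => e
      e                                # keep the error as the recorded result
    end
    sleep pause
    [host.id, result]
  end
end

# Verification pass. Simplified to an exact match; a real check would
# compare package versions properly.
required = "1:8.9p1-3ubuntu0.10"
verify = ->(host) { host.ssh_version == required }

# Variant 1: only the draining cohort.
draining = hosts.select { |h| h.state == "draining" }
draining_report = run_cohort(draining, verify)

# Variant 2: every host.
full_report = run_cohort(hosts, verify)

# Hosts that still need attention.
pending = full_report.reject { |_id, ok| ok == true }.map(&:first)
```

Keeping the action as a lambda and the cohort as a plain collection means each variant is a one-line change at the REPL, and the returned pairs double as the written record.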

I would not consider a language without a robust REPL for this kind of work. It is connected with an integrated develop-operate model, where the people writing the programs in these symbols every day are also assaying the problems. This unification is key.

And, somewhat related to that, I have not seen JVM nor BEAM libraries as high quality as Sequel, Roda, and Rodauth in their respective functions, and roughly in that order of importance, descending. These dependencies are invasive to how my code is written: above, you see some Sequel. We rely on other libraries being high quality (e.g. the pg driver gem, or net-ssh), but they are less invasive in this crucial way.

I did, at various points, consider applying this methodology to Python (the grammar's whitespace sensitivity is a serious problem, consider "cpaste"), TypeScript, Elixir, Scala, Julia, and even Swift. Although these rather conspicuously have REPLs, none have a Sequel.

I think people could make other REPL-enabled choices that work for them. But in my evaluation, some of the features of these runtimes did not overcome the consideration of a handful of key libraries.