It all depends on your philosophy on dependencies. If you maintain a small set of core dependencies that are there for good reasons and are actively maintained, then Rails upgrades are pretty easy. If your Gemfile pulls in a bunch of third-party gems for small problems here and there, you have to occasionally pay down that debt on version upgrades. We have an 18-year-old Rails codebase, currently on 7.1, and upgrades haven't proven to be a big pain. The hardest upgrade we did was because a core dependency that had been dead for five years broke with a new version of Rails, but that was a story of letting technical debt ride for too long and having to pay it back.
This is a common problem in any complex codebase with a culture of using third-party dependencies to solve small problems. You see this conversation all the time around modern frontend development and the resulting dependency tree you get with npm and friends.
MiniDSP Flex HT or HTX paired with a Buckeye 6-channel amp. That's about as cheap as premium sound quality gets. Not cheap in absolute terms, but you get the software control you actually want via the MiniDSP.
Preschool is just daycare with structure, so it costs more. It’s optional and privately owned. It’s nice to do 2-3 days a week for young kids to give them more social and learning opportunities. But it’s not public school; it’s usually just a small, locally owned business.
And this was a co-op preschool, a special variety of private preschool (usually non-profit) where parents are involved in classes with the kids and much of the upkeep of the school itself is handled through volunteer work by member families.
My wife served as treasurer for the penultimate year, saw the writing on the wall, and then turned the position over to someone else to actually wind down the school. The model just doesn't work where we live: it requires a large number of single-income families so that one parent can be full-time involved in the kids' upbringing, and housing prices are such that single-income families cannot afford homes in the area. As a result, their market just evaporated. People just can't do it anymore.
We've been running a production Ceph cluster for 11 years now, across three different hardware generations, with only one full scheduled downtime for a major upgrade in all those years. I wouldn't call it easy, but I also wouldn't call it hard. I used to run it with SSDs for radosgw indexes and a fast pool for some VMs, with hard drives for bulk object storage. Since I was only running 5 nodes with 10 drives each, I got tired of occasional IOPS issues under heavy recovery, so on the last upgrade I just migrated to 100% NVMe drives. To mitigate the price I bought used enterprise Micron drives off eBay whenever I saw a good deal pop up. We haven't had any performance issues since then, no matter what we've tossed at it. I'd recommend it, though I don't have experience with the other options; on paper I think it's still the best option. Stay away from CephFS though: performance is truly atrocious and you'll footgun yourself in production.
We've been using CephFS for a couple of years, with some PBs of data on it (HDDs).
What performance issues and footguns do you have in mind?
I also like that CephFS has a performance benefit that doesn't seem to exist anywhere else: automatic, transparent use of the Linux buffer cache, so writes are extremely fast and local until you fsync() or another client wants to read, and repeat reads or read-after-write are served from local RAM.
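To make that concrete, here's a minimal, generic Ruby sketch of the underlying page-cache behavior (not a CephFS benchmark; the path and data size are made up): the write returns quickly because it lands in the kernel's buffer cache, and fsync is what pays the real cost of pushing data to the backing store.

    require "benchmark"

    data = "x" * (64 * 1024 * 1024) # 64 MB of throwaway data

    File.open("/tmp/page_cache_demo.bin", "wb") do |f|
      # Fast: buffered in the page cache, not yet on the backing store.
      puts "write: #{Benchmark.realtime { f.write(data) }.round(3)}s"
      # Slower: forces the data out to the backing store.
      puts "fsync: #{Benchmark.realtime { f.fsync }.round(3)}s"
    end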
We are the world's largest library of bike routes, and we enable cyclists to go on better rides, more often. We have a website and mobile apps that let people discover the best riding in their area and get turn-by-turn navigation using either our mobile apps or the bike computer of their choosing. Come join us in taking Ride with GPS to the next level! We have two openings right now, and are starting to build out the hiring plan for a third:
Senior Software Engineer - API & Product Development: We are looking for an experienced backend engineer to join our small and effective engineering team with a focus on supporting web and mobile app development using our APIs. The right candidate for this role brings extensive experience supporting modern product development in collaboration with frontend and mobile developers, product management, and design. This requires excellent communication and collaboration skills, both on the engineering side and from a product perspective. We use Rails, but prior Rails experience is not required.
Senior Software Engineer - API Development: We are looking for an experienced backend engineer to join our small and effective team with a focus on our APIs and supporting our platform at scale. This doesn't mean you are isolated from product development: everything we do serves our users in some way, and being a small team we regularly share responsibilities. However, this role will spend more time on efficiency and system design than on delivering this quarter's new features. The right candidate should have a depth of experience supporting a large API surface area with efficient, well-organized code, and should be excited about maintaining and improving performance over time. Experience with developer tooling, database design, query optimization, and DevOps workflows will serve you well in this role. We use Rails, but prior Rails experience is not required.
Senior Software Engineer - iOS Development: In mid-July, we will officially start the hiring process for an iOS developer, and potentially another Android engineer. We are reviewing applications from qualified candidates now, and will officially post the job by July 15th. If you think you are an excellent fit, please apply now; however, there might be some delays in screening, interviewing, etc. while we finalize our hiring plan. We have a technically interesting, battery-efficient set of mobile apps that act as a companion to our website, and need another iOS or Android engineer to help us take our apps to the next level.
Enterprise server gear is pretty reliable, and you build your infra to be fully redundant. In our setup, no single machine failure will take us offline. I have 13 machines in a rack running a >$10MM ARR business, and haven't had any significant hardware failures. We have had occasional drive failures, but everything is RAID 1 at a minimum, so they are a non-issue.
We just replaced our top-of-rack firewall/proxies, which were 11 years old and working just fine. We did it for power and reliability concerns, not because there was a problem. App servers get upgraded more often, but that's because of density and performance improvements.
What does cause a service blip fairly regularly is having a single upstream ISP. I will have a second ISP into our rack shortly, which means that whole class of short outage will go away. It's really the only weak spot we've observed. That being said, we are in a nice datacenter that is a critical hub in the Pacific Northwest. I'm sure a budget datacenter will have a different class of reliability problems that I am not familiar with.
But again, an occasional 15-minute outage is really not a big deal business-wise. Unless you are running a banking service or something, no one cares when something is down for 15 minutes. Heck, all my banks regularly have "maintenance" outages that are unpredictable. I promise, no one really cares about five nines of reliability in the vast majority of services.
Sounds great. Yep, what I mean is you will need to make your systems fully redundant before considering cloud if your business depends on reliability and uptime. That usually requires the business to reach a certain scale first.
Sure, but making something redundant is not really that difficult. HAProxy in front of N nodes across M racks, ideally in separate DCs, and then a floating IP in front of your HAProxies. Set up a hot standby for your DB.
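As a sketch, the HAProxy piece of that might look something like the config below (addresses, names, and the health-check path are placeholders; the floating IP itself would typically be handled by something like keepalived/VRRP rather than HAProxy):

    # haproxy.cfg -- minimal sketch with placeholder addresses
    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    frontend www
        bind :80
        default_backend app_nodes

    backend app_nodes
        balance roundrobin
        option httpchk GET /healthz
        server app1 10.0.1.11:8080 check   # node in rack/DC 1
        server app2 10.0.2.11:8080 check   # node in rack/DC 2
        server app3 10.0.3.11:8080 check   # node in rack/DC 3

Any node that fails its health check just drops out of rotation until it comes back.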
I used to joke that my homelab almost had better reliability than any company I’d been at, save for my ISP’s spotty availability. Now that I have a failover WAN, it literally is more reliable. In the five years of running a rack, I’ve had precisely one catastrophic hardware failure (mobo died on a Supermicro). Even then, I had a standby node, so it was more of an annoyance (the standby ran hotter and louder) than anything.
I was bitten by atop a few years back and swore it off. I would get perfectly periodic 10-minute hangs on MySQL. Apparently they changed the default runtime options such that it used an expensive metric-gathering technique, run from a cron job every 10 minutes, that would hang any large-memory process on the system. It was one of those “no freaking way” revelations after three days of troubleshooting everything.
It was interesting reading through the related submission comments and seeing other hard-to-troubleshoot bugs. I don’t think the atop devs are to blame; my guess is that what you have to do to make a tool like atop work means hooking into lots of places that have the potential for unintended consequences.
I'll bite, just so you get a real answer instead of the very correct but annoying "don't worry about it right now" answers everyone else is going to provide!
We have a Rails monolith that sends our master database instance between 2,000 and 10,000 queries per second depending on the time of year; we have a seasonal bike business with more traffic in the summer. About 5% of queries are insert/update/delete; the rest are reads.
MariaDB (a MySQL flavor), with all reads and writes sent just to the master. Two slaves: one for live failover, the other sitting on a ZFS volume for backup snapshotting, shipping snapshots off to rsync.net (they are awesome, BTW).
We run all our own hardware. The database machines have 512GB of RAM and dual EPYC 74F3 24-core processors, backed by a 4-drive RAID 10 NVMe Linux software RAID volume on top of Micron 9300 drives. These machines also house a legacy MongoDB cluster (actually a really nice and easy-to-maintain key/value store, which is how we use it) on a separate RAID volume, an Elasticsearch cluster, and a Redis cluster. The Redis cluster is often doing 10,000 commands a second on a 20GB DB, and the Elasticsearch cluster is a 3TB full-text search + geo search database that does about 150 queries a second.
In other words, MySQL isn't single-tenant on these machines, though it is single-tenant on the drives that back our MySQL database.
We don't have any caching as it pertains to database queries. Yes, we shove some expensive-to-compute data in Redis and use that as a cache, but a cache miss there wouldn't hit our database; we'd instead recalculate the value on the fly from GPS data. I would expect to 3-5x our current traffic before considering caching more seriously, but I'll probably once again just upgrade machines instead. I've been saying this for 15 years....
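For flavor, a hypothetical Ruby sketch of that cache-or-recompute pattern (the key layout, TTL, and compute_elevation_gain helper are all invented for illustration):

    require "redis"

    REDIS = Redis.new

    # Cache-or-recompute: on a miss we go back to the raw GPS data,
    # never to the database.
    def elevation_gain_for(route_id, gps_points)
      key = "route:#{route_id}:elevation_gain"
      cached = REDIS.get(key)
      return cached.to_f if cached

      value = compute_elevation_gain(gps_points) # hypothetical helper
      REDIS.set(key, value, ex: 6 * 3600)        # expire after 6 hours
      value
    end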
At the end of 2024 I went on a really fun quest to cut our DB size from 1.4TB down to about 500GB, along with a bunch of query performance improvements (removing unnecessary writes with small refactors, adding better indexes, dropping unneeded ones, changing strings to enums in places, etc.). I spent about one week of very enjoyable, fast-paced work to accomplish this while everyone was out for Christmas break (my day job is now mostly management), and would probably need another two weeks to go after the other ~30% of performance improvements I have in mind.
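To give a sense of what those schema changes look like on Rails 7.1, a hypothetical migration (table and column names invented) covering the enum and index parts might be:

    class TightenRoutesTable < ActiveRecord::Migration[7.1]
      def change
        # Replace a free-form string column with a small integer backing a Rails enum.
        add_column :routes, :visibility, :integer, default: 0, null: false
        remove_column :routes, :visibility_label, :string

        # Add an index that matches the hot query path...
        add_index :routes, [:user_id, :updated_at]
        # ...and drop one that nothing uses anymore.
        remove_index :routes, :legacy_token
      end
    end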
All this is to serve a daily average of 200-300 HTTP requests per second to our backend, with a mix of website visitors and users of our mobile apps. I saw a 1,000 rps steady-state peak last year and wasn't worried about anything. I wouldn't be surprised if we could get up to 5,000 rps to our API with this current setup and a little tuning.
The biggest table by storage and by row count has 300 million rows and, I think, 150GB including indexes, though I've had a few tables eclipse a billion rows before rearchitecting things. Basically, if you use the DB for analytics things get silly, but you can go a long way before thinking "maybe this should go in its own datastore like ClickHouse".
Also, it's not just queries per second, but also row operations per second. MySQL is really, really fast. Fixing some hidden performance issues let me go from 10,000,000 row ops per second down to about 200,000 right now. This didn't really change any noticeable query performance; MySQL was fine just doing a ton of full table scans all over the place for some things....
Wonderful, thank you. Some translations to AWS RDS...
"512gb of ram and dual EPYC 74F3 24 core processors, backed by a 4 drive raid10 nvme linux software raid volume on top of micron 9300 drives"
Roughly translates to a db.r8g.16xlarge (64 vCPUs, 512GB RAM): $4,949/month on-demand for compute.
I'm not familiar enough with hardware to determine IOPS for the RAID config, but I believe it is greater than the maximum for io2 Block Express storage on AWS (256K IOPS):
$0.10 per provisioned IOPS-month: 256,000 × $0.10 = $25,600/month for IOPS -- which feels high, so I might be way off on the RAID setup's IOPS.
$0.125 per GB-month for storage: 500 GB × $0.125 = $62.50/month.
That's about $30,600/month without any reserved discounts for an estimated capacity of 5,000 rps; does that sound about right? Would you say your total hardware cost is less than one or two months of comparable compute on AWS, if the above is true?
Yup, last time I priced this in RDS I got to maybe $20k a month for two reserved instances across AZs.
I pay for our rack outright every 3-4 months from what I can tell. It still takes the same number of infra/ops/SRE people as well: we staff 2, but really there's only about 1.25 FTEs' worth of work; you just need more for redundancy.
Pretty nuts! This is also why I am so dismissive of performance optimization. Yeah, I'll just buy a new set of three machines with 2TB of RAM each in a few years, call it good, and still come out ahead.