Introducing a separate charge specifically targeting those of your customers who...

kjuulh · 2025-12-16T17:53:32 1765907612

We've self-hosted github actions in the past, and self-hosting it doesn't help all that much with the fragile part. For github it is just as much triggering the actions as it is running them. ;) I hope the product gets some investment, because it has been unstable for such a long time, that on the inside it must be just the usual right now. GitHub has by far the worst uptime of any SaaS tools we use at the moment, and it isn't even close.

> Actions is down again, call Brent so he can fix it again...

Fabricio20 · 2025-12-16T18:56:55 1765911415

We self host the runners in our infrastructure and the builds are over 10x faster than relying on their cloud runners. It's crazy the performance you get from running runners on your own hardware instead of their shared CPU. Literally from ~8m build time on Gradle + Docker down to mere 15s of Gradle + Docker on self hosted CPUs.

matsimitsu · 2025-12-16T19:07:46 1765912066

This! We went from 20!! minutes and 1.2k monthly spend on very, very brittle action runs to a full CI run in 4 minutes, always passing, by just by going to Hetzner's server auction page and bid on a 100 euro Ryzen machine.

kjuulh · 2025-12-16T19:14:19 1765912459

After self hosting our builds ended up so fast, that we were actually waiting for was GitHub scheduling our agents, rather than it being the job running. It sucked a bit, because we'd optimized it so much, but we on 90th percentile saw that it took 20-30 seconds for github to schedule the jobs as they should. Measured from when the commit hit the branch, to the webhook begin sent.

pxc · 2025-12-16T19:51:51 1765914711

My company uses GitHub, GitLab, and Jenkins. We'll soon™ be migrating off of GitLab in favor of GitHub because it's a Microsoft shop and we get some kind of discount on GitHub for spending so much other money with Microsoft.

Scheduling jobs, actually getting them running, is virtually instant with GitLab but it's slow AF for GitHub for no discernable reason.

pxc · 2025-12-16T22:23:58 1765923838

lmao I just realized on this forum writing this way might sound like I own something. To be clear, I don't own shit. I typically write "my employer", and should have here.

btown · 2025-12-16T18:03:34 1765908214

> call Brent so he can fix it again

Not sure if a Phoenix Project reference, but if it is, it's certainly in keeping with Github being as fragile as the company in the book!

kjuulh · 2025-12-16T18:05:23 1765908323

It is xD On the outside it feels like a product held together with duct tape, wood glue and prayers.

cweagans · 2025-12-16T18:21:42 1765909302

Hey, don't insult wood glue like that.

chickensong · 2025-12-16T20:13:08 1765915988

Indeed, wood glue is amazing. Such slander is totally uncalled for.

steve_adams_86 · 2025-12-16T23:07:04 1765926424

I don't know, maybe it's a compliment. Wood glue can form bonds stronger than the material it's bonding. So, the wood glue in this case is better than the service it's holding together :)

bdangubic · 2025-12-16T23:08:04 1765926484

or prayers

tracker1 · 2025-12-16T18:25:26 1765909526

I tend to just rely on the platform installers, then write my own process scripts to handle the work beyond the runners. Lets me exercise most of the process without having to (re)run the ci/cd processes over and over, which can be cumbersome, and a pain when they do break.

The only self-hosted runners I've used have been for internalized deployments separate from the build or (pre)test processes.

Aside: I've come to rely on Deno heavily for a lot of my scripting needs since it lets me reference repository modules directly and not require a build/install step head of time... just write TypeScript and run.

kjuulh · 2025-12-16T18:33:45 1765910025

We choose github actions because it was tied directly to github providing the best pull-request experience etc. We actually didn't really use github actions templating as we'd got our own stuff for that, so the only thing github actions actually had to do was start, run a few light jobs as the CI was technically run elsewhere and then report the final status.

When you've got many 100s of services managing these in actions yaml itself is no bueno. As you mentioned having the option to actually be able to run the CI/CD yourself is a must. Having to wait 5 minutes plus many commits just to test an action drains you very fast.

Granted we did end up making the CI so fast (~ 1 minute with dependency cache, ~4 minutes without), that we saw devs running their setup less and less on their personal workstations for development. Except when github actions went down... ;) We used Jenkins self-hosted before and it was far more stable, but a pain to maintain and understand.

MathiasPius · 2025-12-18T08:29:20 1766046560

In 2023 I quoted a customer some 30 hours to deploy a Kubernetes cluster to Hetzner specifically to run self-hosted GitHub Actions Runners.

After 10-ish hours the cluster was operational. The remaining 18 (plus 30-something unbillable to satisfy my conscience) were spent trying and failing to diagnose an issue which is still unsolved to this day[1].

[1]: https://github.com/actions/runner-container-hooks/issues/113

featherless · 2025-12-16T18:08:55 1765908535

This is absolutely bananas; for my own CI workflow I'll have to pay $140+/month now just to run my own hardware.

hinkley · 2025-12-16T18:50:13 1765911013

Am I right in assuming it’s not the amount of payment but the transition from $0 to paying a bill at all?

I’m definitely sure it’s saving me more than $140 a month to have CI/CD running and I’m also sure I’d never break even on the opportunity cost of having someone write or set one up internally if someone else’s works - and this is the key - just as well.

But investment in CI/CD is investing in future velocity. The hours invested are paid for by hours saved. So if the outcome is brittle and requires oversight that savings drops or disappears.

saagarjha · 2025-12-16T22:20:02 1765923602

Have you ever set up GitHub Actions? The outcome is brittle because of their platform, not because of my inability to do CI.

hinkley · 2025-12-17T01:38:14 1765935494

I use them minimally and haven't stared at enough failures yet to see the patterns. Generally speaking my MO is to remove at least half of the moving parts of any CI/CD system I encounter and I've gone a multiple of that several times.

When CI and CD stop being flat and straightforward, they lose their power to make devs clean up their own messes. And that's one of the most important qualities of CI.

Most of your build should be under version control and I don't mean checked in yaml files to drive a CI tool.

brulard · 2025-12-17T08:39:41 1765960781

Exactly this. I've used Jenkins, Travis, CircleCI and all of them were so easy in comparison to the github actions runner mess.

featherless · 2025-12-16T21:23:21 1765920201

This is not investment in CI/CD. I already did that, by buying and investing in my own hardware, my own workflows, my own caching solution.

This is like if Dropbox started charging you money for the files you have stored on your backup hard drives.

gheltllck · 2025-12-17T02:40:54 1765939254

Don’t give them any ideas! This is actually a standard enshittification.

newsoftheday · 2025-12-16T20:37:34 1765917454

You're sounding a lot like a Microsoft damage control artist.

PeterHolzwarth · 2025-12-16T21:10:01 1765919401

Keep this kind of comment on reddit, not here.

newsoftheday · 2025-12-17T18:00:06 1765994406

I'll keep it where I like actually, thanks.

hinkley · 2025-12-16T21:32:48 1765920768

The only company I’ve held a grudge against longer than MS is McDonalds and they are sort of cut from the same cloth.

I’m also someone who paid for JetBrains when everyone still thought it wasn’t worth money to pay for a code editor. Though I guess that’s again now. And everyone is using an MS product instead.

hedgehog · 2025-12-16T18:21:33 1765909293

I'm curious, what are you doing that has over 1000 hours a month of action runtime?

featherless · 2025-12-16T18:24:59 1765909499

I run a local Valhalla build cluster to power the https://sidecar.clutch.engineering routing engine. The cluster runs daily and takes a significant amount of wall-clock time to build the entire planet. That's about 50% of my CI time; the other 50% is presubmits + App Store builds for Sidecar + CANStudio / ELMCheck.

Using GitHub actions to coordinate the Valhalla builds was a nice-to-have, but this is a deal-breaker for my pull request workflows.

hedgehog · 2025-12-16T19:13:31 1765912411

Cool, that looks a lot nicer than the OBD scanner app I've been using.

Eikon · 2025-12-16T18:27:57 1765909677

On ZeroFS [0] I am doing around 80 000 minutes a month.

A lot of it is wasted in build time though, due to a lack of appropriate caching facilities with GitHub actions.

[0] https://github.com/Barre/ZeroFS/tree/main/.github/workflows

featherless · 2025-12-16T18:30:01 1765909801

I found that implementing a local cache on the runners has been helpful. Ingress/egress on local network is hella slow, especially when each build has ~10-20GB of artifacts to manage.

esafak · 2025-12-16T19:23:19 1765912999

What do you use for the local cache?

featherless · 2025-12-16T19:25:16 1765913116

Just wrote about my approach yesterday: https://jeffverkoeyen.com/blog/2025/12/15/SlotWarmedCaching/

tl;dr uses a local slot-based cache that is pre-warmed after every merge to main, taking Sidecar builds from ~10-15 minutes to <60 seconds.

hedgehog · 2025-12-16T19:29:10 1765913350

ZeroFS looks really good. I know a bit about this design space but hadn't run across ZeroFS yet. Do you do testing of the error recovery behavior (connectivity etc)?

Eikon · 2025-12-16T19:59:44 1765915184

This has been mostly manual testing for now. ZeroFS currently lacks automatic fault injection and proper crash tests, and it’s an area I plan to focus on.

SlateDB, the lower layer, already does DST as well as fault injection though.

theLiminator · 2025-12-16T18:58:03 1765911483

Wow, that's a very cool project.

Eikon · 2025-12-16T19:09:00 1765912140

Thank you!

duped · 2025-12-16T19:02:10 1765911730

1 hour build/test time, 20 devs, that's 50 runs a month. Totally possible.

gheltllck · 2025-12-17T02:49:24 1765939764

GH actions templates don’t build all branches by default. I guess it’s due to them not wanting the free tier to use to much resources. But I consider it an anti-pattern to not build everything at each push.

sunnyday_002 · 2025-12-17T10:24:31 1765967071

That is because GH Actions is event based, that is more powerful and flexible than just triggering on every push and not letting it be configured.

``` on: push ```

is the event trigger to act on every push.

You'll waste a lot of CI on building everything in my opinion, I only really care about the pull request.

nyrikki · 2025-12-16T19:24:37 1765913077

I resorted to a local forgejo + woodpecker-ci. Every time I am forced back to GitHub for some reason it confirms I made the right choice.

In my experience gitlab always felt clunky and overly complicated on the back end, but for my needs local forgejo is better than the cloud options.

awestroke · 2025-12-16T18:13:04 1765908784

They still host all artefacts and logs for these self-hosted runs. Probably costs them a fair bit

gz09 · 2025-12-16T18:24:16 1765909456

They already charge for this separately (at least storage). Some compute cost may be justified but you'd wish that this change would come with some commitment of fixing bugs (many open for years) in their CI platform -- as opposed to investing all their resources in a (mostly inferior) LLM agent (copilot).

naikrovek · 2025-12-16T19:07:04 1765912024

Copilot uses other models, not (necessarily?) its own, so I’m not sure what you mean.

gz09 · 2025-12-16T20:43:19 1765917799

It does leverage various models, but

- github copilot PR reviews are subpar compared to what I've seen from other services: at least for our PRs they tend to be mostly an (expensive) grammar/spell-check

- given that it's github native you'd wish for a good integration with the platform but then when your org is behind a (github) IP whitelist things seem to break often

- network firewall for the agent doesn't seem to work properly

raised tickets for all these but given how well it works when it does, I might as well just migrate to another service

featherless · 2025-12-16T18:20:48 1765909248

There's absolutely no way that the cost scales with the usage of my own hardware. I cannot fathom this change in any way or form. Livid.

newsoftheday · 2025-12-16T20:39:43 1765917583

[flagged]

otterley · 2025-12-16T20:58:34 1765918714

I don't work for Microsoft (in fact, I work for a competitor), and I think it's totally reasonable to charge for workflow executions. It's not like they're free to build, operate, and maintain.

gheltlkckfn · 2025-12-17T04:26:51 1765945611

Well, they provide it for free for the fee tier. And has been for ages. Perhaps they shouldn’t provide rugpull services if they cannot afford it.

JonChesterfield · 2025-12-17T13:26:54 1765978014

Microsoft pleading poverty doesn't really fly

otterley · 2025-12-17T16:57:26 1765990646

Nobody's pleading poverty here. It's a reasonable business decision to charge for value, just like the rest of the economy does.

zahlman · 2025-12-16T18:11:30 1765908690

Meanwhile I'm just running `pytest`, `pyproject-build`, `twine` etc. at the command line....

(People seem to object to this comment. I genuinely do not understand why.)

colechristensen · 2025-12-16T19:04:12 1765911852

You don't trust devs to run things, to have git hooks installed, to have a clean environment, to not have uncommitted changes, to not have a diverging environment on their laptop.

Actions let you test things in multiple environments, to test them with credentials against resources devs don't have access to, to do additional things like deploys, managing version numbers, on and on

With CI, especially pull requests, you can leave longer running tests for github to take care of verifying. You can run periodic tests once a day like an hour long smoke test.

CI is guard rails against common failure modes which turn requiring everyone to follow an evolving script into something automatic nobody needs to think about much

zahlman · 2025-12-16T19:39:06 1765913946

> You don't trust devs to run things, to have git hooks installed, to have a clean environment, to not have uncommitted changes, to not have a diverging environment on their laptop.

... Is nobody in charge on the team?

Or is it not enough that devs adhere to a coding standard, work to APIs etc. but you expect them to follow a common process to get there (as opposed to what makes them individually most productive)?

> You can run periodic tests once a day like an hour long smoke test.

Which is great if you have multiple people expected to contribute on any given day. Quite a bit of development on GitHub, and in general, is not so... corporate.

I don't deny there are use cases for this sort of thing. But people on HN talking about "hosting things locally" seem to describe a culture utterly foreign to me. I don't, for example, use multiple computers throughout the house that I want to "sync". (I don't even use a smartphone.) I really feel strongly that most people in tech would be better served by questioning the existing complexity of their lives (and setups), than by questioning what they're missing out on.

colechristensen · 2025-12-17T02:47:50 1765939670

It seems like you may not have much experience working in groups of people.

>... Is nobody in charge on the team?

This isn't how things work. You save your "you MUST do these things" for special rare instructions. A complex series of checks for code format/lint/various tests... well that can all be automated away.

And you just don't get large groups of people all following the same series of steps several times a day, particularly when the steps change over time. It doesn't matter how "in charge" anybody is, neither the workplace nor an open source project are army boot camp. You won't get compliance and trying to enforce it will make everybody hate you and turn you into an asshole.

Automation makes our lives simpler and higher quality, particularly CI checks. They're such an easy win.

falsedan · 2025-12-16T20:00:12 1765915212

I think you could learn a lot about the other use cases if you asked some genuine questions and listened with intent

pseudosavant · 2025-12-16T18:46:15 1765910775

It passes on my machine. YOLO!

misnome · 2025-12-16T18:33:53 1765910033

Because you appear completely oblivious and deliberately naive about the entire purpose of CI.

zahlman · 2025-12-16T18:44:53 1765910693

Based on my experience I really do think most people are using it for things that they could perfectly well do locally with far less complication.

Perhaps that isn't most use of it; the big projects are really big.

wiether · 2025-12-16T19:01:24 1765911684

Care to provide examples?

Fundamentally, yes, what you run in a CI pipeline can run locally.

That's doesn't mean it should.

Because if we follow this line of thought, then datacenters are useless. Most people could perfectly host their services locally.

yjftsjthsd-h · 2025-12-16T19:23:25 1765913005

> Because if we follow this line of thought, then datacenters are useless. Most people could perfectly host their services locally.

There are a rather lot of people who do argue that? Like, I actually agree that non-local CI is useful, but this is a poor argument for it.

wiether · 2025-12-16T20:06:12 1765915572

I'm aware of people arguing for self-hosting some services for personal use.

I'm not aware of people arguing for self-hosting team or enterprise services.

eudamoniac · 2025-12-17T00:43:40 1765932220

Well, they are. Selling the team or enterprise a license to do just that is a rather large part of many businesses.

naikrovek · 2025-12-16T18:59:45 1765911585

Runners aren’t fragile, workflows are.

The runner software they provide is solid and I’ve never had an issue with it after administering self-hosted GitHub actions runners for 4 years. 100s of thousands of runners have taken jobs, done the work, destroyed themselves, and been replaced with clean runners, all without a single issue with the runners themselves.

Workflows on the other hand, they have problems. The whole design is a bit silly

falsedan · 2025-12-16T19:58:06 1765915086

it's not the runners, it's the orchestration service that's the problem

been working to move all our workflows to self hosted, on demand ephemeral runners. was severely delayed to find out how slipshod the Actions Runner Service was, and had to redesign to handle out-of-order or plain missing webhook events. jobs would start running before a workflow_job event would be delivered

we've got it now that we can detect a GitHub Actions outage and let them know by opening a support ticket, before the status page updates

naikrovek · 2025-12-17T02:32:42 1765938762

> before the status page updates

That’s not hard, the status page is updated manually, and they wait for support tickets to confirm an issue before they update the status page. (Users are a far better monitoring service than any automated product.)

Webhook deliveries do suffer sometimes, which sucks, but that’s not the fault of the Actions orchestration.

falsedan · 2025-12-17T10:45:45 1765968345

I'm seeing wonky webhook deliveries for Actions service events, like dropping them completely, while other webhooks work just fine. I struggle to see what else could be responsible for that behaviour. it has to be the case that the Actions service emits events that trigger webhook deliveries & sometimes it messes them up.

gheltlkckfn · 2025-12-17T04:18:04 1765945084

The orchestration service has been rewritten from scratch multiple times, in different languages even. How anyone can get it this wrong is beyond me.

The one for azure devops is even worse though, pathetic.