There are definitely some areas in software-land where graphing data, or directly eyeballing it, genuinely helps you spot patterns that statistical methods would be cumbersome/tricky/otherwise annoying to find, like log analysis[1]
This sharp uptick in LLMs' in-context "learning" capabilities means I'm more excited than ever to get to grips with "new" languages like Nim or Gleam (though I worry that using LLMs to help me reach a working end state will rob me of some of the experience of learning).
Every MCP vs CLI argument I've seen really glosses over _where_ the agent is running, and how that makes a difference. For individual users running agents locally, I totally agree that CLIs cover the vast majority of use cases, where available.
Something I've not seen anyone mention is that MCPs make much more sense for equipping agents on third-party platforms with the tools they need - installing specific CLIs often isn't possible there, and there's the question of whether you trust the platform with your CLI authentication keys.
We last submitted a SWE-bench Verified result in November 2024 - at the time, I believe we were in the top 5 entrants.
We expect Engine to be as good as the other code-writing agents out there at the moment - we understand almost everyone in the space to be using very similar base models and agent scaffolding.
I know of https://modal.com/, which I believe is used by Codegen and Cognition.
Anecdotally, I hear that many companies in the LLM agent space roll their own sandboxing solutions - I've heard of both Firecracker- and Kubernetes-based implementations.
I use this for work - but there are edge cases all over the place that I keep running into (e.g. Yarn being installed on GitHub-hosted runners, but not on self-hosted ones or act - https://github.com/actions/setup-node/issues/182)
Same experience here. Edge cases everywhere, though most can be worked around.
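For that particular Yarn gap, one workaround sketch is to install Yarn explicitly in the workflow rather than assume the runner image ships it (the step name is made up for illustration; `corepack enable` is an alternative on newer Node versions):

```yaml
# Hypothetical workflow step - makes the job portable across GitHub-hosted
# runners, self-hosted runners, and act, instead of relying on a preinstall.
- name: Ensure Yarn is available
  run: npm install --global yarn
```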
You can specify different runner images to use. The default images are a compromise to keep size down. There's also a very large image that tries to include everything you might want - I'd suggest trying that if you don't mind the size (15GB IIRC).
I definitely remember considering the larger images - I think we ended up not using them since my work's use case for act is running users' GitHub workflows on-demand on temporary VMs. The hope was that most usage would be covered by the smaller images - and in fairness, that has been true so far.
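For reference, a sketch of how that image override looks with act - the image name below is the "full" catthehacker image mentioned in act's docs at the time of writing, so verify the current tag before relying on it:

```shell
# One-off: map the ubuntu-latest platform to the large "full" image.
act -P ubuntu-latest=catthehacker/ubuntu:full-latest

# Or persist it in an .actrc file so every invocation picks it up:
#   -P ubuntu-latest=catthehacker/ubuntu:full-latest
```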
I had a problem recently trying to send LLM-generated text between two web servers under my control, from AWS to Render - I was getting 403s for suspected command injection from Render's Cloudflare protection, which is opaque and unconfigurable for users.
The hacky workaround, which has been working stably for a while now, was to encode the offending request body and decode it on the destination server.
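A minimal sketch of that encode/decode trick, assuming base64 over JSON - the function and field names are made up for illustration, and this is one of several encodings that would work:

```python
import base64
import json


def encode_body(text: str) -> str:
    """Wrap LLM-generated text so WAF rules never see shell-like patterns on the wire."""
    payload = json.dumps({"text": text})
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def decode_body(encoded: str) -> str:
    """Reverse the encoding on the destination server before further processing."""
    payload = json.loads(base64.b64decode(encoded).decode("utf-8"))
    return payload["text"]


# A string of the sort that commonly trips command-injection WAF rules:
suspicious = "run `rm -rf /tmp/foo` && cat /etc/passwd"
assert decode_body(encode_body(suspicious)) == suspicious
```

The trade-off is that the destination endpoint must know to decode, and you lose any genuine WAF protection on that field - acceptable here since both servers are under the same owner's control.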
[1] https://jvns.ca/blog/2022/12/07/tips-for-analyzing-logs/