Hacker News | reedlaw's comments

Why is Haskell irrelevant to the argument that LLMs can't reliably permute programming knowledge from one language to another? In fact, the purity of the language and dearth of training data seems like the perfect test case to see whether concepts found in more mainstream languages are actually understood.

Because human programmers routinely fail at that too. Haskell is an obscure language that came out of academic research. Several of its core semantics (like laziness by default) were never adopted anywhere else and are found only in Haskell.

Then I would say this is further evidence that LLMs lack intellect, or the ability to reason about universals. See https://michaelmangialardi.substack.com/i/186405810/test-4-p...

This is the second endorsement I've seen today. I gave OpenSpec a shot and was dismayed by the Explore prompt. [1] It's over 1,000 words of verbose, repetitive instructions that will lead to context drift, and the examples refer to specific tools like SQLite and OAuth, which won't help if your project isn't related to those.

I do like the basic concept and directory structure, but those are easy enough to adopt without all the cruft.

1. https://github.com/Fission-AI/OpenSpec/blob/main/src/core/te...


Do you have examples of the task maturation cycle? I'm not sure how it would work for tasks like extracting structured data from images. It seems it could only work for tasks that can be scripted and wouldn't work well for tasks that need individual reasoning in every instance.

No practical code example, sorry. The post is based on my own experience using agents, and I haven't reached a reusable generalization yet.

That said, two cases where I noticed the pattern:

Meal planning: I had a weekly ChatGPT task that suggested dinner options based on nutritional constraints and generated a shopping list (e.g. two dinners with 100g of chicken -> buy 200g). After a few iterations, it became clear that with a fixed set of recipes and their ingredients, a simple script generating combinations was enough. The agent's reasoning had already done its job — it helped me understand the problem well enough to replace itself.
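To make the "replace the agent with a script" step concrete, here's a minimal sketch of what the end state looked like. The recipes and quantities below are invented placeholders; the real set came out of the ChatGPT iterations:

```python
# Once the recipe set is fixed, meal planning reduces to enumerating
# combinations and summing ingredients -- no agent needed.
from collections import Counter
from itertools import combinations

RECIPES = {
    "chicken_stir_fry": {"chicken_g": 100, "broccoli_g": 150},
    "chicken_curry": {"chicken_g": 100, "rice_g": 80},
    "lentil_soup": {"lentils_g": 120, "carrot_g": 60},
}

def shopping_list(chosen):
    """Sum ingredient quantities across the chosen dinners."""
    total = Counter()
    for name in chosen:
        total.update(RECIPES[name])
    return dict(total)

def weekly_plans(n_dinners=2):
    """Enumerate every dinner combination with its shopping list."""
    for combo in combinations(RECIPES, n_dinners):
        yield combo, shopping_list(combo)

# Two chicken dinners -> 200g of chicken, as in the example above.
print(shopping_list(["chicken_stir_fry", "chicken_curry"]))
```

The interesting part isn't the script, which is trivial; it's that the agent's earlier runs are what surfaced the fixed recipe set and constraints that made it trivial.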

QA exploration: I was using an agent to explore a web app as a QA tester. It took several minutes per run. After some iterations, the more practical path was having it log its explorations to a file, then derive automated tests from that log. The agent still runs occasionally, but the tests run frequently and cheaply.
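The log-to-tests step can be sketched roughly like this. The log format here is hypothetical (one JSON object per line recording the URL visited and the status observed); the real agent log was messier:

```python
# Derive cheap, repeatable checks from an agent's exploration log,
# then replay them without the agent.
import json

def derive_checks(log_lines):
    """Turn log lines into (url, expected_status) pairs."""
    return [(e["url"], e["status"]) for e in map(json.loads, log_lines)]

def run_checks(checks, fetch):
    """fetch(url) -> status code; plug in any HTTP client.
    Returns (url, actual_status) for every mismatch."""
    return [(url, got) for url, want in checks
            if (got := fetch(url)) != want]

# Example log as the agent might have written it:
log = [
    '{"url": "/login", "status": 200}',
    '{"url": "/admin", "status": 403}',
]
checks = derive_checks(log)
```

The slow, deliberative exploration happens occasionally; the derived checks run on every commit.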

Regarding your point about tasks that need individual reasoning every time — I think you're right, and that's actually the core of the idea. Not every task matures into a script. Extracting structured data from images probably stays deliberative if the images vary significantly. The cycle only applies to tasks that, after enough repetitions, reveal a stable pattern. The agent itself is what helps you discover whether that pattern exists.


How do you even begin to define objective measurements of software engineering productivity? You could use DORA metrics [1] which are about how effectively software is delivered. Or you could use the SPACE Framework [2] which is more about the developer experience.

1. https://cloud.google.com/blog/products/devops-sre/using-the-...

2. https://space-framework.com/
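At least some DORA metrics are straightforwardly computable. A toy illustration of "lead time for changes" (time from commit to deploy), with invented timestamps:

```python
# Lead time for changes: hours between commit and deploy, per change.
from datetime import datetime
from statistics import median

def lead_times(pairs):
    """pairs: (commit_time, deploy_time) as ISO-ish strings -> hours."""
    fmt = "%Y-%m-%dT%H:%M"
    return [
        (datetime.strptime(d, fmt) - datetime.strptime(c, fmt)).total_seconds() / 3600
        for c, d in pairs
    ]

deploys = [
    ("2024-05-01T09:00", "2024-05-01T15:00"),  # 6h
    ("2024-05-02T10:00", "2024-05-03T10:00"),  # 24h
    ("2024-05-03T08:00", "2024-05-03T12:00"),  # 4h
]
print(f"median lead time: {median(lead_times(deploys)):.1f}h")
```

Of course, that measures delivery throughput, not individual productivity, which is exactly the distinction the DORA vs. SPACE framing is about.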


I don't have time for that mysticism. I just know.


Plasma Bigscreen has been around for 6 years: https://itsfoss.com/news/plasma-bigscreen-comeback/


I was referring to https://news.ycombinator.com/item?id=47283124

Anyway, I will try it in its current state. I basically need a launcher for the desktop Jellyfin app and not much more.



Isn't this the premise behind Dilbert?


LLM-generated snippets of code are a breath of fresh air compared with much legacy code. Since models learn probability distributions, they gravitate toward the most common ways of doing things, almost like having a linter built in. Legacy code, on the other hand, often does things in novel ways that leave you scratching your head--the premise behind sites like https://thedailywtf.com/


Maybe a better way to handle "minimize cyclomatic complexity" would be to set an agent in a loop of code metrics, refactor, test and repeat.
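A rough sketch of the measurement half of that loop, using only the stdlib `ast` module to approximate cyclomatic complexity (1 + branch points). The `agent_refactor` and `tests_pass` callables are placeholders for whatever agent and test runner you'd wire in:

```python
# Gate an agent's refactors on a cyclomatic-complexity metric.
import ast

BRANCHES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
            ast.With, ast.Assert, ast.comprehension)

def complexity(source):
    """Rough cyclomatic complexity: 1 + branching constructs."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCHES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each and/or adds a path
    return score

def refactor_loop(source, agent_refactor, tests_pass, target=10, max_rounds=5):
    """Measure, refactor, test, repeat -- until under target or out of rounds."""
    for _ in range(max_rounds):
        if complexity(source) <= target:
            break
        candidate = agent_refactor(source)  # e.g. a call out to your coding agent
        if not tests_pass(candidate):
            break  # reject refactors that break tests
        source = candidate
    return source
```

The test gate matters more than the metric: it's what keeps the loop from "optimizing" complexity at the cost of correctness.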


Good idea. Am still a bit shy around token budget spend.


I think this is, at the moment, the practical limitation on using AI for everything. It's also what the coding agents themselves optimize for to some degree, or at least it's the slider they can play with for price vs. quality: the "thinking" models are the exact same models, just burning more tokens.


Am waiting for the next Mac Studio to come out to experiment with the "AI for everything" approach. Most likely, the open-source distilled models will be lower quality. So, another "price vs. quality" tradeoff. Still, it will be fun to code like I'm at a foundation lab.


This seems like a perfect use case for a local model. But I've found in practice that the system requirements for agents are much higher than for models that can handle simple refactoring tasks. Once tool-use context is factored in, there is very little room left for models that perform decently.


What I hope to do with refactoring is to distill namespace and common patterns into a DSL. I am very curious about what tradeoffs you found.


Whatever agent I tried would include thousands of tokens of tool-use instructions, which eats up most of the available context when running very low-spec models. I've concluded it's best to use the big 3 for most tasks and Qwen on RunPod for more private data.


I don't know about Opus, but Codex suddenly got a lot better, to the point that I prefer it over Sonnet 4.6. Claude takes ages and comes up with half-baked solutions. Codex is so fast that I miss waiting. It also writes tests without prompting.

