Hacker News | reedlaw's comments

Why is Haskell irrelevant to the argument that LLMs can't reliably permute programming knowledge from one language to another? In fact, the purity of the language and dearth of training data seems like the perfect test case to see whether concepts found in more mainstream languages are actually understood.

Because human programmers routinely fail at that too. Haskell is an obscure language that came out of academic research. Several of its core semantics (like laziness by default) were never adopted anywhere else and are found only in Haskell.

Then I would say this is further evidence that LLMs lack intellect, or the ability to reason about universals. See https://michaelmangialardi.substack.com/i/186405810/test-4-p...

This is the second endorsement I've seen today. I gave OpenSpec a shot and was dismayed by the Explore prompt. [1] It's over 1,000 words of verbose, repetitive instructions that will lead to context drift, and the examples refer to specific tools like SQLite and OAuth, which won't help if your project isn't related to those.

I do like the basic concept and directory structure, but those are easy enough to adopt without all the cruft.

1. https://github.com/Fission-AI/OpenSpec/blob/main/src/core/te...


Do you have examples of the task maturation cycle? I'm not sure how it would work for tasks like extracting structured data from images. It seems it could only work for tasks that can be scripted and wouldn't work well for tasks that need individual reasoning in every instance.

No practical code example, sorry. The post is based on my own experience using agents, and I haven't reached a reusable generalization yet.

That said, two cases where I noticed the pattern:

Meal planning: I had a weekly ChatGPT task that suggested dinner options based on nutritional constraints and generated a shopping list (e.g. two dinners with 100g of chicken -> buy 200g). After a few iterations, it became clear that with a fixed set of recipes and their ingredients, a simple script generating combinations was enough. The agent's reasoning had already done its job — it helped me understand the problem well enough to replace itself.
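To make the "replace the agent with a script" step concrete, here's a minimal sketch of what the end state looked like. The recipes and quantities below are invented placeholders; the real set came out of the ChatGPT iterations:

```python
# Once the recipe set is fixed, meal planning reduces to enumerating
# combinations and summing ingredients -- no agent needed.
from collections import Counter
from itertools import combinations

RECIPES = {
    "chicken_stir_fry": {"chicken_g": 100, "broccoli_g": 150},
    "chicken_curry": {"chicken_g": 100, "rice_g": 80},
    "lentil_soup": {"lentils_g": 120, "carrot_g": 60},
}

def shopping_list(chosen):
    """Sum ingredient quantities across the chosen dinners."""
    total = Counter()
    for name in chosen:
        total.update(RECIPES[name])
    return dict(total)

def weekly_plans(n_dinners=2):
    """Enumerate every dinner combination with its shopping list."""
    for combo in combinations(RECIPES, n_dinners):
        yield combo, shopping_list(combo)

# Two chicken dinners -> 200g of chicken, as in the example above.
print(shopping_list(["chicken_stir_fry", "chicken_curry"]))
```

The interesting part isn't the script, which is trivial; it's that the agent's earlier runs are what surfaced the fixed recipe set and constraints that made it trivial.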

QA exploration: I was using an agent to explore a web app as a QA tester. It took several minutes per run. After some iterations, the more practical path was having it log its explorations to a file, then derive automated tests from that log. The agent still runs occasionally, but the tests run frequently and cheaply.
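The log-to-tests step can be sketched roughly like this. The log format here is hypothetical (one JSON object per line recording the URL visited and the status observed); the real agent log was messier:

```python
# Derive cheap, repeatable checks from an agent's exploration log,
# then replay them without the agent.
import json

def derive_checks(log_lines):
    """Turn log lines into (url, expected_status) pairs."""
    return [(e["url"], e["status"]) for e in map(json.loads, log_lines)]

def run_checks(checks, fetch):
    """fetch(url) -> status code; plug in any HTTP client.
    Returns (url, actual_status) for every mismatch."""
    return [(url, got) for url, want in checks
            if (got := fetch(url)) != want]

# Example log as the agent might have written it:
log = [
    '{"url": "/login", "status": 200}',
    '{"url": "/admin", "status": 403}',
]
checks = derive_checks(log)
```

The slow, deliberative exploration happens occasionally; the derived checks run on every commit.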

Regarding your point about tasks that need individual reasoning every time — I think you're right, and that's actually the core of the idea. Not every task matures into a script. Extracting structured data from images probably stays deliberative if the images vary significantly. The cycle only applies to tasks that, after enough repetitions, reveal a stable pattern. The agent itself is what helps you discover whether that pattern exists.


How do you even begin to define objective measurements of software engineering productivity? You could use DORA metrics [1] which are about how effectively software is delivered. Or you could use the SPACE Framework [2] which is more about the developer experience.

1. https://cloud.google.com/blog/products/devops-sre/using-the-...

2. https://space-framework.com/
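At least some DORA metrics are straightforwardly computable. A toy illustration of "lead time for changes" (time from commit to deploy), with invented timestamps:

```python
# Lead time for changes: hours between commit and deploy, per change.
from datetime import datetime
from statistics import median

def lead_times(pairs):
    """pairs: (commit_time, deploy_time) as ISO-ish strings -> hours."""
    fmt = "%Y-%m-%dT%H:%M"
    return [
        (datetime.strptime(d, fmt) - datetime.strptime(c, fmt)).total_seconds() / 3600
        for c, d in pairs
    ]

deploys = [
    ("2024-05-01T09:00", "2024-05-01T15:00"),  # 6h
    ("2024-05-02T10:00", "2024-05-03T10:00"),  # 24h
    ("2024-05-03T08:00", "2024-05-03T12:00"),  # 4h
]
print(f"median lead time: {median(lead_times(deploys)):.1f}h")
```

Of course, that measures delivery throughput, not individual productivity, which is exactly the distinction the DORA vs. SPACE framing is about.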


I don't have time for that mysticism. I just know.


Plasma Bigscreen has been around for 6 years: https://itsfoss.com/news/plasma-bigscreen-comeback/


I was referring to https://news.ycombinator.com/item?id=47283124

Anyway, I will try it in its current state. I basically need a launcher for the desktop Jellyfin app and not much more.



Isn't this the premise behind Dilbert?


LLM-generated snippets of code are a breath of fresh air compared with much legacy code. Since models learn probability distributions, they gravitate toward the most common ways of doing things, almost like having a linter built in. Legacy code, on the other hand, often does things in novel ways that leave you scratching your head--the premise behind sites like https://thedailywtf.com/


Maybe a better way to handle "minimize cyclomatic complexity" would be to set an agent in a loop of code metrics, refactor, test and repeat.
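A rough sketch of the measurement half of that loop, using only the stdlib `ast` module to approximate cyclomatic complexity (1 + branch points). The `agent_refactor` and `tests_pass` callables are placeholders for whatever agent and test runner you'd wire in:

```python
# Gate an agent's refactors on a cyclomatic-complexity metric.
import ast

BRANCHES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
            ast.With, ast.Assert, ast.comprehension)

def complexity(source):
    """Rough cyclomatic complexity: 1 + branching constructs."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCHES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each and/or adds a path
    return score

def refactor_loop(source, agent_refactor, tests_pass, target=10, max_rounds=5):
    """Measure, refactor, test, repeat -- until under target or out of rounds."""
    for _ in range(max_rounds):
        if complexity(source) <= target:
            break
        candidate = agent_refactor(source)  # e.g. a call out to your coding agent
        if not tests_pass(candidate):
            break  # reject refactors that break tests
        source = candidate
    return source
```

The test gate matters more than the metric: it's what keeps the loop from "optimizing" complexity at the cost of correctness.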


Good idea. Am still a bit shy around token budget spend.


I think this is, at the moment, the practical limitation on using AI for everything. It's also what the coding agents themselves optimize for to some degree, or at least it's the slider they can play with for price vs. quality: the "thinking" models are the exact same models, just burning more tokens.


Am waiting for the next Mac Studio to come out to experiment with the "AI for everything" approach. Most likely, the open-source distilled models will be lower quality. So, another "price vs. quality" tradeoff. Still, it will be fun to code like I'm at a foundation lab.


This seems like a perfect use case for a local model. But I've found in practice that the system requirements for agents are much higher than for models that can handle simple refactoring tasks. Once tool-use context is factored in, there is very little room left for models that perform decently.


What I hope to do with refactoring is to distill namespace and common patterns into a DSL. I am very curious about what tradeoffs you found.


Whatever agent I tried would include thousands of tokens of tool-use instructions, which eats up most of the available context when running very low-spec models. I've concluded it's best to use the big 3 for most tasks and Qwen on RunPod for more private data.


I don't know about Opus, but Codex suddenly got a lot better, to the point that I prefer it over Sonnet 4.6. Claude takes ages and comes up with half-baked solutions. Codex is so fast that I miss waiting. It also writes tests without prompting.

