icoder's comments | Hacker News

This reminds me of my interactions lately with ChatGPT, where I gave in to its repeated offer to draw me an electronics diagram. The result was absolute garbage. During the subsequent conversation it kept offering to include any new insights in the diagram, entirely oblivious to its own incompetence.


I can understand how/that this works, but it still feels like a 'hack' to me. It still feels like the LLMs themselves are plateauing but the applications get better by running the LLMs deeper, longer, wider (and by adding 'non-AI' tooling/logic at the edges).

But maybe that's simply the solution, like the solution to original neural nets was (perhaps too simply put) to wait for exponentially better/faster hardware.


This is exactly how human society scaled from the cavemen era to today. We didn't need to make our brains bigger in order to get to the modern industrial age - increasingly sophisticated tool use and organization was all we did.

It only mattered that human brains are just big enough to enable tool use and organization. It ceased to matter once our brains passed a certain threshold. I believe LLMs are past this threshold as well (they haven't 100% matched the human brain and maybe never will, but that doesn't matter).

An individual LLM call might lack domain knowledge, context and might hallucinate. The solution is not to scale the individual LLM and hope the problems are solved, but to direct your query to a team of LLMs each playing a different role: planner, designer, coder, reviewer, customer rep, ... each working with their unique perspective & context.
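The division of labor described above can be sketched as a very simple orchestration loop. `call_llm` is a hypothetical placeholder, stubbed out here so the wiring itself is runnable; a real system would call a model API with a role-specific system prompt:

```python
# Toy sketch of a role-based pipeline. `call_llm` is a hypothetical stand-in
# for a real model call; the stub just tags the text with the role so the
# orchestration logic can be exercised without any model at all.
def call_llm(role: str, prompt: str) -> str:
    # A real implementation would send `prompt` to a model with a
    # role-specific system prompt and unique context.
    return f"[{role}] {prompt}"

def pipeline(query: str, roles=("planner", "designer", "coder", "reviewer")) -> str:
    # Each role works on the previous role's output, adding its perspective.
    result = query
    for role in roles:
        result = call_llm(role, result)
    return result

print(pipeline("build a login page"))
```

The interesting design question is not the loop itself but what context each role is given and how disagreements between roles get resolved.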


I get that feeling too - the underlying tech has plateaued, but now they're brute-force trading extra time and compute for better results. I don't know if that scales anything better than, at best, linearly. Are we going to end up with 10,000 AI monkeys on 10,000 AI typewriters and a team of a dozen monkeys deciding whose work they like the most?


> the underlying tech has plateaued, but now they're brute force trading extra time and compute for better results

You could say the exact same thing about the original GPT. Brute forcing has gotten us pretty far.


How much farther can it take us? Apparently they've started scaling out rather than up. When does the compute become too cost prohibitive?


Until recently, training-time compute was the dominant cost, so we're really just getting started down the test-time scaling road.


Yes. It works pretty well.


grug think man-think also plateau, but get better with tool and more tribework

Pointy sticks and ASML's EUV machines were designed by roughly the same lumps of compute-fat :)


This is an interesting point. If this ends up working well after being optimized for scale it could become the dominant architecture. If not it could become another dead leaf node in the evolutionary tree of AI.


Isn't that kinda why we have collaboration and get in room with colleagues to discuss ideas? i.e., thinking about different ideas, getting different perspectives, considering trade-offs in various approaches, etc. results in a better solution than just letting one person go off and try to solve it with their thoughts alone.

Not sure if that's a good parallel, but seems plausible.


Maybe this is the dawn of the multicore era for LLMs.


It's basically a mixture of experts but instead of a learned operator picking the predicted best model, you use a 'max' operator across all experts.
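The contrast can be made concrete with a toy example (the experts, router, and scorer below are made-up placeholders, not a real MoE implementation): a learned router commits to one expert up front, while the 'max' strategy runs every expert and keeps the highest-scoring output.

```python
# Toy experts: each is just a function of the input.
experts = [
    lambda x: x + 1,   # "optimistic" expert
    lambda x: x * 2,   # "aggressive" expert
    lambda x: x - 3,   # "conservative" expert
]

def route_learned(x, router):
    # Mixture-of-experts style: a (learned) router predicts the best
    # expert before running anything, so only one expert pays compute.
    return experts[router(x)](x)

def route_max(x, score):
    # Best-of-n style: run all experts, keep the output the scorer
    # likes most -- a 'max' operator across experts.
    return max((e(x) for e in experts), key=score)

print(route_learned(10, router=lambda x: 1))  # expert 1: 10 * 2 = 20
print(route_max(10, score=lambda y: y))       # max of 11, 20, 7 = 20
```

The trade-off is visible even here: the max strategy pays for every expert on every query, which is exactly the "deeper, longer, wider" compute cost discussed upthread.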


You could argue that many aspects of human cognition are "hacks" too.


…like what? I thought the consensus was that humans exhibit truly general intelligence. If LLMs require access to very specific tools to solve certain classes of problems, then it’s not clear that they can evolve into a form of general intelligence.


What would you call the very specialized portions of our brains?

The brain is not a monolith.


Specifically, which portions of the brain are “very specialized”? I’m not aware of any aspect of the brain that’s as narrowly applied to tasks as the tools LLMs use. For example, there’s no coding module within the brain - the same brain regions you use when programming could be used to perform many, many other tasks.


Broca's area, Wernicke's area, and the visual and occipital cortices (damage to the latter can cause loss of sight).


Most people with aphasia can still swear because it's handled by the reptilian part of the brain. ahaha


Are you able to point to a coding module in an LLM?


They are, but I think the keyword is "generalization". Humans do very well when innovation is required, because innovation needs generalized models that can be used to make very specialized predictions and then meta-models that can predict how specialized models relate to each other and cross reference those predictions. We don't learn arithmetic by getting fed terabytes of text like "1+1=2". We only use text to communicate information, but learn the actual logic and concept behind arithmetic, and then we use that generalized model for arithmetic in our reasoning.

I struggle to imagine how much further a purely text-based system can be pushed - a system that basically knows that 1+1=2 not because it has built an internal model of arithmetic, but because it estimates that the sequence `1+1=` is mostly followed by `2`.
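The distinction being drawn here can be illustrated with a deliberately crude toy: a 'memorizer' that has only seen string patterns fails on anything outside its training text, while even a trivial internal model of the operation generalizes. This is purely illustrative, not a claim about how any particular LLM works.

```python
# 'Training text': sums the memorizer has literally seen as strings.
seen = {"1+1": "2", "2+2": "4", "3+5": "8"}

def memorizer(q):
    # Knows only which string followed which; no concept of addition.
    return seen.get(q)  # unseen pattern -> no answer

def modeler(q):
    # Has an actual (tiny) model of the operation, so it generalizes.
    a, b = q.split("+")
    return str(int(a) + int(b))

print(memorizer("17+25"))  # None: never appeared in the training text
print(modeler("17+25"))    # 42: works for unseen sums
```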


They do have something of an internal model of arithmetic, with lookup tables and separate treatment of digits. I'm conscious you might have seen this already and not interpret it like that, but in case you haven't, section 6 on addition in this Anthropic interpretability paper goes into it.

https://transformer-circuits.pub/2025/attribution-graphs/bio...

Keep in mind that is a basic level of understanding of what is going on in quite a small model (Claude 3.5 Haiku). We don't know what is happening inside larger models.


So Long, and Thanks for All the Krill


This nicely describes where we're at with LLMs as I see it: they are 'fancy' enough to be able to write code yet at the same time they can't be trusted to do stuff which can be solved with a simple hook.

I feel that current improvement mostly comes from slapping what to me feels like workarounds on top of something that may very well be a local maximum.


> they are 'fancy' enough to be able to write code yet at the same time they can't be trusted to do stuff which can be solved with a simple hook.

Humans are fancy enough to be able to write code yet at the same time they can’t be trusted to do stuff which can be solved with a simple hook, like a simple formatter or linter. That’s why we still run those on CI. This is a meaningless statement.


One is a machine, the other one is not. People have to stop comparing LLMs to humans. Would you hold a car to human standards?


The machine just needs to be coded to run stuff (as shown in this very post). My coworkers can’t be coded to follow procedures and still submit PRs failing basic checks, sadly.


A self driving car, yes.


Claude Code is an agent, not an LLM. Literally, this is software that was released 4mo ago. lol.

1y ago, no provider was training LLMs in an environment modeled for agentic behavior - i.e., in conjunction with the software design of an integrated utility.

'Slapped-on workaround' is a very lazy way to describe this innovation.


> Literally this is software that was released 4mo ago.

Feels like ages


That's what a singularity feels like.


Someone described LLMs in the coding space as stone soup. So much stuff is being created around them to make them work better that at some point it feels like you'll be able to remove the LLM part of the equation.


We can't deny the LLM has utility. You can't eat the stone, but the LLM can implement design patterns, for example.

I think this insistence on near-autonomous agents is setting the bar too high, which wouldn't be an issue if these companies weren't then insisting that the bar is set just right.

These things understand language remarkably well; they've largely solved NLP because that's what they model extremely well. But agentic behavior is modelled by reinforcement learning, and until that's in the foundation model itself (at the token-prediction level) these things have no real understanding of state spaces being a recursive function of action spaces and such. And they can't autonomously code or drive or manage a fund until they do.


Humans use tools, and so does AI. Does it make us any less valuable as humans that we use bicycles and hammers? Why would it be bad for an AI to use tools?


Here in the Netherlands the impact of nitrogen from cattle excretions (mostly through ammonia, I believe) is paralyzing the entire country: due to its impact on the environment, it's now blocking the construction of very much needed housing. So there could be a win/win/win/win there.


Interesting. First time I've heard of this outside the UK. In my local area there's a near-total moratorium on new builds. The reasons are complex, but it's a mixture of agriculture having poisoned all the rivers; housing that is not connected to mains waste water (and people just not maintaining their private waste-water systems, which are often just tanks of excrement mixed with chemicals, overflowing into nature); and, even for houses connected to the mains, the fact that those systems constantly overflow into storm drains and make the rivers and coasts dangerous to swim in. All of that while it's completely clear that if we need one thing, it's more housing. Quite a predicament we find ourselves in.


California is like this, for different reasons. Mostly the leaders think nature > humanity, so the more they cap the knees of civilization, the more there is for nature, and that's a good and meaningful legacy in their minds. Of course this is a politically dangerous thing to speak up about publicly, so it's couched more along the lines of "uhhh, we need to make sure the house you build is safe, so you need 50,000 pages describing how safe it is, it must be evaluated by an army of PhDs, and rejections take 5 years".


Reminds me of when I was developing an application 'in' Facebook (when it was mostly friends, but with ads for addictive games in the sidebar).


What inactions? Apart from creating safe conditions beforehand (but perhaps that is your point), once my kid is asleep there's not much more I can do? Most of that time I'm sleeping myself.


Exactly, there's not much more you can do when you're asleep. However, this system might just nudge you awake if something happens while you're asleep. Or it won't, and you'd be no worse off than if you didn't have it.

It's not going to solve all problems.


Parent 1: what's that sound? Should we check it?

Parent 2: Nah, the baby monitor would have warned us.


I'm sometimes thinking about account verification that requires work/effort over time - it could even be something fun - so that it becomes a lot harder to verify a whole army of them. We don't need identification per se, just being human and (somewhat) unique.

See also my other comment on the same parent wrt networks of trust. That could perhaps vet out spammers and trolls. On one hand it seems far-fetched and a quite underdeveloped idea; on the other hand, social interaction (including discussions like these) as we know it is in serious danger.


I'm more and more convinced of an old idea that seems to become more relevant over time: to somehow form a network of trust between humans so that I know that your account is trusted by a person (you) that is trusted by a person (I don't know) [...] that is trusted by a person (that I do know) that is trusted by me.

Lots of issues there to solve, privacy being one (the links don't have to be known to the users, but in a naive approach they are there on the server).

Paths of distrust could be added as negative weight, so I can distrust people directly or indirectly (based on the accounts that they trust) and that lowers the trust value of the chain(s) that link me to them.

Because it's a network, it can adjust itself to people trying to game the system, but it remains a question to how robust it will be.
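One naive way to make the chain idea concrete (the names, weights, and product-of-weights rule are all assumptions chosen for illustration, not a proposal for a real scheme): give each trust edge a weight in [-1, 1], with distrust as a negative weight, score a chain by the product of its edge weights so one distrusted link flips or dampens the whole chain, and take your trust in a stranger to be the strongest chain connecting you.

```python
from itertools import permutations

# Directed trust edges; negative weight = direct distrust.
edges = {
    ("me", "alice"): 0.9,
    ("alice", "bob"): 0.8,
    ("bob", "carol"): 0.7,
    ("alice", "mallory"): -0.5,  # alice distrusts mallory
    ("mallory", "carol"): 0.9,
}

def chain_trust(path):
    # Product of edge weights along one chain; None if a link is missing.
    t = 1.0
    for a, b in zip(path, path[1:]):
        if (a, b) not in edges:
            return None
        t *= edges[(a, b)]
    return t

def trust(src, dst, people):
    # Brute-force over simple paths; a real network would need something
    # much smarter than this exponential search.
    best = None
    mids = [p for p in people if p not in (src, dst)]
    for r in range(len(mids) + 1):
        for mid in permutations(mids, r):
            t = chain_trust((src, *mid, dst))
            if t is not None and (best is None or t > best):
                best = t
    return best

print(trust("me", "carol", {"me", "alice", "bob", "carol", "mallory"}))
# The chain through mallory scores negative, so the bob chain wins.
```

Taking the max over chains is itself a design choice; summing or averaging chains would punish distrust harder, which is part of the robustness question raised above.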


I think technically this is the idea that GPG's web of trust was circling without quite landing on, which is the oddest thing about the protocol: it's mostly used today for machine authentication, which it's quite good at (i.e. deb repos)... but the tooling is actually generally oriented around verifying and trusting people.


Yeah, exactly - this was exactly the idea behind that. Unfortunately, while on paper it sounds like a sound idea (at least IMO), the WOT idea in PGP has proven time and time again to have no chance against the laziness of humans.



The Matrix protocol, or at least its clients, represents a key as a series of emoji - which is fine - and you verify by looking at the emoji on each client at the same time, ideally in person. I've only ever signed for people in person, plus one remote attestation; but there we had a separate verified private channel and attested the emoji that way.


Do these still happen? They were common (-ish, at least in my circles) in the 90s during the crypto wars, often at the end of conferences and events, but I haven't come across them in recent years.


I actually built this once, a long time ago for a very bizarre social network project. I visualised it as a mesh where individuals were the points where the threads met, and as someone's trust level rose, it would pull up the trust levels of those directly connected, and to a lesser degree those connected to them - picture a trawler fishing net and lifting one of the points where the threads meet. Similarly, a user whose trust lowered over time would pull their connections down with them. Sadly I never got to see it at the scale it needed to become useful as the project's funding went sideways.


Yeah, building something like this is not a weekend project, and getting enough traction for it to make sense is orders of magnitude beyond that.

I like the idea of one's trust leveraging that of those around them. This may make it more feasible to ask some 'effort' for the trust gain (as a means to discourage duplicate 'personas' for a single human), as that can ripple outward.


How would 'trust' manifest? A karma system?

How are individuals in the network linked? Just comments on comments? Or something different?


The system I built it for was invite only so the mesh was self-building, and yeah, there was a karma-like system that affected the trust levels, which in turn then gave users extra privileges such as more invites. Most of this was hidden from the users to make it slightly less exploitable, though if it had ever reached any kind of scale I'd imagine some users would work out ways to game it.


Ultimately, guaranteeing common trust between citizens is a fundamental role of the State.

For a mix of ideological reasons and a lack of genuine interest in the internet from legislators - mainly due to the generational factor, I'd guess - it hasn't happened yet, but I expect government-issued equivalents of IDs and passports for the internet to become mainstream sooner rather than later.


> Ultimately, guaranteeing common trust between citizens is a fundamental role of the State.

I don’t think that really follows. Business credit bureaus and Dun & Bradstreet have been privately enabling trust between non-familiar parties for quite a long time. Various networks of merchants did the same in the Middle Ages.


> Businesses credit bureaus and Dun & Bradstreet have been privately enabling trust between non-familiar parties for quite a long time.

Under the supervision of the State (they are regulated and rely on the justice and police system to make things work).

> Various networks of merchants did the same in the Middle Ages.

They did, and because there was no State, the amount of trust they could build was fairly limited compared to what has later been made possible by the development of modern states (the Industrial Revolution appearing in the UK has partly been attributed to the institutional framework that existed there early).

Private actors can, do, and have always built their own makeshift trust networks, but building a society-wide trust network is a key pillar of what makes modern states “States” (and it derives directly from the “monopoly on violence”).


Hawala (https://it.m.wikipedia.org/wiki/Hawala) and other similar ways to transfer money abroad work over a network of trust, but without any state trust system.


Compare its use to SWIFT and you'll see the difference.


That’s not really what research on state formation has found. The basic definition of a state is “a centralized government with a monopoly on the legitimate use of force”, and as you might expect from the definition, groups generally attain statehood by monopolizing the use of force. In other words, they are the bandits that become big enough that nobody dares oppose them. They attain statehood through what’s effectively a peace treaty, when all possible opposition basically says “okay, we submit to your jurisdiction, please stop killing us”. Very often, it actually is a literal peace treaty.

States will often co-opt existing trust networks as a way to enhance and maintain their legitimacy, as with Constantine’s adoption of Christianity to preserve social cohesion in the Roman Empire, or all the compromises that led the 13 original colonies to ratify the U.S. constitution in the wake of the American Revolution. But violence comes first, then statehood, then trust.

Attempts to legislate trust don’t really work. Trust is an emotion; it operates person-to-person, and saying “oh, you need to trust such-and-such” doesn’t really work unless you are trusted yourself.


> The basic definition of a state is “a centralized government with a monopoly on the legitimate use of force”

I'm not saying otherwise (I've even referred to this in a later comment).

> But violence comes first, then statehood, then trust.

Nobody said anything about the historical process so you're not contradicting anyone.

> Attempts to legislate trust don’t really work

Quite the opposite - it works very, very well. Civil law and jurisdiction over contracts have existed since the Roman Republic, and every society has some equivalent (you should read about how the Taliban could get back to power so quickly in large part because they kept administering civil justice in rural Afghan society even while the country was occupied by the US coalition).

You must have institutions to be sure that the other party is going to respect the contract, so that you don't have to trust them; you just need to trust that the state will enforce the contract (which it can do because it has the monopoly on violence and can force the party violating the contract into submission).

With the monopoly on violence comes the responsibility to use that violence to enforce contracts; otherwise social structures are going to collapse (and someone else will take that job from you, and gone is your monopoly on violence).


Interestingly, as I've begun to realise how easily a State's trust can sway, my belief that this should come from 'below' has actually increased. I think a trust network between people (of different countries) can be much more resilient.


I’ve also been thinking about this quite a bit lately.

I also want something like this for a lightweight social media experience. I’ve been off of the big platforms for years now, but really want a way to share life updates and photos with a group of trusted friends and family.

The more hostile the platforms become, the more viable I think something like this will become, because more and more people are frustrated and willing to put in some work to regain some control of their online experience.


The key is to completely disconnect all ad revenue. I'm skeptical people are willing to put in some money to regain control; not in the kind of percentages that mean I can move most of my social graph. Network effects are a real issue.


They're different application types - friends + family relationship reinforcement, social commenting (which itself varies across various dimensions, from highlighting usefulness to unapologetically mindless entertainment), social content sharing and distribution (interest group, not necessarily personal, not specifically for profit), social marketing (buy my stuff), and political influence/opinion management.

Meta and X have glommed them all together and made them unworkable with opaque algorithmic control, to the detriment of all of them.

And then you have all of them colonised by ad tech, which distorts their operation.


Also there's the problem that every human has to have perfect opsec or you get the problem we have now, where there are massive botnets out there of compromised home computers.


GPG lost, TLS won. Both are actually webs of trust with the same underlying technology, but they have different cultures and so different shapes. GPG culture is to trust your friends and have them trust their friends. With TLS culture you trust one entity (e.g. your browser) that trusts a couple dozen entities (root certificate authorities) that either sign keys directly or fan out to intermediate authorities that then sign keys. The hierarchical structure has proven much more successful than the decentralized one.

Frankly I don't trust my friends of friends of friends not to add thirst trap bots.
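Both shapes reduce to path problems over a signature graph, which a toy sketch makes visible (the names are invented and this is nothing like real certificate handling - no signatures, expiry, or revocation): hierarchical verification walks the issuer chain up to a fixed root set, while the web of trust searches for *any* signature path from you to the target.

```python
# Hierarchical (TLS-like): each cert names its issuer; trust anchors are fixed.
signed_by = {                      # child -> issuer
    "example.org": "intermediate-ca",
    "intermediate-ca": "root-ca",
}
trusted_roots = {"root-ca"}

def verify_hierarchical(name):
    # Follow issuers upward; succeed only if we land on a trusted root.
    while name in signed_by:
        name = signed_by[name]
    return name in trusted_roots

# Web of trust (GPG-like): anyone may sign anyone; trust is a reachable path.
signatures = {                     # key -> set of keys it has signed
    "me": {"alice"},
    "alice": {"bob"},
    "bob": {"dave"},
}

def verify_wot(start, target):
    # Breadth-first search for any chain of signatures start -> target.
    frontier, seen = [start], {start}
    while frontier:
        nxt = []
        for k in frontier:
            for s in signatures.get(k, ()):
                if s == target:
                    return True
                if s not in seen:
                    seen.add(s)
                    nxt.append(s)
        frontier = nxt
    return False

print(verify_hierarchical("example.org"))  # True: chain ends at root-ca
print(verify_wot("me", "dave"))            # True: me -> alice -> bob -> dave
```

The fan-out concern above is visible in the second model: every hop adds someone whose signing judgment you implicitly inherit.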


The difference is in both culture and topology.

TLS (or more accurately, the set of browser-trusted X.509 root CAs) is extremely hierarchical and all-or-nothing.

The PGP web of trust is non-hierarchical and decentralized (from an organizational point of view). That unfortunately makes it both more complex and less predictable, which I suppose is why it “lost” (not that it’s actually gone, but I personally have about one or maybe two trusted, non-expired keys left in my keyring).


The issue is key management. TLS doesn't usually require client keys. GPG requires all receivers to have a key.


Couple dozen => it’s actually 50-ish, with a mix of private and government entities located all over the world.

The fact that the Spanish mint can mint (pun!) certificates for any domain is unfortunate.

Hopefully, any abuse would be noticed quickly and rights revoked.

It would maybe have made more sense for each country’s TLD to have one or more associated CA (with the ability to delegate trust among friendly countries if desired).

https://wiki.mozilla.org/CA/Included_Certificates


Yes, I never understood why the scope of a CA wasn't declared up front as part of its CA certificate. The purpose is (email, website, etc.) but not the possible domains. I'm not very happy that the countless Chinese CAs included in Firefox can sign any valid domain I use locally. They should be limited to .cn only.

At least they seem to have kicked out the Russian ones now. But it's weird that such an important decision lies with arbitrary companies like OS and browser developers. On some platforms (Android) it's not even possible to add to the system CA list without root (only to the user one, which apps can choose to ignore).


Isn't this vaguely how the invite system at Lobsters functions? There's a public invite tree, and users risk their reputation (and posting access) when they invite new users.


I know exactly zero people over there. I am also not about to go brown nose my way into it via IRC (or whatever chat they are using these days). I'd love to join, someday.


Hey, I never actually tried Lobsters - do you mind if I ask for an invite?


I think this idea's problem might be the people part - specifically, the majority of people who will click absolutely anything for a free iPad.


Theoretically that should swiftly be reflected in their trust level. But maybe I'm too optimistic.

I have nothing intrinsically against people who 'will click absolutely anything for a free iPad', but I wouldn't mind removing them from my online interactions if that also removes bots, trolls, spammers and propaganda.


Cool! I'm working on something comparable, but with the audio stored on a single SD card and playback triggered by an RFID tag that we can then stick on wooden figures made by my GF (or anything 3D printed).

I'm still iterating over hardware: realising the Pi Zero is a bit of overkill, that using too many NiMH batteries in series may actually break those batteries, that the ESP8266 has far fewer GPIOs available than the module design suggests, among other lessons learned.

My current approach is a Pi Pico (the ESP32 was the alternative) with a DFPlayer Mini and a 32GB SD card.

The DFPlayer isn't too keen on running on 3V3 from cheaper LDOs (which are on the modules I'm using), so my current approach uses a small power bank. That just offloads the hard part (for me) of battery management to the professionals. This weekend I added a few resistors and a transistor to draw extra power (0.5 s every 20 s) to keep the power bank awake.

But I have different LDOs and an ESP32 coming in, so it's not fully decided yet. Will for sure scan this thread and the OP's article for more ideas!
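For what it's worth, the "pulse a dummy load to keep the power bank awake" trick is simple enough to sketch in MicroPython for the Pico. The GPIO number, timings, and transistor wiring below are assumptions, not a description of the actual build; a stub `Pin` class is included so the logic also runs under desktop Python:

```python
# Keep-alive sketch: briefly switch in a dummy load so the power bank sees
# enough current draw to stay on. Pin number and timings are assumptions.
import time

try:
    from machine import Pin  # MicroPython on the Pi Pico
except ImportError:
    class Pin:  # minimal stand-in so the logic can be tested off-device
        OUT = 0
        def __init__(self, n, mode=OUT):
            self.n, self.state = n, 0
        def on(self):
            self.state = 1
        def off(self):
            self.state = 0

LOAD_PIN = 15     # hypothetical GPIO driving the dummy-load transistor
PULSE_S = 0.5     # draw extra current for half a second...
PERIOD_S = 20.0   # ...out of every 20 seconds

def keepalive_cycle(pin, pulse_s=PULSE_S, period_s=PERIOD_S):
    """One cycle: pulse the dummy load on, then idle for the remainder."""
    pin.on()
    time.sleep(pulse_s)
    pin.off()
    time.sleep(period_s - pulse_s)

load = Pin(LOAD_PIN, Pin.OUT)
# On-device you would loop forever:
# while True:
#     keepalive_cycle(load)
```

Whether 0.5 s every 20 s is enough depends entirely on the power bank's shutoff threshold, so the constants would need tuning per unit.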

