I would agree with this if the LLM never really modified the initial linear embeddings, but non-linearity in MLP layers and position/correlation fixing in the attention layers would mean that things are not so simple. I’m pretty sure there are papers showing compositionality and so on being represented by transformers.
Absolutely! Everything I design, I basically cast as ER. I'm not even a backend developer, but this is such a useful and universal framing (for me, at least, or so I thought) that I always used to wonder what the "real" backend engineers do.
> I always used to wonder what the "real" backend engineers do.
Think ChatGPT: people aren't there to do CRUD, they're there to talk to the model. Saving the results and what they write is a useful feature, so CRUD is still needed, but it's not the main attraction.
Or any other kind of server that computes something, like a game server, as long as you want to write a server program instead of just storing data.
> Everything I design, I basically cast it as ER … wonder what the "real" backend engineers do.
I for one spend a fair amount of time refactoring reports and other complex requests that ER & friends generate absolute monsters for, because the data model is optimised for just plonking data in and pulling it back a few objects at a time, with no thought to the larger access patterns the users are going to need later. Sometimes that means just improving the access SQL (replacing the generated mess with something more hand-crafted), sometimes it is as simple as improving the data's indexing strategy, and sometimes the whole schema needs a jolly good visit from the panel beater.
A particular specialty is taking report/dashboard computations (both system-wide and per-user) that have no business taking so long over _that_ size of data from "takes so long we'll have to precompute overnight and display from cache" to hundreds or even tens of milliseconds, otherwise known as "wow, that is quick enough to run live!".
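For a toy illustration of the pattern (stdlib sqlite3 standing in for the real database, schema and sizes invented):

```python
# Toy sketch of the "a few objects at a time" access pattern vs. a
# set-based query, using stdlib sqlite3. Schema and sizes are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    WITH RECURSIVE n(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 50000)
    INSERT INTO orders (customer_id, total) SELECT i % 1000, i % 500 FROM n;
""")

# What generated data-access code often does: one round trip per
# customer. Fine for a detail page, ruinous for a system-wide dashboard.
slow = {}
for (cid,) in conn.execute("SELECT DISTINCT customer_id FROM orders"):
    (slow[cid],) = conn.execute(
        "SELECT SUM(total) FROM orders WHERE customer_id = ?", (cid,)
    ).fetchone()

# The hand-crafted replacement: one set-based pass, helped by an index.
conn.execute("CREATE INDEX ix_orders_customer ON orders(customer_id)")
fast = dict(conn.execute(
    "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"
))
assert slow == fast  # same answer, a fraction of the round trips
```

The first loop is the "pull it back a few objects at a time" shape; the second is what the hand-tuned version usually collapses to.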
This is exacerbated by devs initially working on local on-prem SQL Server instances with dedicated many-core CPUs and blindingly fast SSDs, then the product being run in production on AzureSQL, where for cost reasons several DBs are crammed into a standard-class elastic pool, CPU & memory are more limited, and IO for anything not already in the buffer pool is several orders of magnitude slower than local (think "an elderly asthmatic arthritis-riddled ant librarian fetching you an encyclopaedia from the back store" slow).
The other big "oh, it worked well in dev" cause is that even when people dev/test against something akin to the final production infrastructure, they do that testing with an amount of data that some clients will generate in days, hours, or even every few minutes (and that is ignoring the amount that will just arrive in one go as part of an initial on-boarding migration for some clients).
Glorified-Predictive-Text generated EF code is not currently helping any of this.
Why not automate verification itself, then? It isn't possible now, and I would probably never advocate using LLMs in critical settings, but it might be possible to build field-specific verification systems for LLMs with robustness guarantees as well.
If the verification systems for LLMs are built out of LLMs, you haven't addressed the problem at all, just hand-waved a homunculus that itself requires verification.
If the verification systems for LLMs are not built out of LLMs and they're somehow more robust than LLMs at human-language problem solving and analysis, then you should be using the technology the verification system uses instead of LLMs in the first place!
> If the verification systems for LLMs are not built out of LLMs and they're somehow more robust than LLMs at human-language problem solving and analysis, then you should be using the technology the verification system uses instead of LLMs in the first place!
The issue is not in the verification system, but in putting quantifiable bounds on your answer set. If I ask an LLM to multiply large numbers together, I can also very easily verify the generated answer by layering a deterministic check on top.
I.e. rather than hoping that an LLM can accurately multiply two 10-digit numbers, I get a much easier (and verifiable) solution by instead asking it to perform the calculation using Python and read me the output.
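To make that concrete, here's a minimal sketch of the "deterministic check on top" idea; `ask_llm` is a hypothetical stand-in for whatever model call you'd actually use, faked here so the example runs:

```python
# Sketch of wrapping an LLM answer in a deterministic check. ask_llm()
# is a hypothetical stand-in for a real model call; Python's
# arbitrary-precision ints act as the verifier.
import random

def ask_llm(prompt: str) -> str:
    # Fake model: usually right, occasionally off by one.
    a, b = (int(s) for s in prompt.split() if s.isdigit())
    return str(a * b + random.choice([0, 0, 0, 1]))

def multiply_checked(a: int, b: int) -> int:
    claimed = int(ask_llm(f"Multiply {a} {b} and reply with just the number"))
    if claimed != a * b:  # the deterministic verifier
        raise ValueError(f"model's answer {claimed} failed verification")
    return claimed

try:
    print(multiply_checked(9346721087, 4120985733))
except ValueError as err:
    print("caught a bad answer:", err)
```

The point is that the check is exact and cheap even when producing the answer is not, so a wrong model output can never leak through silently.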
Spitballing: if you had a digital model of a commercial airplane, you could have an LLM write all of the component code for the flight system, then iteratively test it against the digital model under as many real-world circumstances as possible.
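In miniature, that loop is just randomized testing against a simulator. Everything below (the dynamics, the controller, the bounds) is invented purely for illustration:

```python
# Toy version of "iterate the generated code against a digital model":
# random-sample operating conditions and check an invariant on each run.
# The dynamics, controller, and tolerances here are all made up.
import random

def altitude_hold(error: float) -> float:
    """Hypothetical generated controller: commanded pitch from altitude error."""
    return max(-15.0, min(15.0, 0.02 * error))  # clamp to +/- 15 degrees

def simulate(target: float, start: float, gust: float, steps: int = 500) -> float:
    """Crude point-mass 'digital model': returns the final altitude error."""
    alt = start
    for _ in range(steps):
        alt += altitude_hold(target - alt) * 2.0 + random.uniform(-gust, gust)
    return abs(target - alt)

for trial in range(1000):
    target = random.uniform(1000, 40000)   # target altitude, ft
    start = target + random.uniform(-5000, 5000)
    gust = random.uniform(0, 5)            # per-step disturbance, ft
    err = simulate(target, start, gust)
    assert err < 100, f"trial {trial}: settled {err:.1f} ft from target"
```

Real certification obviously demands far more than sampled scenarios, but the shape of the loop (generate, simulate, check invariants, repeat) is the same.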
I think automating verification in general might require general intelligence, though I'm not an expert.
But what you're assuming is a static starting position. The entire point of liberalism, if you read the essay, is to be fair. It is absurdly unfair to that one person, and I'm pretty sure they'll never be able to advance in this society. This is not just about that one person, though; it's the perception that society does not really care about the individual, and that individual could be you. In Junkland, however, fairness is the starting position, and the argument is that this begets progress, while a society like Omelas shows a complete disregard for fairness. This was the spirit of the original position.