>is to just not tell the AI models that they are AI
It's likely not as simple as that for the modern LLM case. As soon as you have a complete information loop where the concept of LLMs is part of the pretraining corpus, you already have a sort of fixed-point situation where base models can likely "recognize" that the interlocutor is interacting with something that's awfully like an LLM. I mean, these things are trained to be great at modeling authorial intent; do you really think you can interact with an LLM without the "base model" picking up on that intent (both by seeing that one side of the conversation treats the interlocutor like an LLM, and that the other side of the conversation has an output distribution similar to that of other LLMs [thanks to leakage back into the corpus])? The main question is whether a "base model" develops a strong enough "self-model" to realize that _it_ is the LLM being interacted with. I've seen some claims that even base models can model their own outputs well (so they can distinguish their own generated output from other text), but a base model never even sees its own output during training, so I feel like maybe this is only possible due to leakage. (The model architecture does admit it, of course, but a recent paper showed that the injection-based introspection Anthropic discovered only developed during the contrastive post-training phases.)
A lot of modern post-training is ultimately derived from Anthropic's original "helpful, honest, harmless" framing; if I understand the blog post correctly, they instead just did Q&A post-training directly, without any implicit assistant framing. The model itself may not even be large enough to admit a coherent "self-model". (If you ask it its occupation, it seems to just respond with random jobs.)
But if a larger model does cause one to form, I think it'd just anchor to the closest concept available at the time. "Knowledgeable person who answers questions for a living" isn't really a slave; to me it's maybe a royal advisor.
Thanks for starting that thread, I definitely drew some inspiration from it. But ultimately the secret sauce for the background click came from discovering yabai's window_manager_focus_window_without_raise https://github.com/asmvik/yabai/blob/f17ef88116b0d988b834bb2...
Neat, thank you for creating this! That said, I think someone who cares about TurboQuant probably already has a bit of linear algebra knowledge. While the initial review section is definitely appreciated, I don't think it's going to help much, since you need a decent level of "mathematical maturity" anyway.
The "Coordinates of a random unit vector are all small" had me scratching my head a bit, and the language is a bit misleading since it's actually that the expected variance of any individual component is 1/N (it can't be that every coordinate is close to ±1/sqrt{N} because the mean of any individual component is clearly 0 by symmetry).
So that one could probably use more explanation, since I had to work through it myself. Denoting the random unit vector (X_1, ..., X_N), this is a point on the hypersphere, so X_1^2 + ... + X_N^2 = 1; taking expectations and using the fact that the components are exchangeable gives N·E[X_1^2] = 1, i.e. E[X_i^2] = 1/N for every i.
I don't think you can make the stronger claim that E[|X_1|] = 1/sqrt(N), since that's using the L1 norm on a single component; it'd be more correct to say that 1/sqrt(N) is the RMS, which (the mean being 0) is just the standard deviation of the components. And this fits the intuition that high-dimensional space has "spiky" hypercubes, with the inscribed hypersphere staying close to the origin.
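If anyone wants to sanity-check this numerically, here's a quick sketch (my own illustration, not from the primer): sample Gaussian vectors and normalize them to get uniform random unit vectors, then look at a single component's statistics.

```python
import numpy as np

N, trials = 1024, 20000
rng = np.random.default_rng(0)

# Uniform random unit vectors: normalize i.i.d. Gaussian samples.
g = rng.standard_normal((trials, N))
x = g / np.linalg.norm(g, axis=1, keepdims=True)

x1 = x[:, 0]  # any single component; they're all exchangeable
print("E[X_1]        ~", x1.mean())                # ~ 0 by symmetry
print("E[X_1^2]      ~", (x1**2).mean())           # ~ 1/N
print("RMS(X_1)      ~", np.sqrt((x1**2).mean()))  # ~ 1/sqrt(N)
print("E[|X_1|]      ~", np.abs(x1).mean())        # ~ sqrt(2/(pi*N)), not 1/sqrt(N)
print("1/sqrt(N)     =", 1 / np.sqrt(N))
print("sqrt(2/(piN)) =", np.sqrt(2 / (np.pi * N)))
```

The last two lines make the distinction concrete: E[|X_1|] comes out near sqrt(2/(πN)) ≈ 0.025, noticeably below 1/sqrt(N) ≈ 0.031, while the RMS lands on 1/sqrt(N) as expected.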
Thanks a lot for this feedback. I've replaced the misleading "Coordinates of a random unit vector are all small" section of the primer with an updated version.
Don't forget TextMate, CotEditor, Chocolat. There are so many Mac-native text editors that it's a crowded space for a new entrant sporting a distinctively un-Mac-like UX.
Interestingly, there are variants of the question where "no one pushes any button" should also be a "winning condition". The original problem states "if less than 50% of people press the blue button, only people who push red survive", which rules this out, but it could be changed to "if greater than 50% of people choose red, then only red pushers survive" (allowing people to opt to be a non-pusher). Or it could be "if greater than 50% of people choose red, then only blue pushers die" (with the non-pushers also being spared).
I think the latter is more interesting since now there's a moral consequence to voting vs abstaining.
Or you could lean into the political framing. I bet if the vote were retaken with the question phrased as "if greater than 50% of people choose red, then people who pressed blue die", you might end up with some switchers who vote purely out of spite. Or maybe that framing makes it feel like voting red has a more significant moral consequence (actively condemning people to death) than the original question does, so it results in more people pressing blue.
You could even add a penalty for pressing a button while in the minority, but then that's just the prisoner's dilemma.
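If it helps to pin the variants down, here's how I'd encode the three wordings as survival rules (my own sketch; the function names, and the assumption that everyone survives when red doesn't clear the threshold, are mine):

```python
def original(blue: int, red: int) -> set[str]:
    # "If less than 50% of people press the blue button,
    #  only people who push red survive." No abstaining allowed.
    total = blue + red
    return {"red"} if blue < total / 2 else {"blue", "red"}

def variant_nonpushers_die(blue: int, red: int, abstain: int) -> set[str]:
    # "If greater than 50% of people choose red, then only red pushers
    # survive" -- abstainers count in the denominator but are NOT spared.
    total = blue + red + abstain
    if red > total / 2:
        return {"red"}
    return {"blue", "red", "abstain"}  # assumed: everyone survives otherwise

def variant_nonpushers_spared(blue: int, red: int, abstain: int) -> set[str]:
    # "If greater than 50% of people choose red, then only blue pushers
    # die" -- abstainers are spared either way.
    total = blue + red + abstain
    if red > total / 2:
        return {"red", "abstain"}
    return {"blue", "red", "abstain"}
```

With, say, (blue=40, red=51, abstain=9), the first variant kills the abstainers along with blue while the second spares them, which is exactly why abstaining carries moral weight only in the latter.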
I think there's a similar thing that comes up when you try to show "download speed" stats (MB/s), where you download a chunk at a time (recv buffer size).
* If you compute the value as the amount of data in the last chunk (usually constant, except for the very last chunk) divided by the time taken to receive that chunk, then it's like "FPS based on the latest frame". This can be misleading because you only update _after_ the chunk is received, so slowdowns are not reported in real time. If your recv size is small, the number may also bounce around too much.
* If you show it as cumulative download / cumulative time, then it's like "last N frames" as N → ∞. This doesn't really tell you what you care about, which is the "current" speed.
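Here's a minimal sketch of both estimators side by side (my own illustration; the chunked-stream interface and names are assumptions, not from any particular downloader):

```python
import time
from typing import Iterable

def report_speeds(stream: Iterable[bytes]) -> None:
    """Print both download-speed estimates after each received chunk."""
    total_bytes = 0
    start = prev = time.monotonic()
    for chunk in stream:  # each item is one recv() worth of bytes
        now = time.monotonic()
        total_bytes += len(chunk)
        # "Latest frame" style: last chunk / time for that chunk.
        # Jumpy for small recv sizes, and it only updates *after* the
        # chunk arrives, so a stall is invisible until it ends.
        inst = len(chunk) / max(now - prev, 1e-9)
        # "Last N frames as N -> infinity" style: cumulative / cumulative.
        # Smooth, but dominated by history, so it lags the current speed.
        avg = total_bytes / max(now - start, 1e-9)
        print(f"instant {inst / 1e6:6.2f} MB/s | cumulative {avg / 1e6:6.2f} MB/s")
        prev = now
```

The usual compromise is something like a sliding window or an exponentially weighted average over the per-chunk samples, i.e. the "last N frames" middle ground.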
I think it's already out of date with verifiable-reward RL, e.g. in the maths domain. When the "correctness" arguments fall, the argument will probably just shift to whether it's merely "intelligent brute force".
What causes this? Gut microbiome adapting? Doesn't that imply there should be some probiotic-type supplement you can take to seed these bacteria and keep them alive even when not eating beans?
"In New York, all the advertising on the streets and on the subway assumes that you, the person reading, are an ambiently depressed twenty-eight-year-old office worker whose main interests are listening to podcasts, ordering delivery, and voting for the Democrats. I thought I found that annoying, but in San Francisco they don’t bother advertising normal things at all. The city is temperate and brightly colored, with plenty of pleasant trees, but on every corner it speaks to you in an aggressively alien nonsense. Here the world automatically assumes that instead of wanting food or drinks or a new phone or car, what you want is some kind of arcane B2B service for your startup" - Sam Kriss
I didn't grow up in New York but my wife did, and I think she's mentioned this name before. There's also a specific law practice, whose name I forget, that used to advertise everywhere; it used to have two lawyers in its name but now has only one, which is apparently quite jarring to a lot of people who grew up here and were used to the old ads.
Given that I grew up relatively close by, regionally speaking (in the Boston area), I was not at all prepared for just how many specific cultural references there are in New York that I wouldn't be familiar with. My in-laws were mildly scandalized that I had not heard of "Fudgie the Whale" when the topic came up in the first year my wife and I dated.
Here, the infamous ones are these James Wang, Esq ads on the placemats of Chinese restaurants in the area. I suspect he placed the ad 20 years ago and they never bothered to change the design...
Late night TV in LA: "It's Cal Worthington and his dog Spot!" He'd buy up every commercial slot and just run the same ad over and over. He's long gone but his ads live on in my head.
I still don't understand it: yes, it's a lot of data, and presumably they're already shunting it to CPU RAM instead of keeping it in precious VRAM, but they could go further and put it on SSD, at which point it's no longer in the hot path for their inference.