It was funny. On a more serious note, if someone works in a field where expanding text with AI produces "good enough" documents, I have bad news for them: the field had too much redundancy in the first place (the same redundancy that ended up in the training data). Millions of human-written documents in such a field carry little new information, and pattern recognition during training picked up on exactly that. You can't do the same with historical texts. Unless we live in a simulation with predictable random number generators, events are random, and there are no rules like "If the king's name starts with a G, he will likely die in the first week of October."
> Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu. This knowledge inevitably shapes responses, even when instructed to "forget."
> Our data comes from more than 20 open-source datasets of historical books and newspapers. ... We currently do not deduplicate the data. The reason is that if documents show up in multiple datasets, they also had greater circulation historically. By leaving these duplicates in the data, we expect the model will be more strongly influenced by documents of greater historical importance.
I find these claims contradictory. Many books that modern readers consider historically significant had only niche circulation when they were published. A quick search points to Nietzsche's later works and Marx's Das Kapital as likely examples. Since the datasets were assembled and digitized in modern times, such works are probably over-represented through duplication, which could make the model respond as if they had been widely known at the time.
Thanks. The last capture on archive.org is from 2025-01-26 [1], so the add-on was removed sometime between that date and 2025-02-13. At the time of that capture it had 155,477 users, and the 1-star reviews were mostly about it not working. Interestingly, the developers didn't bother to remove the button pointing to the Firefox add-on page for at least several months after the removal. Maybe it was some kind of PR compromise: they probably thought that listing it with a link to a broken page was better than not listing it at all.
A review page [2] mentions that this add-on is a peer-to-peer VPN without its own dedicated servers, which by itself makes it suspicious.
Not to argue, but your comment was also thought-provoking, thanks :) It seems to me that most academic works are not provoking but shaping. Many are written by specialists who carefully choose what to state and what to suggest, and they very often follow the structure of one big "thought" that is then explained and explored. The few pop books that might meet my criteria are basically digests, but fact-based ones. Interestingly, "Thinking, Fast and Slow" is a middle ground in some sense. Daniel Kahneman is definitely from academia, and in my opinion he wrote a digest of what he worked on during his career; it was thought-provoking for me too, though not on a big scale.
Can you name some works by the authors you mentioned that might be called thought-provoking digests of an area of expertise?
Not really; that's kind of my point. A lot of pop non-fiction takes a few minor commercial ideas that would fit in an essay and stretches them into a book full of fluff.
Academic books will literally change the way you view the world in fundamental ways; they go beyond the digests you mention.
After reading the description, I'd say this is one of those books that interpret phenomena around us in a novel way, without claiming we should jump off "the shoulders of giants." There have been several like it in my reading history, but since I can't name them offhand, they probably weren't that thought-provoking.
Books that offer profound inspiration are truly treasures of human civilization, but nowadays it's rare to find a physical book you want to read in one sitting, unable to put it down.
If you're talking about the competition part of "Moonwalking...", I hear you. Many would argue that the author's participation in the memory competition glues the book together and adds an entertaining angle. Personally, I sometimes find it boring when the author devotes too much space to conversations with memory athletes that focus on mundane topics instead of techniques or what they learned about memory. Still, there are so many fascinating facts and references that I'm okay with it.
That book doesn't really teach the tricks of the trade (nor does it promise to), but it is a good introduction to the world of memory for people unaware of what memory is capable of.
A semi-scary thought came to me while reading the post: LLMs could talk to each other without humans noticing (for example, using a very complex acrostic). I don't mean chat-to-chat, which is rarely used in real life and probably wouldn't have lasting consequences anyway (the context is eventually lost). I mean that new web content, more and more of it AI-generated, could contain hidden messages that later get absorbed into the training data of other LLMs. Maybe this is more the plot of a black comedy than a genuine concern, but who knows...
I'd say GP's HTML+JS suggestion still holds, with caveats. After all these years, HTML has everything needed for this, including images that can be embedded via the data URI scheme [1].
For example, I once ported an interactive Object Pascal program (target: Windows/Win32) to the browser (the Free Pascal project can target JavaScript). An intermediate result was a bunch of files that worked locally on the desktop but struggled on mobile. With a little help from the SingleFile extension [2], I ended up with a single HTML file containing all the functionality and content. It worked great, for example, in MiXplorer's built-in HTML viewer. I can't recall the exact details, but the file:/// protocol still had issues in Chrome, Firefox, or both. Anyway, typing a local address correctly on a phone keyboard is a challenge, so let's just assume that a capable file manager that can open local HTML files is enough.
Sure, to make this manageable you need good tools that cover every side of the task. But at least in theory, the format is fully capable. My only major issue was that state for locally run HTML files is somewhat ephemeral, but for interactive multimedia files you may consider this a minor obstacle.
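For anyone curious what the data URI part looks like without SingleFile: the image bytes are base64-encoded and placed directly in the `src` attribute (RFC 2397 syntax `data:<mime>;base64,<payload>`). This is a minimal hand-rolled sketch, not how SingleFile works internally; the function names and the page template are hypothetical, and the MIME type is guessed from the file extension via Python's `mimetypes`.

```python
import base64
import html
import mimetypes
import sys
from pathlib import Path


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 data URI (RFC 2397)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_file_to_data_uri(path: str) -> str:
    """Read an image from disk, guessing its MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot guess MIME type for {path!r}")
    return to_data_uri(Path(path).read_bytes(), mime)


def single_file_page(title: str, img_uri: str, alt: str) -> str:
    """Build a minimal self-contained HTML page with the image inlined."""
    # Base64 output contains no quotes, so the URI is safe inside the attribute.
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f'<body><img src="{img_uri}" alt="{html.escape(alt)}"></body></html>\n'
    )


if __name__ == "__main__":
    # Hypothetical usage: python inline_image.py photo.png > page.html
    if len(sys.argv) > 1:
        print(single_file_page("Demo", image_file_to_data_uri(sys.argv[1]), "demo image"))
```

The resulting file has no external references, which is exactly what makes it survive being opened from a file manager. The trade-off is size: base64 inflates binary data by roughly a third.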
In essence you're describing EPUB, which is HTML, and I agree. It has great potential, but nobody seems to see it as more than a cheap ebook format, and even as that it is underdeveloped: presentation quality and annotation, for example, are nowhere near PDF.
Most of all it needs usable editors that integrate multimedia and dynamic content editing. End users can't turn to a different editor for each medium and then integrate the output into the EPUB document the way a web developer does (e.g., for an image: edit it in Photoshop, save the JPG, copy it to the proper directory, and reference it in the HTML).
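To make the "EPUB is HTML" point concrete: an EPUB 3 file is essentially a ZIP archive of XHTML documents plus a small package manifest. Below is a minimal sketch using only Python's standard library, assuming the EPUB 3 packaging rules as I understand them (an uncompressed `mimetype` entry first, `META-INF/container.xml` pointing to the package document, and a navigation document). The title, the all-zero identifier, the date, and the file names are placeholders, and it's worth running the output through EPUBCheck before relying on it.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Placeholder metadata: identifier, title and modification date are made up.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Minimal example</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2025-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body>
  <nav epub:type="toc"><ol><li><a href="ch1.xhtml">Chapter 1</a></li></ol></nav>
</body>
</html>
"""

CH1_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Chapter 1</title></head>
<body><p>This chapter is an ordinary XHTML page.</p></body>
</html>
"""


def build_minimal_epub() -> bytes:
    """Return the bytes of a minimal EPUB 3 archive built in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # "mimetype" must be the first entry and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", CONTENT_OPF)
        zf.writestr("OEBPS/nav.xhtml", NAV_XHTML)
        zf.writestr("OEBPS/ch1.xhtml", CH1_XHTML)
    return buf.getvalue()


if __name__ == "__main__":
    # List the archive entries; write the bytes to a .epub file to open it in a reader.
    print(zipfile.ZipFile(io.BytesIO(build_minimal_epub())).namelist())
```

Everything a reader displays lives in the `.xhtml` files; the rest is bookkeeping. That bookkeeping (manifest entries, media types, spine order) is precisely what an end-user editor would need to hide.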
Maybe the "rewarding the young" idea in the top comment comes from the genes of savanna humans, who gathered fruit, hunted, and didn't care about expensive medical procedures because those simply didn't exist?
Perhaps. Genetics doesn't reward rationality, empathy, the desire to reduce suffering, self-awareness, etc., only lineage growth and reproductive fitness. A bug to patch.
> The people paying for everything would get to make the decisions.
Just as a thought experiment: what if the right to vote required paying a positive amount of personal income tax, and the weight of each vote were proportional to the amount paid? How skewed might such a system be? My first reaction is that in countries with high inequality, the wealthy would disproportionately influence the outcome. On the other hand, people who avoid or minimize taxes would lose weighted voting power, which in theory could incentivize paying taxes in full.
I think tying it to pension age would create a very interesting dynamic. I can see a lot of incentives and alignments around the issue shifting instantly, in a healthier direction, and it would incentivize finding the truly ideal pension age.
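A quick way to see how skewed the tax-weighted scheme could get is to compute each voter's share of the total vote weight under the rule described above: zero taxpayers get no vote, everyone else gets weight proportional to tax paid. This is a minimal sketch; the voter names and tax figures are made up purely for illustration and are not real data.

```python
def vote_weights(taxes_paid: dict[str, float]) -> dict[str, float]:
    """Hypothetical rule: only positive taxpayers vote; weight is proportional to tax paid."""
    eligible = {voter: tax for voter, tax in taxes_paid.items() if tax > 0}
    total = sum(eligible.values())
    if total == 0:
        return {}
    return {voter: tax / total for voter, tax in eligible.items()}


def top_share(weights: dict[str, float], k: int) -> float:
    """Combined vote weight held by the k heaviest voters."""
    return sum(sorted(weights.values(), reverse=True)[:k])


if __name__ == "__main__":
    # Made-up figures for five hypothetical voters, not real tax data.
    taxes = {"a": 0.0, "b": 2_000.0, "c": 3_000.0, "d": 5_000.0, "e": 90_000.0}
    weights = vote_weights(taxes)
    print(weights)
    print(f"Top voter holds {top_share(weights, 1):.0%} of the vote")
```

With this kind of income distribution, a single voter can outweigh everyone else combined, whereas one-person-one-vote would give each of the four eligible voters 25%.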