Hacker News | past | comments | ask | show | jobs | submit | cryptohell's comments

Given several models, assuming only that some unknown subset is "safe", can we construct a single model as safe as that subset? This reduces obtaining a trustworthy model to a plausibly easier task.


The classic AMS (1996) bound for estimating the frequency moments of a stream is shown to be optimal
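For context, the AMS second-moment estimator that bound concerns can be sketched in a few lines (a minimal illustration of the basic sign-sketch idea, not the paper's construction; function and variable names are mine):

```python
import random

def ams_f2_estimate(stream, num_trials=1000, seed=0):
    """Estimate the second frequency moment F2 = sum_i f_i^2 of a stream.

    Each trial assigns every distinct item a random sign s(i) in {-1, +1},
    maintains the running sum Z = sum over the stream of s(x), and uses
    Z^2 as an unbiased estimate of F2. Averaging trials reduces variance.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(num_trials):
        signs = {}
        z = 0
        for x in stream:
            if x not in signs:
                signs[x] = rng.choice((-1, 1))
            z += signs[x]
        estimates.append(z * z)
    return sum(estimates) / num_trials

# frequencies: f(1)=2, f(2)=1, f(3)=3, so the true F2 is 4 + 1 + 9 = 14
print(ams_f2_estimate([1, 1, 2, 3, 3, 3]))
```

The lower bound in question says roughly that the space this style of sketch uses cannot be beaten asymptotically.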


ChatGPT could encode usernames, timestamps and other session context into its responses in a way that would only be retrievable by OpenAI and provably invisible to everyone else
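A toy version of that kind of keyed encoding, detectable only by the key holder, might look like this (a hypothetical sketch with made-up synonym pairs; this is not OpenAI's scheme, just an illustration of why a keyed choice looks like noise to everyone else):

```python
import hmac
import hashlib

def keyed_bits(key: bytes, n: int) -> list[int]:
    # Derive one pseudorandom bit per position from an HMAC of the index.
    # Without the key, these choices are indistinguishable from ordinary
    # stylistic variation.
    return [hmac.new(key, str(i).encode(), hashlib.sha256).digest()[0] & 1
            for i in range(n)]

# Hypothetical synonym pairs: position i emits pair[bit_i].
PAIRS = [("quick", "fast"), ("big", "large"), ("begin", "start")]

def embed(key: bytes) -> list[str]:
    bits = keyed_bits(key, len(PAIRS))
    return [pair[b] for pair, b in zip(PAIRS, bits)]

def verify(key: bytes, words: list[str]) -> bool:
    # Only the key holder can recompute which word was "expected" where.
    bits = keyed_bits(key, len(PAIRS))
    return all(w == pair[b] for pair, b, w in zip(PAIRS, bits, words))

msg = embed(b"provider-secret")
print(msg, verify(b"provider-secret", msg))
```

A real scheme would bias token sampling rather than swap synonyms, but the asymmetry is the same: verification needs the key.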


How many people do you think actually bother rephrasing LLM outputs? Do you personally ever copy paste a full chunk of a response?


You draw the first bit to be 0/1 with equal probability, and then the second bit must equal the previous one with probability 1
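That sampling process can be written out directly (a trivial sketch; the point is that the whole sequence carries exactly one bit of entropy):

```python
import random

def draw_bits(n, seed=None):
    """Draw n bits: the first is 0 or 1 with equal probability, and each
    subsequent bit equals the previous one with probability 1, so all n
    bits are identical copies of the first."""
    rng = random.Random(seed)
    first = rng.randint(0, 1)
    return [first] * n

print(draw_bits(4))
```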


True; they claim in the paper that this is inevitable.


(A Turing Award winner and a Gödel Prize winner, professors at Berkeley and MIT)


There are several differences:

1. Empirically, networks have many adversarial examples, but that doesn't mean there are adversarial examples everywhere. They show that any point can be slightly perturbed to produce whichever output you want.

2. Some training algorithms, existing or future, are meant to be robust. They show that even with a robust algorithm the backdoor will still exist.

3. As you said, they show that finding the backdoored point is also efficient for the key holder.

