Hacker News | past | comments | ask | show | jobs | submit | cryptohell's comments

Given several models, assuming only that some unknown subset is "safe", can we construct a single model as safe as that subset? This reduces obtaining a trustworthy model to a plausibly easier task.


The classic AMS (1996) bound for estimating the frequency moments of a stream is shown to be optimal
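For context, the AMS second-moment estimator that bound concerns can be sketched in a few lines (a minimal illustration of the basic sign-sketch idea, not the paper's construction; function and variable names are mine):

```python
import random

def ams_f2_estimate(stream, num_trials=1000, seed=0):
    """Estimate the second frequency moment F2 = sum_i f_i^2 of a stream.

    Each trial assigns every distinct item a random sign s(i) in {-1, +1},
    maintains the running sum Z = sum over the stream of s(x), and uses
    Z^2 as an unbiased estimate of F2. Averaging trials reduces variance.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(num_trials):
        signs = {}
        z = 0
        for x in stream:
            if x not in signs:
                signs[x] = rng.choice((-1, 1))
            z += signs[x]
        estimates.append(z * z)
    return sum(estimates) / num_trials

# frequencies: f(1)=2, f(2)=1, f(3)=3, so the true F2 is 4 + 1 + 9 = 14
print(ams_f2_estimate([1, 1, 2, 3, 3, 3]))
```

The lower bound in question says roughly that the space this style of sketch uses cannot be beaten asymptotically.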


ChatGPT could encode usernames, timestamps and other session context into its responses in a way that would only be retrievable by OpenAI and provably invisible to everyone else
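A toy version of that kind of keyed encoding, detectable only by the key holder, might look like this (a hypothetical sketch with made-up synonym pairs; this is not OpenAI's scheme, just an illustration of why a keyed choice looks like noise to everyone else):

```python
import hmac
import hashlib

def keyed_bits(key: bytes, n: int) -> list[int]:
    # Derive one pseudorandom bit per position from an HMAC of the index.
    # Without the key, these choices are indistinguishable from ordinary
    # stylistic variation.
    return [hmac.new(key, str(i).encode(), hashlib.sha256).digest()[0] & 1
            for i in range(n)]

# Hypothetical synonym pairs: position i emits pair[bit_i].
PAIRS = [("quick", "fast"), ("big", "large"), ("begin", "start")]

def embed(key: bytes) -> list[str]:
    bits = keyed_bits(key, len(PAIRS))
    return [pair[b] for pair, b in zip(PAIRS, bits)]

def verify(key: bytes, words: list[str]) -> bool:
    # Only the key holder can recompute which word was "expected" where.
    bits = keyed_bits(key, len(PAIRS))
    return all(w == pair[b] for pair, b, w in zip(PAIRS, bits, words))

msg = embed(b"provider-secret")
print(msg, verify(b"provider-secret", msg))
```

A real scheme would bias token sampling rather than swap synonyms, but the asymmetry is the same: verification needs the key.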


How many people do you think actually bother rephrasing LLM outputs? Do you personally ever copy paste a full chunk of a response?


You draw the first bit to be 0/1 with equal probability, and then the second bit must equal the previous one with probability 1
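That sampling process can be written out directly (a trivial sketch; the point is that the whole sequence carries exactly one bit of entropy):

```python
import random

def draw_bits(n, seed=None):
    """Draw n bits: the first is 0 or 1 with equal probability, and each
    subsequent bit equals the previous one with probability 1, so all n
    bits are identical copies of the first."""
    rng = random.Random(seed)
    first = rng.randint(0, 1)
    return [first] * n

print(draw_bits(4))
```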


True; they claim in the paper that this is inevitable.


(A Turing Award winner and a Gödel Prize winner, professors at Berkeley and MIT)


There are several differences:

1. Empirically, networks have many adversarial examples, but that doesn't mean there are adversarial examples everywhere. They show that any point can be slightly perturbed to produce whichever output you want.

2. Some training algorithms, existing or future, are meant to be robust. They show that even with a robust algorithm the backdoor will still exist.

3. As you said, they show that finding the backdoored point is also efficient for the key holder.

