
It's because the author copy-pasted the original code incorrectly: those "â–‘â–’â–“â–ˆ" at the end of the O5 string are supposed to be the block characters. E.g. "â–’" in Windows-1252 [0] is the byte sequence 0xE2 0x96 0x92, which is exactly the UTF-8 encoding of U+2592 MEDIUM SHADE [1].
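A minimal Python sketch of this mix-up (Python is my choice here; nothing in the thread uses it): encode the four block characters ░▒▓█ as UTF-8, then decode the same bytes as Windows-1252, which Python's codec registry calls `cp1252`. The loop prints each code point, its three UTF-8 bytes, and how a Windows-1252 decoder displays those bytes.

```python
# Reproduce the mojibake: UTF-8 bytes misread as Windows-1252 ("cp1252" in Python).
blocks = "\u2591\u2592\u2593\u2588"  # ░▒▓█: LIGHT, MEDIUM, DARK SHADE, FULL BLOCK

for ch in blocks:
    utf8 = ch.encode("utf-8")        # three bytes for each of these code points
    misread = utf8.decode("cp1252")  # the same bytes read as Windows-1252
    print(f"U+{ord(ch):04X}  {utf8.hex(' ')}  {misread}")

# U+2591  e2 96 91  â–‘
# U+2592  e2 96 92  â–’
# U+2593  e2 96 93  â–“
# U+2588  e2 96 88  â–ˆ

mojibake = blocks.encode("utf-8").decode("cp1252")
print(mojibake)  # â–‘â–’â–“â–ˆ
```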

[0] https://en.wikipedia.org/wiki/Windows-1252#Character_set

[1] https://www.compart.com/en/unicode/U+2592



It's possible that this is the mistake.

However, I don't think I miscopied the original code.

https://reactive.network/assets/index-8b4ef4ac.js

If you look for `oahkbdpqwmZO0QLCJUYXzcvunxrjft` in the output, you should see that those characters appear exactly like that. Maybe it's an issue with the encoding of the script file?


Most definitely; if I use "View >> Repair Text Encoding" in Firefox, it shows the block characters. But I have to admit, it's strange that Firefox does not choose UTF-8 by default in this case.
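For a single round of this particular mix-up, undoing the damage is the same mistake in reverse: encode the garbled text back to bytes with Windows-1252, then decode those bytes as UTF-8. The `repair` helper below is a hypothetical sketch of that idea, not a description of how Firefox implements "Repair Text Encoding".

```python
def repair(text: str) -> str:
    """Undo one round of 'UTF-8 bytes decoded as Windows-1252' (hypothetical helper).

    Raises a UnicodeError subclass if the text cannot have been produced
    that way, e.g. if it contains characters Windows-1252 cannot encode.
    """
    original_bytes = text.encode("cp1252")  # recover the bytes that were misread
    return original_bytes.decode("utf-8")   # read them with the correct encoding


garbled = "â–‘â–’â–“â–ˆ"
print(repair(garbled))  # ░▒▓█
```

Plain ASCII passes through unchanged, since the two encodings agree on bytes 0x00 to 0x7F, so a real tool still has to decide when applying the repair is appropriate.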


Yes, turns out I was the one who made the mistake.

I updated the article to reflect the mistake.

> Update (2024-08-29): Initially, I thought that the LLM didn’t replicate the logic accurately because the output was missing a few characters visible in the original component (e.g., ). However, a user on the HN forum pointed out that it was likely a copy-paste error.

>

> Upon further investigation, I discovered that the original code contains different characters than what I pasted into ChatGPT. This appears to be an encoding issue, as I was able to get the correct characters after downloading the script. After updating the code to use the correct characters, the output is now identical to the original component.

>

> I apologize, GPT-4, for mistakenly accusing you of making mistakes.


If no character set is specified, plain text content is assumed to be Windows-1252. This probably extends to application/javascript as well, but I'd have to check to be sure.
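To make that fallback concrete, here is a simplified Python sketch. `decode_body` is a hypothetical helper, not the actual HTML or WHATWG decoding algorithm, and real browsers apply more rules than this. It honours a UTF-8 byte order mark first, then a `charset` parameter on the Content-Type header, and otherwise falls back to Windows-1252 as described above. The script contents are made up for illustration.

```python
import codecs
from email.message import Message


def decode_body(content_type: str, body: bytes, fallback: str = "cp1252") -> str:
    """Decode an HTTP body with a simplified charset rule (hypothetical helper)."""
    # 1. A UTF-8 byte order mark overrides everything else in this sketch.
    if body.startswith(codecs.BOM_UTF8):
        return body[len(codecs.BOM_UTF8):].decode("utf-8")
    # 2. Otherwise use the charset parameter from the Content-Type header, if any.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased name, or None when absent
    # 3. With no declared charset, fall back (Windows-1252 by default here).
    return body.decode(charset or fallback)


# Hypothetical script saved as UTF-8 and served without a charset parameter.
script = 'const shades = "░▒▓█";'.encode("utf-8")
print(decode_body("text/javascript", script))                 # const shades = "â–‘â–’â–“â–ˆ";
print(decode_body("text/javascript; charset=utf-8", script))  # const shades = "░▒▓█";
```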

The web predates UTF-8, although not by much. Ken Thompson and Rob Pike presented UTF-8 at the Winter USENIX conference in 1993, and CERN released the web software into the public domain that April, but it would be several more years before UTF-8 became common. The early web was ISO 8859-1 by default. But people were pretty lazy about specifying character sets back then (they still are, actually), and Microsoft started sending or assuming its Windows-1252 character set where the spec required 8859-1. Eventually the spec was changed to match de facto behavior. I guess the assumption was that if you're too stupid or lazy to say what character set you're using, it's probably 1252. (Today the assumption would be that it's probably UTF-8.) I'm not sure what the specs say today, but I think HTML is assumed to be UTF-8, and everything else is assumed to be 1252 (if the character set is not explicitly declared).
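Part of why Windows-1252 could quietly stand in for ISO 8859-1 is that the two encodings agree on every byte except 0x80 to 0x9F, where ISO 8859-1 has invisible C1 control codes and Windows-1252 has printable characters such as curly quotes and dashes. A short Python check (the byte string is a made-up sample):

```python
# Bytes for curly quotes and an en dash as Windows-1252 encodes them (made-up sample).
raw = b"\x93Hello\x94 \x96 world"

as_latin1 = raw.decode("latin-1")  # ISO 8859-1: 0x93, 0x94, 0x96 are C1 control codes
as_cp1252 = raw.decode("cp1252")   # Windows-1252: the same bytes are “, ” and –

print(repr(as_latin1))  # '\x93Hello\x94 \x96 world'
print(as_cp1252)        # “Hello” – world

# Outside 0x80-0x9F the two decoders produce identical text.
same_elsewhere = all(
    bytes([b]).decode("latin-1") == bytes([b]).decode("cp1252")
    for b in range(256)
    if not 0x80 <= b <= 0x9F
)
print(same_elsewhere)  # True
```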



