
Was there ever a writeup of exactly how the XZ exploit worked? I mean exactly, I get the general overview and even quite a few of the specifics, but last time I checked no one had credibly figured out exactly how all the obfuscated components went together.


Gynvael Coldwind made a great analysis about it: https://gynvael.coldwind.pl/?lang=en&id=782

https://news.ycombinator.com/item?id=39878681

xz/liblzma: Bash-stage Obfuscation Explained


That is, as it says in the title, about the Bash-stage obfuscation. That’s fun but it’d also be interesting to know what capabilities the exploit payload actually provided to the attacker. Last I looked into that a month or so ago there were at least two separate endpoints already discovered, and the investigation was still in progress.


The social-engineering aspect of pressuring the old maintainer is way more interesting than the actual software IMHO https://securelist.com/xz-backdoor-story-part-2-social-engin...


I already got all that. Yes, I think it's interesting, but I wanted to see a final (non-interim) analysis of the payload going byte-by-byte.


I agree 1000% with this. One thing I don't see addressed in the article you reference, though, is whether any OpenSSH maintainers spotted the addition of a co-maintainer to xz utils and did any due diligence about it.


Seems unlikely. xz is not a dependency of OpenSSH.

It's only a transitive dependency of sshd on Linux distributions that patch OpenSSH to include libsystemd which depends on xz.

It's wholly unreasonable to expect OpenSSH maintainers to vet contributors of transitive dependencies added by distribution patches that the OpenSSH maintainers clearly don't support.
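To make the direct vs. transitive distinction concrete, here's a rough sketch. The two graphs are toy models built only from the edges named above (sshd, libsystemd, liblzma); they are deliberately incomplete and not a real package listing for any distro.

```python
from collections import deque

# Toy dependency graphs, assumed for illustration. Upstream sshd has no path
# to liblzma; a distro patch adds libsystemd, which in turn depends on it.
UPSTREAM = {
    "sshd": [],
    "libsystemd": ["liblzma"],
    "liblzma": [],
}
DISTRO_PATCHED = {**UPSTREAM, "sshd": ["libsystemd"]}


def transitive_deps(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen


if __name__ == "__main__":
    print(sorted(transitive_deps(UPSTREAM, "sshd")))
    print(sorted(transitive_deps(DISTRO_PATCHED, "sshd")))
```

In the patched graph liblzma is reachable from sshd but never appears in sshd's own dependency list, which is why nobody looking only at OpenSSH would have seen it.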


> It's only a transitive dependency of sshd on Linux distributions that patch OpenSSH to include libsystemd which depends on xz.

Ah, ok. Then my question should really be about the distros--did any of them spot the co-maintainer being added and do due diligence?

As for the "libsystemd" part, there's another reason for me to migrate to non-systemd distros.


From https://news.ycombinator.com/item?id=39866275 by rwmj:

> Very annoying - the apparent author of the backdoor was in communication with me over several weeks trying to get xz 5.6.x added to Fedora 40 & 41 because of it's "great new features". We even worked with him to fix the valgrind issue (which it turns out now was caused by the backdoor he had added). We had to race last night to fix the problem after an inadvertent break of the embargo.

> He has been part of the xz project for 2 years, adding all sorts of binary test files, and to be honest with this level of sophistication I would be suspicious of even older versions of xz until proven otherwise.


Don’t conflate this with the ongoing trendy systemd hate. There are myriad other attack vectors out there.



Yeah, what you and other users have posted so far is stuff I already know: build scripts, injection, obfuscation. I'm looking more for a careful reverse engineering of the actual payload.


https://www.youtube.com/watch?v=Q6ovtLdSbEA

This talk by Denzel Farmer at Columbia isn't a complete disassembly of the payload but it's the best I've seen so far.

Slides if you don't want to watch the video: https://cs4157.github.io/www/2024-1/lect/21-xz-utils.pdf


Thanks for posting that. A quick perusal of those slides looks good. I know what I'm going to be reading and watching this evening!


I haven't looked again in months, but I'd be interested in the same thing you're looking for. I poked at the payload with Ghidra for a little bit, realized it was miles above my pay grade, and stepped away. Everybody was wowed by the method of delivery but the payload itself seems to have proved fairly inscrutable.


I'd also like to see the timeline of XZ's landlock implementation, I haven't seen that discussed much.


https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78b...

The link you want from that is this: https://bsky.app/profile/filippo.abyssdomain.expert/post/3ko... ; that thread has the high-level overview.

That in turn links to https://github.com/amlweems/xzbot which has more details.

The TL;DR is that it hooks the RSA bits to look for an RSA cert with a public key that isn't really an RSA public key; the pubkey material contains a request from the attacker, signed & encrypted with an ed448 key. If the signature checks out, system() is called, i.e., RCEaaS for the attacker.
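For the shape of that control flow only, here's a toy model. Loud assumptions: HMAC-SHA256 with a shared secret stands in for the Ed448 signature check, the "tag || command" layout is made up, the encryption step is omitted, and the fall-through to the original check is my guess. Nothing here is the real payload format; see xzbot for that.

```python
import hashlib
import hmac

# ASSUMPTION: HMAC-SHA256 stands in for the Ed448 signature. The real backdoor
# checks against an embedded public key, so only the holder of the matching
# private key can produce a valid request. This key is hypothetical.
STAND_IN_KEY = b"hypothetical-key-for-illustration"
TAG_LEN = hashlib.sha256().digest_size


def make_request(command: bytes) -> bytes:
    """Build a blob shaped like 'tag || command' (hypothetical layout)."""
    tag = hmac.new(STAND_IN_KEY, command, hashlib.sha256).digest()
    return tag + command


def hooked_key_check(key_material: bytes, run_command, original_check):
    """Toy hook: act on authenticated blobs, otherwise defer to the original."""
    tag, command = key_material[:TAG_LEN], key_material[TAG_LEN:]
    expected = hmac.new(STAND_IN_KEY, command, hashlib.sha256).digest()
    if len(tag) == TAG_LEN and hmac.compare_digest(tag, expected):
        # The real payload reportedly calls system() at this point; the toy
        # only hands the bytes to a caller-supplied callback.
        run_command(command)
        return True
    # ASSUMPTION: anything that fails the check goes through the normal path.
    return original_check(key_material)


if __name__ == "__main__":
    executed = []
    reject = lambda _key: False  # placeholder for the genuine RSA check
    hooked_key_check(make_request(b"id"), executed.append, reject)
    hooked_key_check(b"ordinary key material", executed.append, reject)
    print(executed)
```

The point of the gate is that a scanner or a second attacker who sends arbitrary key material just gets the normal behavior; only a correctly signed request triggers anything.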


Random aside on the articles the other commenters linked: I find it a bit coincidental that the supposed "kill switch" environment variable, yolAbejyiejuvnup=Evjtgvsh5okmkAv, reads as 潹䅬敢祪敩番湶灵䔽橶杴獶㕨歯歭癁 when its ASCII bytes are decoded as UTF-16LE, which Google translates to "You can't do it without a soul."


Any even-length alphabetic ASCII string decodes to random Chinese characters in UTF-16LE. Digits and = unlock some Japanese hiragana, Korean hangeul and assorted punctuation, but those only make up a small fraction of the total.

For example, 'backdoor'.encode('ascii').decode('utf_16_le') == '慢正潤牯', which Google Translate turns into "Slow and positive", but it's just nonsense.
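If you want to check this yourself, here's a small sketch. The function names are mine; the ranges are the CJK Unified Ideographs block (U+4E00 to U+9FFF) and CJK Extension A (U+3400 to U+4DBF). Each pair of ASCII bytes becomes one 16-bit code unit with the second byte as the high byte, so two letters always land somewhere in 0x4141..0x7A7A.

```python
import string


def reinterpret_as_utf16le(text: str) -> str:
    """Encode ASCII text to bytes, then read those bytes as UTF-16LE."""
    return text.encode("ascii").decode("utf_16_le")


def is_cjk_ideograph(ch: str) -> bool:
    """True for CJK Unified Ideographs and CJK Extension A."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def all_letter_pairs_are_cjk() -> bool:
    """Check every two-letter ASCII pair, e.g. 'ba' -> U+6162."""
    return all(
        is_cjk_ideograph(reinterpret_as_utf16le(a + b))
        for a in string.ascii_letters
        for b in string.ascii_letters
    )


if __name__ == "__main__":
    # Show the code point behind each character of the example above.
    for ch in reinterpret_as_utf16le("backdoor"):
        print(f"U+{ord(ch):04X} {ch}")
    print(all_letter_pairs_are_cjk())
```

A pair whose second character is a digit falls below U+3400 instead, which is where the kana, hangul and punctuation come from.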


I'm naive about the translation tech space, but is this sort of thing unique to languages like Chinese? I figured all this stuff was mostly solved. Like, I wouldn't expect Google Translate to turn dflhglsdhfgalskjdf into grammatically valid Spanish.


There is one difference between gibberish Chinese and gibberish Latin character sequences. In Chinese, each character does carry some meaning on its own (like a word), so I guess the model may hallucinate output inspired by those meanings. In the case of "慢正潤牯" -> "Slow and positive", it actually translated the first two characters literally (慢 -> slow, 正 -> correct/positive/upright).

So equivalent English gibberish would be like "hast prank bibble done anut me me ions." Google translates this one to "对我而言,恶作剧已经完成了。" (To me, the prank has been done.) in Chinese -- very valid sentence, and "¿Me has hecho una broma a mí, Bibble?" in Spanish -- also seems valid.

I guess the model is (over) optimized to generate valid outputs. This can be a feature, so it still translates grammatically invalid but to some degree understandable text (like with typos or non-standard Internet language).


I think the Latin script might be somewhat protected because random jumbles of letters do appear as serial numbers and whatnot, but for other scripts, anything goes.

I say ҏӲҨЏ ҜъКѠ ЇЩіН гӞэѷ in "Russian", Google Translate says "Let's talk about it".


Amazing. How did you find that out?


I hadn't looked into that story before so was following the rabbit hole of articles and gists and stuff and saw that some referenced a kill switch via env variable, so I just tossed it into that CyberChef online tool using its "magic mode" and ticked the "intensive mode" box and it was the top result. Just commented because I hadn't seen it elsewhere and figure it might be a little easter egg of sorts.


It seems that Google Translate simply outputs garbage when you input garbage, so getting this particular translation is indeed just a coincidence.


Wow, I didn't realize how much implicit trust I put in their translation output. Indeed, I just tried some other Chinese -> English translation sites and they vary widely in what they output. Are these just gibberish Chinese characters that the translators guess at? Either way, thanks for the insight; I clearly put too much faith in their quality/accuracy.


Right, it's complete gibberish. As a native speaker, I can recognize at most 4 of the characters, and not a single subsequence makes any sense.

Actually, just by shuffling these characters you have a good chance of getting some specious translations (adding punctuation makes it more likely to generate a complete sentence): "祪癁番䔽䔽!" -> "I am so sick!" "獶獶祪灵癁癁癁!" -> "The soul is full of blood!"


I think this is sufficiently detailed?

https://lwn.net/Articles/967192/

But if there's a part that's still unclear, maybe there's another writeup somewhere that addresses it?



