Was there ever a writeup of exactly how the XZ exploit worked? I mean exactly, I get the general overview and even quite a few of the specifics, but last time I checked no one had credibly figured out exactly how all the obfuscated components went together.
That is, as it says in the title, about the Bash-stage obfuscation. That’s fun but it’d also be interesting to know what capabilities the exploit payload actually provided to the attacker. Last I looked into that a month or so ago there were at least two separate endpoints already discovered, and the investigation was still in progress.
I agree 1000% with this. One thing I don't see addressed in the article you reference, though, is whether any OpenSSH maintainers spotted the addition of a co-maintainer to xz utils and did any due diligence about it.
Seems unlikely. xz is not a dependency of OpenSSH.
It's only a transitive dependency of sshd on Linux distributions that patch OpenSSH to include libsystemd which depends on xz.
It's wholly unreasonable to expect OpenSSH maintainers to vet contributors of transitive dependencies added by distribution patches that the OpenSSH maintainers clearly don't support.
> Very annoying - the apparent author of the backdoor was in communication with me over several weeks trying to get xz 5.6.x added to Fedora 40 & 41 because of it's "great new features". We even worked with him to fix the valgrind issue (which it turns out now was caused by the backdoor he had added). We had to race last night to fix the problem after an inadvertent break of the embargo.
> He has been part of the xz project for 2 years, adding all sorts of binary test files, and to be honest with this level of sophistication I would be suspicious of even older versions of xz until proven otherwise.
Yeah, what you and other users have posted so far is stuff I know: build scripts, injection, obfuscation. I'm looking more for a careful reverse engineering of the actual payload.
I haven't looked again in months, but I'd be interested in the same thing you're looking for. I poked at the payload with Ghidra for a little bit, realized it was miles above my pay grade, and stepped away. Everybody was wowed by the method of delivery but the payload itself seems to have proved fairly inscrutable.
The TL;DR is that it hooks the RSA bits to look for an RSA cert with a public key that isn't really an RSA public key; the pubkey material contains a request from the attacker, signed & encrypted with an ed448 key. If the signature checks out, system() is called, i.e., RCEaaS for the attacker.
As a random aside on the other commenter's linked articles: I find it a bit coincidental that the supposed "kill switch" environment variable, yolAbejyiejuvnup=Evjtgvsh5okmkAv, when its ASCII bytes are reinterpreted as UTF-16LE, reads as 潹䅬敢祪敩番湶灵䔽橶杴獶㕨歯歭癁, which Google Translate renders as "You can't do it without a soul."
Any even-length alphabetic ASCII string decodes to random Chinese characters in UTF-16LE. Digits and = unlock some Japanese hiragana, Korean hangeul and assorted punctuation, but those only make up a small fraction of the total.
For example, 'backdoor'.encode('ascii').decode('utf_16_le') == '慢正潤牯', which Google Translate turns into "Slow and positive", but it's just nonsense.
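If anyone wants to check this locally, here is a minimal Python sketch. The helper names `reinterpret_as_utf16le` and `is_cjk_ideograph` are made up for illustration, and the range check only covers two Unicode blocks (CJK Unified Ideographs and Extension A), which is an assumption about what counts as a "Chinese character" here.

```python
import string


def reinterpret_as_utf16le(text: str) -> str:
    """Read the ASCII bytes of `text` as UTF-16LE code units (two bytes each)."""
    return text.encode("ascii").decode("utf_16_le")


def is_cjk_ideograph(ch: str) -> bool:
    """True if `ch` is in CJK Unified Ideographs or Extension A (assumed scope)."""
    cp = ord(ch)
    return 0x3400 <= cp <= 0x4DBF or 0x4E00 <= cp <= 0x9FFF


# The example from the comment above.
backdoor = reinterpret_as_utf16le("backdoor")

# The "kill switch" string quoted earlier in the thread.
kill_switch = reinterpret_as_utf16le("yolAbejyiejuvnup=Evjtgvsh5okmkAv")

# Every two-letter ASCII pair, to test the "any alphabetic string" claim.
letters = string.ascii_letters
pairs = [a + b for a in letters for b in letters]
all_pairs_cjk = all(is_cjk_ideograph(reinterpret_as_utf16le(p)) for p in pairs)

print(backdoor)
print(kill_switch)
print(len(pairs), all_pairs_cjk)
```

The second byte of each pair becomes the high byte of the code point, so an ASCII letter there (0x41-0x5A or 0x61-0x7A) always puts the result somewhere in U+41xx-U+5Axx or U+61xx-U+7Axx, which is all ideograph territory.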
I'm new to the translation tech space, but is this sort of thing unique to languages like Chinese? I figured all this stuff was mostly solved. For instance, I wouldn't expect Google Translate to turn dflhglsdhfgalskjdf into grammatically valid Spanish.
There is one difference between gibberish Chinese and gibberish Latin character sequences. In Chinese, each character does carry some meaning on its own (like a word). So I guess the model may hallucinate some output inspired by these meanings. In the case of "慢正潤牯" -> "Slow and positive", it actually translated the first two characters literally (慢 -> slow, 正 -> correct/positive/upright).
So equivalent English gibberish would be like "hast prank bibble done anut me me ions." Google translates this one to "对我而言,恶作剧已经完成了。" (To me, the prank has been done.) in Chinese -- very valid sentence, and "¿Me has hecho una broma a mí, Bibble?" in Spanish -- also seems valid.
I guess the model is (over) optimized to generate valid outputs. This can be a feature, so it still translates grammatically invalid but to some degree understandable text (like with typos or non-standard Internet language).
I think the Latin script might be somewhat protected because random jumbles of letters do appear as serial numbers and whatnot, but for other scripts, anything goes.
I say ҏӲҨЏ ҜъКѠ ЇЩіН гӞэѷ in "Russian", Google Translate says "Let's talk about it".
I hadn't looked into that story before, so I was following the rabbit hole of articles and gists and saw that some referenced a kill switch via an env variable. I just tossed it into the CyberChef online tool using its "magic mode" with the "intensive mode" box ticked, and it was the top result. I only commented because I hadn't seen it mentioned elsewhere and figured it might be a little easter egg of sorts.
Wow, I didn't realize how much implicit trust I put in their translation output. I just tried some other Chinese -> English translation sites and they vary widely in what they output. Are these just gibberish Chinese characters that the translators guess at? Either way, thanks for the insight; I clearly put too much faith in their quality/accuracy.
Right, it's complete gibberish. As a native speaker, I can recognize at most 4 characters, and not even one subsequence makes any sense.
Actually, just by shuffling these characters you have a good chance of getting some specious translations (adding punctuation makes it more likely to generate a complete sentence):
"祪癁番䔽䔽!" -> "I am so sick!"
"獶獶祪灵癁癁癁!" -> "The soul is full of blood!"