This is conjecture. OTOH, I measured things while designing the LuaJIT IR:
1. An array index is just as suitable as a pointer for dereferencing.
2. What matters is how many dereferences are needed and their locality.
3. Data structure density is important to get high cache utilization.
References show a lot of locality: 40% of all IR operands reference the previous node, and 70% reference one of the previous 10 nodes. A linear IR is the best cache-optimized data structure for this.
That said, dereferencing of an operand happens less often than one might think. Most of the time, one really needs the operand index itself, e.g. for hashes or comparisons. Again, indexes have many advantages over pointers here.
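To illustrate the hashing point with a generic sketch (LuaJIT itself threads a per-opcode 'prev' chain through the IR for this, but the principle is the same): because operands are small integer indexes, the commoning key is a cheap integer combination, with no pointer chasing at all. The packing below is illustrative, not LuaJIT's actual scheme.

    -- Generic CSE-by-index sketch in Lua.
    local ir, cse = {}, {}
    local function emit(op, op1, op2)
      -- Pack opcode + two 16 bit operand indexes into one number key.
      local key = (op * 2^16 + op1) * 2^16 + op2
      local ref = cse[key]
      if ref then return ref end    -- identical node exists: reuse its index
      ir[#ir+1] = { op = op, op1 = op1, op2 = op2 }
      cse[key] = #ir
      return #ir                    -- the node's index doubles as its reference
    end

    local a = emit(1, 0, 0)
    assert(emit(1, 0, 0) == a)      -- the duplicate is commoned to the same index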
What paid off the most was to use a fixed size IR instruction format (only 64 bit!) with 2 operands and 16 bit indexes. The restriction to 2 operands is actually beneficial, since it helps with commoning and makes you think about IR design. The 16 bit index range is not a limitation in practice (split the IR into chunks if you need to). The high orthogonality of the IR avoids many iterations and unpredictable branches in the compiler itself.
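For illustration, here is that layout sketched with LuaJIT's own FFI; the field names follow lj_ir.h, but take the details as an approximation rather than the authoritative definition:

    local ffi = require("ffi")
    ffi.cdef[[
    typedef uint16_t IRRef1;   /* 16 bit IR reference: an index, not a pointer */
    typedef struct IRIns {
      IRRef1   op1;            /* first operand */
      IRRef1   op2;            /* second operand */
      uint16_t ot;             /* opcode and type, packed */
      IRRef1   prev;           /* chain to previous ins with the same opcode */
    } IRIns;
    ]]
    assert(ffi.sizeof("IRIns") == 8)  -- the whole instruction fits in 64 bits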
The 16 bit indexes also enable the use of tagged references in the compiler code (not in the IR). The tag caches node properties: type, flags, constness. This avoids even more dereferences. LuaJIT uses this in the frontend pipeline for fast type checks and on-the-fly folding.
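A minimal sketch of such a tagged reference (the exact shifts and masks here are made up for illustration; see TRef in lj_ir.h for the real encoding):

    local bit = require("bit")

    -- Pack a 16 bit IR index with cached node properties.
    -- Illustrative layout: bits 0-15 index, bits 24-31 cached type.
    local function tref(index, itype)
      return bit.bor(index, bit.lshift(itype, 24))
    end
    local function tref_ref(tr)   -- recover the plain index
      return bit.band(tr, 0xffff)
    end
    local function tref_type(tr)  -- type check without loading the IR node
      return bit.band(bit.rshift(tr, 24), 0xff)
    end

A fold engine can dispatch on tref_type() without ever touching memory for the node, which is where the saved dereferences come from.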
Coming up with a silly marketing name, writing a naive implementation, and then claiming it's their invention is impertinent. Especially since they mention LuaJIT itself in the text ...
I promise that this was a genuine case of parallel evolution – I didn't read any LuaJIT source or documentation while writing the code or article.
Afterwards, I searched for "reverse linear scan register allocation", discovered the blog post which referenced LuaJIT, and linked it in the text.
I've updated the article to move the LuaJIT reference to the top, and adjusted the phrasing to be clear that this is not an invention, it's simply an interesting implementation. What you call "naive", others may consider didactic :)
Let me know if you have other feedback or suggested changes, either here or via email (in my profile).
You may want to clarify that in the GitHub repo, too. See my issue there.
If you want to go the didactic route, then consider documenting the improvements over the naive implementation: register hinting, register priorities (PHI), two-headed register picking, fixed register picking, optimized register picking for 2-operand instructions (x86/x64), register pair picking, ABI calling-conventions, weak allocations, cost heuristics, eviction heuristics, lazy/eager spill/restore, rematerialization, register shuffling (PHI) with cycle breaking, register renaming, etc. That's all in ~2000 lines of lj_asm.c.
I had already changed the title after your reply. The objection is about the naming, which implies an invention claim without further explanation. It's not about the code.
Are you demanding that independent implementations of an idea you explicitly dedicated to the public domain must then describe the various ways your version is “superior”?
Yes, of course it's vulnerable; verified with Docker debian:sid. That was my first reaction when I read this, but I wanted to verify it first. You beat me to it with this post.
Since you've already let the cat out of the bag (which is not ideal), please file the bugs with Debian and Ubuntu.
Test command:
    redis-cli eval 'return select(2, loadstring("\027")):match("binary") and "VULNERABLE" or "OK"' 0
While we're at it, redis has ignored the advice at: http://lua-users.org/wiki/SandBoxes
Almost all of the critical functions (loadstring, load, getmetatable, getfenv, ...) are present and unprotected in the redis 'SandBox' (which isn't).
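A minimal whitelist sketch for Lua 5.1/LuaJIT, roughly along the lines of that page (the whitelisted set below is just an example, not a vetted list):

    -- Whitelist environment: only expose what scripts legitimately need.
    -- loadstring, load, getmetatable, getfenv etc. are simply absent.
    local sandbox_env = {
      tostring = tostring,
      tonumber = tonumber,
      math = math,  -- caution: sharing a library table lets scripts mutate it
    }

    local untrusted = "return tostring(loadstring)"

    -- Reject precompiled chunks: bytecode starts with an ESC (0x1b) byte
    -- and can break out of any Lua-level sandbox (the bug discussed here).
    assert(untrusted:byte(1) ~= 27, "binary chunk rejected")

    local chunk = assert(loadstring(untrusted))
    setfenv(chunk, sandbox_env)  -- Lua 5.1 / LuaJIT
    print(chunk())  --> "nil": the sandboxed code cannot see loadstring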
Which means: disable scripting or shut down NOW any redis instance that does not run with the same privileges as the clients that have access to it. Scripting can be disabled by renaming the EVAL and EVALSHA commands to unguessable names, as shown below.
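For reference, the rename trick goes in redis.conf; the replacement name below is a made-up example, and an empty string disables a command outright:

    # redis.conf
    rename-command EVAL    some-long-unguessable-name
    rename-command EVALSHA ""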
I don't think getmetatable et al are really problems, are they? You can mess things up in the sandbox, but that's not escaping it. I think that page is trying to build a sandbox where even a lua script can eval other lua code without blowback, but that's not what redis is trying to achieve.
While you're right about the binary loadstring issue, that lua-users page is way overly paranoid. The best Lua sandboxing implementation I know of is the one Wikipedia uses, and it allows a lot of what's "unsafe" there.
The page is trying to build a sandbox where a lua script can eval other untrusted lua code within the same lua execution environment. Many (even most?) people are only interested in isolating the host application from the lua environment.
One year ago I hardened LuaJIT's VM against these kinds of attacks. Since then, there has been a constant influx of complaints and filed issues, all bitterly insisting that their code, which mistakenly assumed a fixed hash table iteration order, is now broken.
Even when told that the Lua manual has clearly stated for 20 years that the order is undefined, they do not cease to complain. They do not realize this change helped them discover a serious bug in their code (the order could differ even before that change). Sigh.
You can now guess what one of the less enlightened forks of LuaJIT did ...
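To make the point concrete, this is the kind of code that broke, and per the manual's note on next()/pairs() it was always broken:

    local t = { foo = 1, bar = 2, baz = 3 }
    for k, v in pairs(t) do
      print(k, v)  -- visit order is undefined; with hash randomization it can
    end            -- change between runs, which merely exposes the latent bug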
I'm not surprised. The same issue occurred in Python.
And to be fair, it's a pain in the ass to debug and find out why something happens to implicitly depend on iteration order (float stability is a common culprit, but not the only one). And their code did work beforehand, for most values of "work".
The biggest pain in the ass is that (at least in Python) while you can set the hash seed explicitly, the language doesn't tell you which seed it picked if you don't. This makes reproducing the issue very annoying when only some seeds trigger it. (In CPython you can pin it via the PYTHONHASHSEED environment variable.)
> the order could differ even before that change
While the order could differ, I assume it was deterministic in practice, and nothing influencing those bits had changed in a while.
And, as predicted by core developer Raymond Hettinger in his Modern Dictionaries talk, Python's dicts are now guaranteed to be insertion-ordered by default (as of 3.7).
To the extent this creates a performance penalty, it's a little annoying that a few systems create a behavior dependency that is not truly needed in a core type, when a separate type (OrderedDict) already implements the desired semantics. But then again, if it doesn't make it slower, it should be fine.
FWIW, the way Raymond re-implemented dictionaries made them more efficient (the algorithms are the same, but the layout is now much more cache-friendly), and it had the side effect of making them insertion-ordered. He and many others advocated for guaranteeing the ordering going forward.
And now they're locked into an order-preserving implementation, even if an optimization tomorrow could eke out another 10% were ordering not guaranteed.
Actually, LuaJIT 1.x is just that: a translator from register-based bytecode to machine code, using templates (small assembler snippets) with fixed register assignments. There's only a little more magic to it, like template variants depending on the inferred type, etc.
You can compare the performance of LuaJIT 1.x and 2.0 yourself on the benchmark page (for x86). The LuaJIT 1.x JIT-compiled code is only slightly faster than the heavily tuned LuaJIT 2.x VM with its interpreter written by hand in assembly language. Sometimes the 2.x interpreter even beats the 1.x compiler.
A lot of this is due to the better design of the 2.x VM (object layout, stack layout, calling conventions, builtins, etc.). But from the perspective of the CPU, a heavily optimized interpreter does not look that different from simplistic, template-generated code. The interpreter dispatch overhead can be moved onto independent dependency chains by the CPU, if you do it right.
Of course, the LuaJIT 2.x JIT compiler handily beats both the 2.x interpreter and the 1.x compiler.
Article: "We can also refute Bernstein’s argument from first principles: the kind of people who can effectively hand-optimize code are expensive and not incredibly plentiful."
Commenter: "IMO he couldn't give a convincing answer to the guy who asked about LuaJIT author being out of a job."
Guy in audience: "I was that guy in the audience."
LuaJIT author: "Actually, LuaJIT 1.x is just that"
Voice in my head: "Aspen 20, I show you at one thousand eight hundred and forty-two knots, across the ground."
Meta: Apologies for the abstract response, but I couldn't figure out a better way to present the parallel. It can be hard to explain artistic allusions without ruining them. What I mean to say is that this pattern of responses reminded me in a delightful way of the classic story of the SR-71 ground speed check: http://www.econrates.com/reality/schul.html
I'm impressed (as usual with your work) that you're able to get that level of performance from an interpreter, although as you note it's not an apples-to-apples comparison. I wonder what you'd get from the 2.x VM design combined with a template-based JIT compiler.
But, as you said, my main point is your last sentence: optimizing compilers like LuaJIT 2.x are impressive and necessary.
No. You DO need a good understanding of a computer language and of JIT compilers to understand the code base for any just-in-time compiler for that computer language.
LuaJIT is not a toy compiler from a textbook. There's a lot of inherent complexity in a production compiler that employs advanced optimizations and needs to work on various CPU architectures and operating systems. This reflects in the code.
The only correction I have: LuaJIT _does_ have 64 bit integers, e.g. 0x0123456789abcdefLL.