If you were ever curious how modern JavaScript VMs (or VMs for other dynamic languages) achieve high performance, this is an awesome resource. It explains tiers, the goals and design of different tiers, on-stack replacement, profiling, speculation, and more!
JavaScript engines are the most advanced at this (only LuaJIT is even comparable), it would be awesome if Python, Perl, Ruby, PHP or the like aimed for the same level of performance tech.
I'd say that the JVMs (particularly Azul's) are more advanced at this. Even with types, they still speculate in order to inline across virtual method calls.
But agreed that the amount of perf JS engines can achieve is truly impressive.
JSC speculates in order to inline across virtual method calls while also inferring types.
Also, inlining across virtual method calls is just old hat. See the post’s related work section to learn some of the history. Most of the post is about techniques that are more involved than inlining and devirtualization.
HotSpot is a fork of Strongtalk, which did the same thing and in fact invented the techniques you're talking about (i.e. creating fake backing types for untyped code and optimistically inlining those, with bounces out to the interpreter on failures, perhaps keeping several copies around and swapping out the entry in the inline cache).
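For anyone who hasn't seen the technique, here's a toy sketch of a polymorphic inline cache in plain JavaScript. The shapeOf() stand-in and the cache layout are illustrative assumptions; real engines key generated machine code on hidden classes and load from a fixed offset:

    // Toy polymorphic inline cache for a property load, in plain JS.
    function shapeOf(obj) {
      return Object.keys(obj).join(","); // pretend this is the hidden class
    }

    function makeCachedLoadX() {
      const cache = []; // the PIC: a few { shape, getter } entries
      return function loadX(obj) {
        const shape = shapeOf(obj);
        for (const entry of cache) {
          if (entry.shape === shape) return entry.getter(obj); // fast path
        }
        // Slow path: generic lookup, then install a new cache entry
        // ("swapping out the entry"), giving up if the site goes megamorphic.
        if (cache.length < 4) cache.push({ shape, getter: (o) => o.x });
        return obj.x;
      };
    }

    const loadX = makeCachedLoadX();
    loadX({ x: 1 });       // miss: cache learns the first shape
    loadX({ x: 2 });       // hit: same shape, fast path
    loadX({ x: 3, y: 4 }); // miss: second shape, second entry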
Additionally, that functionality has been extended over time with the invokedynamic bytecode and its optimizations.
That’s just one snarky example. There are lots of others. I’m sure you could identify them easily, if you are familiar with HotSpot and you read the post.
JSC has four in the sense that hot enough functions go through LLInt, then Baseline, then DFG, and then FTL. They go through three lower optimization levels before the max one.
HotSpot has three in the sense that you get interpreter, client, then server.
Both VMs have additional execution engines available behind flags. JSC has a different interpreter (CLoop, based on LLInt but different) for some systems, so it’s like a “fifth tier” if you think of “tier” as just an available execution engine. I think of a tier as a stage of optimization that you get adaptively, without having to specifically configure for it. I get that HotSpot can alternatively use AOT or Graal, but those aren’t tiered with the rest to my knowledge (like there is no interpreter->client->server->graal config, but if there were, that would be four tiers).
I'll give you that levels 1-3 use the same IR, but that says more about the generality of the C1 IR than about JSC being more advanced for using different IRs, IMO.
According to that, it skips level 1 and goes all the way to 3 in some cases.
Again, not the same as what JSC does, and not nearly as aggressive. Most notably, there is no baseline jit. Also, C1 in any config compiles slower than DFG.
You started by claiming that Azul is more advanced. You got snark.
Maybe it’s more advanced at collecting garbage or supporting threads, but it is not comparable in the field of type inference, because Azul’s VM and other JVMs do not do any of the kind of type inference described in this post. And no, invokedynamic is nothing like inline caching for JS - not even close. Most of this post is about what type inference for dynamic languages really looks like when you invest HotSpot-like efforts to that specific problem. Saying that Azul is more advanced at type inference is far off from the truth at a very fundamental level.
So I gave you snark. Being snarky is fun sometimes!
Anyway, this is silly. The post cites HotSpot and its predecessors. It’s just not the case that we repeat their technique, and the number of tiers is a snarky difference and not the real one. I’m not going to give you a TL;DR that enumerates the meaningful differences just because you claimed that there aren’t any.
HotSpot does not need to infer object shapes, so it doesn't need inline caches for field accesses. HotSpot doesn't need to infer whether (and where) methods lie in the prototype chain of an object. HotSpot does not need to speculate that arithmetic fits in integer range. HotSpot does not need to speculate that fields are (not) deleted from objects. HotSpot does not need to speculate that a function has a .toString() called on it. HotSpot does not need to speculate that a local variable can be modified by a sloppy direct eval. HotSpot does not need to do scope analysis and closure conversion. HotSpot does not need to speculate that the arguments object does not escape.
All of these things are extremely typical for a JS VM to profile and speculate on.
I do not work on JSC, but I did work on V8 and it does all of these things.
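To make that list concrete, here's the kind of source code that forces those decisions. The comments describe typical engine behavior, not any one implementation:

    // Each construct below is something a JS VM profiles and guards on.
    function f(o, i) {
      const v = o.x;          // shape speculation: guard on o's hidden class
      const s = v.toFixed(2); // prototype speculation: where toFixed lives
      const j = i + 1;        // int32 speculation, guarded by an overflow check
      delete o.x;             // deletion: invalidates shape-based caches
      return [v, s, j, arguments.length]; // hope `arguments` doesn't escape
    }
    f({ x: 1.5 }, 41);

    function g(x) {
      eval("x = x + 1"); // sloppy direct eval can rewrite locals behind the
      return x;          // compiler's back, pessimizing the whole scope
    }
    g(1);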
>> HotSpot doesn't need to infer whether (and where) methods lie in the prototype chain of an object.
> How is that different than vtable lookups?
*sigh* Maybe to you this seems fun, but from where I am sitting, you are starting to get annoying. It is really not necessary for you to try to dominate this whole post with some completely unnecessary over-the-top HotSpot-is-the-best-ever chest thumping. This wasn't an isolated comment, but you are all over the thread and in people's faces. It's odious when you want to make confident claims and yet don't know the difference between vtable lookup and JavaScript prototype access. It's exactly this kind of exchange that we need less of.
I wish you would step back and appreciate more of the shared context here to recognize that you don't need to explain (and exaggerate) the inner workings of JVMs to people who have worked on them for quite some years in the past. We'd probably be friends and have fun if you'd drop the HotSpot schtick. I went through that phase too, about 10 years ago.
At this point your comments here and elsewhere show a pattern of aggression that just makes people want to disengage, but I am forced to offer you some advice to try to salvage this. It's no fun for anyone to be part of a community with such an adversarial and hostile conversation ongoing. I hope I can resist the urge to be drawn into correcting the rest of your misunderstandings about JavaScript here in a way that doesn't make you feel slighted, because you're not receptive to it anyway and that's an obviously unproductive discussion.

But listen, invokedynamic isn't a panacea. It's a nice addition to the JVM to accomplish some dynamism, but it's not mission accomplished by any means. Invokedynamic is a mechanism that requires the existence of dynamic optimization at a higher level, i.e. a higher-level language runtime. Your claims again overstate what it does. You didn't understand what I meant about scope analysis or the arguments object because those don't exist at the JVM level and you must have pattern-matched them to something else completely different.
And don't even get me started on HotSpot's startup time.
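For anyone else following along, the crux of the vtable question: a vtable is a fixed table laid down when a class is defined, while a JS prototype chain is a linked list of ordinary mutable objects that can be edited at any time, so compiled code has to guard on it rather than index into it. A small illustration:

    // A vtable is fixed when the class is defined; a prototype chain is a
    // linked list of ordinary mutable objects.
    const proto = { greet() { return "hi"; } };
    const obj = Object.create(proto);

    obj.greet();                 // found one level up the chain
    proto.greet = () => "hello"; // the "vtable slot" just changed
    Object.setPrototypeOf(obj, { greet() { return "hey"; } }); // whole "vtable" swapped
    obj.greet();                 // same call site, different lookup result

    // So a JIT that inlines greet() has to attach guards or watchpoints to
    // every object on the chain it walked, and discard the code when they fire.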
Many, if not most, of the people working on advanced JavaScript VMs come from a Java background. I would say JS VMs are orders of magnitude more sophisticated in the kind of behavioral profiling they do in order to recover types, understand object shapes, and shortcut prototype lookups. While HotSpot and Azul (a progeny of HotSpot) are highly tuned, their investment has gone into bread-and-butter JIT and GC work, whereas JS runtimes have to have a very advanced object model and type-profiling infrastructure.
PHP has a JIT coming in 8.0, using the same underlying tech that LuaJIT does. Unfortunately, most of what people do with PHP isn't CPU bound, so it doesn't help much.
This could be a chicken and egg thing. As soon as JavaScript started to get faster, we started using it for more CPU intensive tasks, which in turn led to more investment in optimizations, and so on, in a virtuous circle.
Large entities are competing on browser performance, so the "which in turn led to more investment in optimizations" part does not really ring true. We are just benefiting from browser market dynamics.
Don't believe him. The PHP JIT is entirely different from LuaJIT, other than using DynASM. Even Raku is using DynASM.
The PHP JIT is actually much easier to understand: SSA for all tiers, a CFG, but not much type speculation at all. The advantage PHP, Perl, and Ruby have over JS is that objects don't vary that much, methods are not overridden that much, and arrays and hashes are much easier.
The PHP JIT is at least 10x simpler and smaller than JSC. It's also C, not C++.
I feel like I can almost visualize the killshot slides that the HipHop and HHVM folks used to show the extent to which PHP can be CPU bound and the extent to which server idle time can be increased by using speculative compilation tricks on PHP.
So, I think it is CPU bound for at least some people.
HHVM's second-generation region-based JIT performs enormously better than PHP 7.x in places where dynamic languages don't perform so well, but PHP 7.x has the clear lead when benchmarking large apps like WordPress.
I get where the "rooting for the underdog" feeling comes from, but it still feels good that the relatively small and underfunded PHP team mostly beat Facebook here. I like to imagine there is some internal debate at FB on whether to just go back to mainline PHP and kill HHVM. Especially with a credible JIT coming.
LuaJIT isn't really comparable. Filip touched on why the tracing compilers struggle: tail duplication. Avoiding tail duplication means having heuristics which keep traces short, which means limiting your optimization scope.
This turns some variables which would otherwise have to be loaded at the start of the trace into constants in the recorded trace. A more complex compiler can just do the same optimizations without the fuckaround.
Is that really the point of that code? It seems like the point is to generate code to map real numbers from math.random to letters in a probability distribution in the smallest number of branches.
Maybe not in architecture and thus in perf ceiling, but they actually get good perf, which is generally not the case for the main implementations of other popular scripting languages IMO. Thanks for highlighting the tracing JIT issue.
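To make the tail-duplication trade-off upthread concrete, here's a sketch (in JS for consistency, though LuaJIT of course traces Lua) of how a hypothetical tracing JIT sees a branchy loop:

    function sum(xs, scale) {
      let total = 0;
      for (let i = 0; i < xs.length; i++) {
        // If scale was 2 while recording, the trace can bake it in as the
        // constant 2 instead of reloading it: the "constants in the
        // recorded trace" win mentioned above.
        if (typeof xs[i] === "number") {
          total += xs[i] * scale;
        } else {
          // A side exit here spawns a second trace, and the loop tail after
          // the branch gets duplicated into both. Keeping traces short to
          // bound that duplication is exactly what limits optimization scope.
          total += Number(xs[i]) * scale;
        }
      }
      return total;
    }
    sum([1, 2, "3"], 2);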
The post gives PyPy a shout out. But it’s subtle. PyPy is similar but not the same. I think that JSC’s exact technique could be tried for Python and I don’t believe it has.
Maybe. I wish I had the occasion to deploy PyPy in production, but as a daily Python user since 2008 I never had to switch to PyPy to fix any performance problem that mattered for my users or me. Still, I keep an eye on PyPy and admire it, as well as the developers behind it.
Python actually uses prototypal inheritance behind the scenes.
PyPy is a tracing JIT though, which is quite different. JSC and V8 compile a method at a time. SpiderMonkey used to use tracing, but switched to methods too (though I think it still does limited tracing in some situations).
There’s no tracing in either JägerMonkey or IonMonkey. That’s just a confusing statement about the similarity of speculation guards and OSR exits to trace guards and side exits. For a while TraceMonkey was used as a higher tier above JägerMonkey which might be what you’re thinking of?
I don’t work for Mozilla but spent a ton of time studying TraceMonkey for building my own tracing JIT.
I think that Julia uses the LLVM JIT. When I used it last, Julia was somewhat prone to long startup delays since it would fully compile functions on first execution.
That can work great for long running applications such as its main niche of scientific computing but would be terrible for JS since you want the page to be interactive ASAP.
This is why talking about JIT performance is so complicated. Not only do you need to worry about compilation speed and speed of the generated code, you also have to worry a lot about impact on memory and impact on concurrently running code. Plus most JITs also need to have some sort of profiling system running all the time as part of those constraints, to only spend compilation resources on hot paths.
The difference between Julia and speculative compilers is that Julia is much more likely to have to compile things to run them. JSC compiles only a fraction of what it runs, the rest gets interpreted.
Modern JS engines have a multi-tier structure and profiling info that lets them choose what to JIT and which point on compile speed vs runtime speed tradeoff space to take for any given chunk of code. The post covers a lot of this.
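As a toy illustration of the adaptive part: a counter-based tier-up policy looks roughly like this, with made-up thresholds and tier names; real engines layer loop counters, OSR, and memory budgets on top:

    const BASELINE_AT = 100;
    const OPTIMIZE_AT = 10000;
    const profile = new Map(); // per-function { count, tier }

    function onCall(fn) {
      const p = profile.get(fn) ?? { count: 0, tier: "interpreter" };
      p.count++;
      if (p.tier === "interpreter" && p.count >= BASELINE_AT) {
        p.tier = "baseline";  // cheap compile, little optimization
      } else if (p.tier === "baseline" && p.count >= OPTIMIZE_AT) {
        p.tier = "optimized"; // expensive compile that consumes profile data
      }
      profile.set(fn, p);
      return p.tier; // most functions never leave "interpreter"
    }

    function hot() {}
    for (let i = 0; i < 20000; i++) onCall(hot); // hot() ends up "optimized"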
I always love reading more about JavaScriptCore internals although I have to confess that much of the time one of the main lessons I get from it is that life would be much easier if we had types and didn't need to speculate so much in the first place.
Not having types is a virtue for the web, where interfaces can change in weird ways. Dynamism leads the engine and the JS code to be less tightly coupled. So, there’s more wiggle room for evolution in the engine and more wiggle room for portability defenses in JS.
So, it just depends on how important the benefits of dynamic types are versus the benefits of static types. I don’t think static types or dynamic types are better; they are just good at different things.
It wouldn't be the JS we know and love if it had been burdened with a type system designed by a committee sometime in the 90s. That said, one thing we can say for sure is that the dynamic typing doesn't make your job any easier :)
You may want to speculate even when you have precise concrete types.
For example your type system may tell you that you have an int32, but you can speculate that only the lowest bit is ever set, with a kind of synthetic type you could call int32&0x1 which isn't expressible in the type system the user uses.
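Concretely, something like this; the comments describe what a hypothetical speculative JIT could do, since no user-facing type can express the invariant:

    function pick(a, b, selector) {
      // Static type: selector is an int32. Speculated type: int32 & 0x1.
      // Under that speculation the JIT can compile this select without a
      // mask or range check, guarded by a cheap test that deoptimizes if
      // selector is ever anything but 0 or 1.
      return a * (1 - selector) + b * selector;
    }
    pick(10, 20, 0); // profiling only ever sees 0 and 1, so the guard holds
    pick(10, 20, 1);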
> dynamic typing doesn't make your job any easier
Yeah, it makes millions of application programmers' jobs easier at the expense of a small group of experts - sounds like the right tradeoff?
> Yeah, it makes millions of application programmers' jobs easier
I don't think it's that simple. Large programs get unwieldy, no matter what language you write them in, and a large body of evidence suggests that having static types for both safety and documentation is a big win, because it makes programs more robust and ironically makes programmers more productive in the long run. As you and I both know, this is a long discussion that stretches back decades, so it probably isn't going to be productive to hash it out here.
A more important discussion which is not being had is the question of the size of the trusted computing base. Framed this way, it makes sense to minimize the size of the trusted computing base and not have a complicated dynamic language implementation at the bottom. Instead we should have layers, with a very strict statically-typed target that is easy to make go fast at the bottom. This is why I want to put WebAssembly under everything. Yes, even JS. (Fil would probably not agree here :-))
> For example your type system may tell you that you have an int32, but you can speculate that only the lowest bit is ever set,
Range analysis is really important for JavaScript code because everything is a double and it is generally a win to avoid doing double math if possible, but I am doubtful that it makes much difference for statically-typed integer code outside of array bounds checking optimizations. In my mind, range analysis on integers really only feeds branch optimizations. Maybe it's the case that optimizing integer code that is full of overflow checking benefits from range analysis (the kind of stuff you find inside the implementation of a dynamic language), but I can't really think of much else.
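For concreteness, the array-bounds case, with comments describing what a hypothetical JIT's range analysis would conclude:

    function total(xs) {
      let sum = 0;
      // Range analysis proves 0 <= i < xs.length on every iteration, so the
      // bounds check on xs[i] can go away. In JS it also lets the JIT keep
      // i on int32 math with the overflow check hoisted out of the loop.
      for (let i = 0; i < xs.length; i++) {
        sum += xs[i];
      }
      return sum;
    }
    total([1, 2, 3]);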
> a large body of evidence suggests that having static types for both safety and documentation is a big win
Citation needed. A review of studies on static vs. dynamic languages concluded "most studies find very small effects, if any". https://danluu.com/empirical-pl/
I read through all those studies and didn't think they shed any light on the subject, as this is something very hard to get numbers on:
"The summary of the summary is that most studies find very small effects, if any. However, the studies probably don't cover contexts you're actually interested in."
"Large programs get unwieldy, no matter what language you write them in, and a large body of evidence suggests that having static types for both safety and documentation is a big win"
I am beginning to think that cutting your large untyped program into pieces and typing it only at the boundaries will get you all the benefits. That probably means most, if not all, types inside those pieces can be inferred.
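You can approximate this in JS today by typing only the exported surface with JSDoc annotations (which TypeScript's checkJs mode can verify) and letting inference handle the interior. A sketch, with illustrative names:

    // Typed boundary, untyped interior: only the exported function carries
    // an explicit contract; everything inside is inferred.

    /**
     * @param {string} csvLine
     * @returns {number[]}
     */
    export function parseRow(csvLine) {
      const cells = csvLine.split(",");          // inferred: string[]
      return cells.map((c) => Number(c.trim())); // inferred: number[]
    }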
There are still ways to program to unstable interfaces in static languages though, and they tend to be safer overall because they are isolated from the rest of the language.