For promising modern parallel GC techniques, check out MaPLe (MPL) and its novel automatic parallelism management. It won a Distinguished Paper Award at POPL 2024 and an ACM SIGPLAN dissertation award in 2023 for two main contributions [1], [2]:
a) Provably efficient parallel garbage collection based on disentanglement
b) Provably efficient automatic granularity control
Standard ML and the community around it have been pretty impressive as far as contributions to the memory-management literature go.
There is of course the paper you linked, and there's also the MLKit, which was among the first users, and one of the pioneers, of region-based memory management.
I'm one of the authors of this work -- I can explain a little.
"Provably efficient" means that the language provides worst-case performance guarantees.
For example, in the "Automatic Parallelism Management" paper (https://dl.acm.org/doi/10.1145/3632880), we develop a compiler and run-time system that can execute extremely fine-grained parallel code without losing performance. (Concretely, imagine tiny tasks of only around 10-100 instructions each.)
The key idea is to make sure that any task which is *too tiny* is executed sequentially instead of in parallel. To make this happen, we use a scheduler that runs in the background during execution. It is the scheduler's job to decide on-the-fly which tasks should be sequentialized and which tasks should be "promoted" into actual threads that can run in parallel. Intuitively, each promotion incurs a cost, but also exposes parallelism.
In the paper, we present our scheduler and prove a worst-case performance bound. We specifically show that the total overhead of promotion will be at most a small constant factor (e.g., 1% overhead), and also that the theoretical amount of parallelism is unaffected, asymptotically.
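To make "fine-grained" concrete: in MaPLe you can simply spawn both recursive calls of a divide-and-conquer function in parallel and let the scheduler decide what actually gets promoted to a thread. For contrast, here is a rough sketch (in OCaml with Domainslib, not MaPLe) of the manual granularity control that this removes; the "grain" cutoff of 20 is an arbitrary, machine-dependent guess on my part, not a recommended value.

    (* Manual granularity control in OCaml + Domainslib: the programmer
       hand-picks a sequential cutoff. Too small and scheduling overhead
       dominates; too large and parallelism is lost. This hand tuning is
       the knob that automatic parallelism management removes. *)
    module T = Domainslib.Task

    let grain = 20  (* made-up, machine-dependent cutoff *)

    let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

    let rec pfib pool n =
      if n <= grain then fib n                 (* tiny task: run sequentially *)
      else begin
        let a = T.async pool (fun () -> pfib pool (n - 1)) in
        let b = pfib pool (n - 2) in
        T.await pool a + b
      end

    let () =
      let pool = T.setup_pool ~num_domains:(Domain.recommended_domain_count () - 1) () in
      Printf.printf "fib 40 = %d\n" (T.run pool (fun () -> pfib pool 40));
      T.teardown_pool pool

The point is that with automatic parallelism management you can drop the cutoff entirely, write the parallelism at its natural (tiny) grain, and rely on the run-time system's promotion policy to keep the overhead bounded.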
Compared with multicore OCaml, there are a few key differences:
* Separate compilation vs whole-program compilation. OCaml uses separate compilation and therefore has a very constrained heap object model, which is what makes polymorphism work across separately compiled modules. In contrast, MaPLe uses whole-program compilation and is therefore able to monomorphize and optimize much more aggressively. The downside is that whole-program compilation can be slow for large projects.
* The multicore OCaml effort was driven by backwards compatibility, especially in terms of performance -- they wanted to ensure that the performance of existing sequential OCaml code would be completely unaffected by the new run-time system. In contrast, MaPLe focuses on efficiency and scalability for parallel code.
* Multicore OCaml will let you implement your own scheduler, as a library, on top of coarse-grained threads (see the sketch after this list). In contrast, MaPLe comes with a built-in scheduler, and it's not easy to change it.
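To make the "scheduler as a library" point concrete, here is a toy single-domain cooperative scheduler written with OCaml 5 effect handlers. It's a minimal sketch of the pattern only (my own example, not Domainslib's or Eio's actual implementation; real schedulers add work stealing across multiple domains).

    (* A round-robin cooperative scheduler built as an ordinary library,
       using OCaml 5 effect handlers. *)
    open Effect
    open Effect.Deep

    type _ Effect.t += Fork : (unit -> unit) -> unit Effect.t
    type _ Effect.t += Yield : unit Effect.t

    let fork f = perform (Fork f)
    let yield () = perform Yield

    let run main =
      let run_q : (unit -> unit) Queue.t = Queue.create () in
      let enqueue k = Queue.push k run_q in
      let dequeue () = if Queue.is_empty run_q then () else (Queue.pop run_q) () in
      let rec spawn f =
        match_with f ()
          { retc = (fun () -> dequeue ());   (* task finished: run the next one *)
            exnc = raise;
            effc = (fun (type a) (eff : a Effect.t) ->
              match eff with
              | Yield -> Some (fun (k : (a, unit) continuation) ->
                  enqueue (fun () -> continue k ()); dequeue ())
              | Fork g -> Some (fun (k : (a, unit) continuation) ->
                  enqueue (fun () -> continue k ()); spawn g)
              | _ -> None) }
      in
      spawn main

    let () =
      run (fun () ->
          fork (fun () ->
              print_endline "A: step 1";
              yield ();                      (* let B run *)
              print_endline "A: step 2");
          fork (fun () -> print_endline "B"))

Because the scheduling policy lives in user code like this, you can swap it out; with MaPLe the promotion policy is baked into the run-time system.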
We did a comparison with multicore OCaml in https://dl.acm.org/doi/10.1145/3591284 and found that MaPLe can be significantly faster, but that comes with all of the tradeoffs above. And it's a cross-language comparison, so take it with a grain of salt. In particular, our comparison emphasized keeping the source code similar, but typically, fast code in OCaml just looks different from fast code in MaPLe. For example, in OCaml you often need to manually unbox certain data structures to get better memory efficiency (whereas MaPLe will often do this for you, automatically).
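To illustrate what "manually unbox" typically means in OCaml (a generic example of mine, not one of the paper's benchmarks): a float array is stored flat, but an array of records holds a pointer per element, so a common trick is to split an array of records into parallel flat arrays.

    type point = { x : float; y : float }

    (* Boxed layout: points.(i) is a pointer to a separately allocated record. *)
    let boxed_norms (points : point array) : float array =
      Array.map (fun p -> sqrt ((p.x *. p.x) +. (p.y *. p.y))) points

    (* Manually unboxed layout ("structure of arrays"): two flat float arrays,
       no per-point allocation or pointer chasing. *)
    let unboxed_norms (xs : float array) (ys : float array) : float array =
      Array.init (Array.length xs)
        (fun i -> sqrt ((xs.(i) *. xs.(i)) +. (ys.(i) *. ys.(i))))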
One of the people who helped optimise the multicore implementation for OCaml said it was the way to go, but that was in 2020. Don't know where things are now. https://news.ycombinator.com/item?id=23776609
Question for people who are more qualified: how applicable is this to other languages? Could this approach significantly speed up garbage collection in Go, for example?
Or do we run into design issues with existing languages?
[1] MaPLe (MPL): https://github.com/MPLLang/mpl
[2] Automatic Parallelism Management: https://dl.acm.org/doi/10.1145/3632880