It's really not enough to say that a GC handed you more distinct pointers and therefore has worse cache locality. A compacting GC almost always has better cache utilization than malloc, because heap fragmentation in long-running programs wastes TLB entries and leaves slack space between objects. A bump allocator backed by a compacting GC will give you a new pointer for each allocation because free doesn't reclaim the memory, but those allocations will be sequential, and if you're in the case where you're churning through your heap and only ever touching the most recent object, it will still be in cache. Benchmarking the knock-on effects of allocators and GCs is insanely difficult, and I'm very skeptical of basically any synthetic benchmark like this.
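For intuition, here's a minimal bump-allocation sketch (the names are made up, not taken from any particular runtime): every allocation just advances a cursor, so consecutive allocations are adjacent in memory.

    /* Minimal bump allocator: each allocation returns a fresh, strictly
       increasing pointer, so back-to-back allocations share cache lines
       even though nothing is individually freed before the GC compacts. */
    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint8_t *base;  /* start of the region owned by the GC */
        size_t   used;  /* bytes handed out so far */
        size_t   cap;   /* total region size */
    } bump_arena;

    static void *bump_alloc(bump_arena *a, size_t n) {
        n = (n + 15) & ~(size_t)15;            /* keep 16-byte alignment */
        if (a->used + n > a->cap) return NULL; /* a real runtime would collect/compact here */
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }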
I think the fact that it is complicated to reason about is precisely why systems developers don't trust GCs.
It's far easier to write a thread-safe bump (slab) allocator than to locate and diagnose the code that someone wrote two years ago which, as of last week, started causing the GC to blow up the cache, contend on a lock, fragment the heap, add unbounded pauses, burn an extra CPU, etc.
(Though, at this point, most mallocs are so good that the slab allocator loses anyway and there’s no need to bother.)
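For what it's worth, the thread-safe version really is just an atomic bump of the cursor; a rough sketch under those assumptions (hypothetical names, no per-thread slabs, error handling omitted):

    /* Thread-safe bump allocation: one atomic fetch_add on the cursor.
       Contrast the effort here with diagnosing a GC regression buried in
       two-year-old code. */
    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint8_t        *base;
        _Atomic size_t  used;
        size_t          cap;
    } shared_arena;

    static void *arena_alloc(shared_arena *a, size_t n) {
        n = (n + 15) & ~(size_t)15;
        size_t off = atomic_fetch_add_explicit(&a->used, n, memory_order_relaxed);
        if (off + n > a->cap) return NULL;  /* out of space; caller decides what to do */
        return a->base + off;
    }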
FWIW, that synthetic benchmark was reflective of some real world code we were deploying.
Using malloc/free for one function led to something like a 2x performance improvement of the whole program.
I think it's important to differentiate between malloc implementations/algorithms, just like it's important to differentiate between GCs.
E.g., mimalloc "shards" size classes into pages, with a separate free list per page. This way, subsequent allocations all come from the same page. Freeing is not eager; only when the entire page becomes free, or when a new allocation finds the page's free list empty, does it take a periodic slow path to do the deferred work.
https://www.microsoft.com/en-us/research/uploads/prod/2019/0...
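A heavily simplified illustration of the per-page free-list idea (this is not mimalloc's actual data structure or API, just the shape of it): each size class owns pages, each page owns its own free list, so consecutive allocations of one size come from one page.

    /* Per-page free lists, simplified: allocation and free both stay on the
       block's owning page; only an empty free list or an empty page triggers
       slow-path work in a real allocator. */
    #include <stddef.h>

    typedef struct block { struct block *next; } block;

    typedef struct page {
        block       *free_list;   /* blocks freed back to this page */
        size_t       used;        /* live blocks on this page */
        struct page *next_page;   /* other pages in the same size class */
    } page;

    static void *page_alloc(page *pg) {
        block *b = pg->free_list;
        if (b == NULL) return NULL;   /* real allocators take a slow path here:
                                         collect deferred frees or grab a new page */
        pg->free_list = b->next;
        pg->used++;
        return b;
    }

    static void page_free(page *pg, void *p) {
        block *b = p;
        b->next = pg->free_list;      /* push back onto the owning page's list */
        pg->free_list = b;
        if (--pg->used == 0) {
            /* whole page is empty: a real allocator may now retire it */
        }
    }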
Good malloc implementations can also employ techniques to avoid fragmentation.
It's unfortunate that the defaults are bad.
But I confess, compacting GCs and profiling the effects of heap fragmentation (especially over time in long running programs) are both things I lack experience in. Microbenchmarks are unlikely to capture that accurately.
If we are talking about long-running multithreaded processes, then libc malloc has issues that aren't even fragmentation. There are many workloads where it seems to forget to return whole empty pages and just accumulates allocated areas despite them being totally empty.
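On glibc specifically, you can sometimes mitigate this by capping arenas and trimming explicitly; a hedged sketch (glibc-specific knobs, behavior varies by version and workload):

    /* glibc-specific workarounds for memory that is free but never returned
       to the OS.  How much this actually helps depends on the workload. */
    #include <malloc.h>  /* glibc: mallopt, M_ARENA_MAX, malloc_trim */

    int main(void) {
        /* Cap the number of per-thread arenas so empty ones are less likely
           to pile up in a heavily threaded process. */
        mallopt(M_ARENA_MAX, 2);

        /* ... run the workload ... */

        /* Ask glibc to return fully free pages back to the kernel. */
        malloc_trim(0);
        return 0;
    }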
Compaction really does help runtimes a lot, but I'm not sure how much it really has to do with cache-line-level locality. In general we don't try to batch related objects together except in a coarse form, by generation.
I think the measurable benefit comes from page-level savings: both reducing the number of trips to the kernel to get zeroed pages, and reducing pressure on the TLB.
But I have definitely seen numbers like 20% on some workloads just from turning on compaction.
> in general we don't try to batch related objects together except in a coarse form by generation.
It greatly depends on the GC algorithm, right? Copying collectors naturally bunch up related objects (though of course the order is less well defined if one object has pointers to several others).
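For reference, a classic Cheney-style copying collector is essentially a breadth-first traversal, which is why an object's children end up packed next to each other while deep chains get interleaved with their cousins. A rough sketch with a made-up object model (field names and layout are assumptions for illustration; sizes are assumed already aligned):

    /* Cheney-style copying collection, heavily simplified: reachable objects
       are copied into to-space in breadth-first order, so siblings end up
       adjacent, but depth-first order is not preserved. */
    #include <stddef.h>
    #include <string.h>

    typedef struct obj {
        struct obj *forward;          /* set once copied to to-space */
        size_t      size;             /* total object size in bytes */
        size_t      nfields;
        struct obj *fields[];         /* pointers to other objects */
    } obj;

    static unsigned char *alloc_ptr;  /* bump pointer into to-space */

    static obj *copy(obj *o) {
        if (o == NULL) return NULL;
        if (o->forward) return o->forward;    /* already moved */
        obj *new_o = (obj *)alloc_ptr;
        memcpy(new_o, o, o->size);
        alloc_ptr += o->size;
        o->forward = new_o;
        return new_o;
    }

    static void collect(obj **roots, size_t nroots, unsigned char *to_space) {
        alloc_ptr = to_space;
        unsigned char *scan = to_space;
        for (size_t i = 0; i < nroots; i++) roots[i] = copy(roots[i]);
        while (scan < alloc_ptr) {            /* breadth-first scan of the copies */
            obj *o = (obj *)scan;
            for (size_t i = 0; i < o->nfields; i++) o->fields[i] = copy(o->fields[i]);
            scan += o->size;
        }
    }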
> Compacting GC almost always has better cache utilization than malloc
...only if you put each object into its own heap allocation, which is a dumb thing to do in manual memory management. Typically you'd group many related objects of the same type tightly packed into the same allocation and loop over them in sequence to make proper use of prefetching.
The programmer will always have a better idea of how objects are going to be used than an automatic memory management solution does, and even with a GC it makes a lot of sense to use arrays of value types for better control over memory layout.
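As a concrete sketch of that last point (names made up): one tightly packed allocation iterated linearly, versus the GC-shaped alternative of one heap object per element reached through pointers.

    /* One packed allocation, iterated in sequence: the hardware prefetcher
       sees a predictable stride and there is no per-object header overhead. */
    #include <stdlib.h>

    typedef struct { float x, y, z, mass; } particle;

    int main(void) {
        size_t n = 1u << 20;
        particle *ps = calloc(n, sizeof *ps);  /* one allocation, not n */
        if (!ps) return 1;

        for (size_t i = 0; i < n; i++)         /* sequential, prefetch-friendly */
            ps[i].x += ps[i].mass * 0.5f;

        /* The alternative (an array of n pointers, each to its own heap
           object) turns this loop into n potential cache misses. */
        free(ps);
        return 0;
    }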