The shortest double-to-string algorithm is basically Schubfach or, rather, it's variation Tejú Jaguá with digit output from Dragonbox. Schubfach is a beautiful algorithm: I implemented and wrote about it in https://vitaut.net/posts/2025/smallest-dtoa/. However, in terms of performance you can do much better nowadays. For example, https://github.com/vitaut/zmij does 1 instead of 2-3 costly 128x64-bit multiplications in the common case and has much more efficient digit output.
Schubfach, Ryū, Dragonbox etc support round-tripping and shortest-width, which (it sounds like) is not important for you. The idea of round-tripping is that if you convert a double to a string and then parse that, you get the exact same value. Shortest-width is to correctly round and generate the shortest possible text. I tried to implement a version that does _just_ round-tripping but is not shortest-width, it is around 290 lines for both parsing and toString [1]
I implemented Teju Jaguá in Rust, based of the original C impl https://github.com/andrepd/teju-jagua-rs. Comparing to Zmij, I do wonder how much speedup is there on the core part of the algorithm (f2^e -> f10^e) vs on the printing part of the problem (f*10^e -> decimal string)! Benchmarks on my crate show a comparable amount of time spent on each of those parts.
I don't have exact numbers but from measuring perf changes per commit it seemed that most improvements came from "printing" (e.g. switching to BCD and SIMD, branchless exponent output) and microoptimizations rather than algorithmic improvements.
What about reasonably fast but smallest code, for running on a microcontroller? Anything signifactly better in terms of compiled size (including lookups)?
If you compress the table (see my earlier comment) and use plain Schubfach then you can get really small binary size and decent perf. IIRC Dragonbox with the compressed table was ~30% slower which is a reasonable price to pay and still faster than most algorithms including Ryu.
Note that ~3-6ns is on modern desktop CPUs where extra few kB matter less. On microcontrollers it will be larger in absolute terms but I would expect the relative difference to also be moderate.