
FWIW you cannot have Unicode-correct rendering by caching at the codepoint (what many people would call “character”) level. You can cache bitmaps for the individual “glyphs”—that is, items in the font’s `glyf` table. But your shaping engine still needs to choose the correct “glyphs” to assemble into the extended grapheme clusters dictated by your Unicode-aware layout engine.
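
To make that concrete, here's a minimal sketch of shaping with HarfBuzz's C API (HarfBuzz is just one common shaping engine; "font.ttf" is a placeholder path and error handling is omitted):

```c
/* Build: cc shape.c $(pkg-config --cflags --libs harfbuzz) */
#include <hb.h>
#include <stdio.h>

int main(void)
{
    hb_blob_t *blob = hb_blob_create_from_file("font.ttf"); /* placeholder */
    hb_face_t *face = hb_face_create(blob, 0);
    hb_font_t *font = hb_font_create(face);

    hb_buffer_t *buf = hb_buffer_create();
    hb_buffer_add_utf8(buf, "final", -1, 0, -1);
    hb_buffer_guess_segment_properties(buf); /* direction, script, language */

    hb_shape(font, buf, NULL, 0); /* default features such as 'liga' apply */

    unsigned int n;
    hb_glyph_info_t *info = hb_buffer_get_glyph_infos(buf, &n);
    for (unsigned int i = 0; i < n; i++) {
        /* After shaping, info[i].codepoint holds a glyph index into the
           font, not a Unicode codepoint; this is what a bitmap cache
           should be keyed on. With an 'fi' ligature in the font, the two
           codepoints "fi" come back as a single glyph. */
        printf("glyph %u, cluster %u\n", info[i].codepoint, info[i].cluster);
    }

    hb_buffer_destroy(buf);
    hb_font_destroy(font);
    hb_face_destroy(face);
    hb_blob_destroy(blob);
    return 0;
}
```

The ligature case is exactly why a codepoint-keyed bitmap cache breaks: there is no one-to-one mapping from codepoints to glyphs in either direction.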


Exactly why I referred to drawing glyphs instead of characters :)

There's even more depth one can go into here: subpixel positioning. To correctly draw a glyph whose origin falls between pixel boundaries, you need to rasterize and cache it separately for each fractional offset (quantized to a small number of steps, to balance cache hit rate against positioning accuracy).
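
As a rough illustration of what that cache key looks like, here's a sketch in C (the names `SUBPIXEL_BINS`, `GlyphCacheKey`, and `make_key` are hypothetical, and four horizontal bins is just one plausible precision choice):

```c
/* Needs -lm for floor(). */
#include <math.h>
#include <stdint.h>

#define SUBPIXEL_BINS 4

typedef struct {
    uint32_t glyph_id;     /* glyph index from the shaper, not a codepoint */
    uint8_t  subpixel_bin; /* quantized fractional x offset, 0..SUBPIXEL_BINS-1 */
} GlyphCacheKey;

/* Split a fractional pen position into an integer pixel and a subpixel bin.
   The rasterizer renders one bitmap per bin, shifted right by
   bin / SUBPIXEL_BINS of a pixel. */
static GlyphCacheKey make_key(uint32_t glyph_id, double pen_x, int *pixel_x)
{
    double base = floor(pen_x);
    int bin = (int)((pen_x - base) * SUBPIXEL_BINS + 0.5); /* round to bin */
    if (bin == SUBPIXEL_BINS) { /* fraction rounded up to the next pixel */
        bin = 0;
        base += 1.0;
    }
    *pixel_x = (int)base;
    return (GlyphCacheKey){ .glyph_id = glyph_id, .subpixel_bin = (uint8_t)bin };
}
```

Vertical subpixel positioning is usually skipped in terminals, since glyphs sit on a fixed baseline grid.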

However, I have a feeling that describing an entire Unicode-aware text stack here may not be useful, especially since TFA seems to care only about simple-script, monospace, LTR text.


Nowadays people expect their terminals to handle UTF-8, or at least the Latin-like subset of Unicode, without dealing with arcana such as codepages. Even with the simplest fonts, rendering something like í likely requires drawing multiple glyphs: one for the dotless lowercase i stem, and one for the acute accent. It so happens that dotless lowercase i has its own codepoint (U+0131), but in general the glyphs a font uses to render an extended grapheme cluster need not correspond to any codepoints at all. So even “simple” console output is nowadays complicated by the details of Unicode-aware text rendering.
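
For a concrete taste of the cluster/codepoint distinction, here's a minimal sketch using ICU4C's break iterator (assuming pkg-config knows the icu-uc module): "e" followed by a combining acute is two codepoints but a single extended grapheme cluster, i.e. one terminal cell's worth of output.

```c
/* Build: cc clusters.c $(pkg-config --cflags --libs icu-uc) */
#include <unicode/ubrk.h>
#include <unicode/utypes.h>
#include <stdio.h>

int main(void)
{
    /* U+0065 'e' + U+0301 COMBINING ACUTE ACCENT: two codepoints,
       one extended grapheme cluster. */
    UChar text[] = { 0x0065, 0x0301 };
    UErrorCode status = U_ZERO_ERROR;
    UBreakIterator *it = ubrk_open(UBRK_CHARACTER, "en", text, 2, &status);
    if (U_FAILURE(status)) return 1;

    int32_t start = ubrk_first(it);
    for (int32_t end = ubrk_next(it); end != UBRK_DONE;
         start = end, end = ubrk_next(it)) {
        /* Prints a single cluster spanning both code units: [0, 2). */
        printf("cluster: UTF-16 code units [%d, %d)\n", start, end);
    }
    ubrk_close(it);
    return 0;
}
```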



