> The thing is, why 10? Why not 9 or 11? The code says "if you see 'string of ne...

LegionMammal978 · on Oct 6, 2024

> From the codebase, you know that '\n' is a char. A char is a value between 0 and 255, if you explicitly convert '\n' to int then you happen to find the ascii value and you are good to go and there is no need to pretend there is any poetry in this.

But how does the computer know which int to output when you "explicitly convert '\n' to int"? As humans, we can just consult the ASCII table and/or the relevant language standard, but the computer doesn't have a brain and a pair of eyes; instead, it must store that association somewhere. The purpose of this article is to locate where the association was originally entered into the source code by some human.
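To make "store that association somewhere" concrete, here is a minimal Rust sketch of a hypothetical escape table in which a human has typed the character codes as plain numbers. `ESCAPES` and `escape_code` are made-up names for illustration; this is not how rustc is actually written.

```rust
/// Hypothetical table pairing the letter after a backslash with its character code.
/// The right-hand numbers are the human-entered part: nothing derives 10 from 'n';
/// someone had to type it. (b'n' is an ordinary printable letter, so it is just the
/// byte 110 as it appears in the source file; no translation is involved.)
const ESCAPES: [(u8, u8); 4] = [
    (b'n', 10), // newline (line feed)
    (b't', 9),  // horizontal tab
    (b'r', 13), // carriage return
    (b'0', 0),  // NUL
];

/// Looks up the code for the letter following a backslash, if it is a known escape.
fn escape_code(letter: u8) -> Option<u8> {
    ESCAPES
        .iter()
        .find(|&&(l, _)| l == letter)
        .map(|&(_, code)| code)
}

fn main() {
    println!("{:?}", escape_code(b'n')); // Some(10)
    println!("{:?}", escape_code(b'q')); // None
}
```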

The question is less interesting for ordinary characters like 'a', since the codes for those are presumably baked into the keyboard hardware and the font files, and no further translation is needed.

floren · on Oct 7, 2024

> The question is less interesting for ordinary characters like 'a', since the codes for those are presumably baked into the keyboard hardware and the font files, and no further translation is needed.

It's true that the question is less interesting for regular characters, but your explanation why is way off base.

Consider a computer whose only I/O is a serial console. It is concerned with neither a keyboard nor a font file.

LegionMammal978 · on Oct 8, 2024

For a computer whose only I/O is a serial console, I'd say that it has no character 'a', but only a character 0x61 with such-and-such properties and relationships with other characters. It's when we press the 'A' key on our keyboard and get 0x61 out, or put 0x61 in and see the glyph 'a' on a display or a printout, that the code becomes associated with our human concept of the letter.

That is, suppose I design a font file so that character 0x61 has glyph 'b' and 0x62 has glyph 'a', and I accordingly swap the key caps for 'A' and 'B' on my keyboard. If I write a document with this font file and print it off, then no one looking at it could tell that my characters had the wrong codes. Only the spell-checker on my computer would complain, since it's still following its designers' ideas of what the character codes 0x61 and 0x62 are supposed to mean within a word.
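A toy model of the font-and-keycap swap, as a hedged Rust sketch. The "font" here is just a hypothetical `HashMap` from byte code to glyph, with glyphs modeled as `char`s purely so they can be printed; `render` is a made-up helper. The stored bytes never change; only the table used to draw them does.

```rust
use std::collections::HashMap;

/// Draws stored byte codes using a (hypothetical) font table; unknown codes become '?'.
fn render(text: &[u8], font: &HashMap<u8, char>) -> String {
    text.iter().map(|b| font.get(b).copied().unwrap_or('?')).collect()
}

fn main() {
    let standard: HashMap<u8, char> = HashMap::from([(0x61, 'a'), (0x62, 'b')]);
    // Swapped font: code 0x61 is drawn as 'b' and 0x62 as 'a'.
    let swapped: HashMap<u8, char> = HashMap::from([(0x61, 'b'), (0x62, 'a')]);

    // With swapped keycaps, pressing the key labeled 'A' sends 0x62.
    let typed: Vec<u8> = vec![0x62, 0x62, 0x61]; // the user types "aab"

    println!("{}", render(&typed, &swapped)); // aab: looks right on paper
    println!("{}", render(&typed, &standard)); // bba: what the spell-checker "reads"
}
```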

wruza · on Oct 7, 2024

I understand and share the excitement on this subtle topic, but this only exists at the source code level. There's a chain of source code versions, linked over time by compilation processes, that eventually leads back to a numeric literal entered by a human.

But the physical computers knew what to insert immediately, because there was a 0x0a somewhere in the binary every time.

LegionMammal978 · on Oct 7, 2024

Of course our physical computers know what to insert, since it was embedded in the binary. But it hasn't always been embedded "every time": there was a point in the past where someone's physical computer didn't know what to insert, and so they had to teach it by hand. Without the source code (or some human-entered code) at the end of the chain, we'd have to insist that the code was embedded from the very dawn of time, which would be rather absurd. Personally, I like how this article and other such projects push back on some of the mysticism around bootstrapping that sometimes floats around.
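The bootstrapping point can be illustrated with two hypothetical versions of the same decoder in Rust. `decode_stage0` spells the newline code as the literal number 10, the kind of human-entered value at the end of the chain; `decode_later` writes `'\n'`, which only works because the compiler building it already knows what `'\n'` means. Both function names are made up, and this is a sketch of the idea, not rustc's actual source.

```rust
/// Hypothetical early-stage decoder: the association is spelled out numerically,
/// so the compiler building this needs no prior knowledge of escape sequences.
fn decode_stage0(letter: char) -> Option<char> {
    match letter {
        'n' => Some(char::from(10u8)), // newline, entered by hand as 10
        't' => Some(char::from(9u8)),  // tab, entered by hand as 9
        _ => None,
    }
}

/// Hypothetical later-stage decoder: written with '\n' and '\t', so it silently
/// relies on whichever compiler builds it already mapping them to 10 and 9.
fn decode_later(letter: char) -> Option<char> {
    match letter {
        'n' => Some('\n'),
        't' => Some('\t'),
        _ => None,
    }
}

fn main() {
    // Both versions agree, but only the first states the numbers explicitly.
    for letter in ['n', 't', 'x'] {
        assert_eq!(decode_stage0(letter), decode_later(letter));
    }
    println!("{:?}", decode_later('n').map(u32::from)); // Some(10)
}
```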

immibis · on Oct 7, 2024

You're operating on different levels. Of course we know ASCII 10 is newline. The comment you're replying to is asking: yes but how does the CPU know that, when it runs the compiler? Obviously it sees the number 10 in the machine code. Where did that 10 come from - what is the provenance of that particular byte? It didn't come from the source code of the compiler, which just says \n, and it didn't come from the ASCII table, because that's just a reference document for humans, which the computer doesn't know about.

eru · on Oct 7, 2024

> It simply becomes "if a you see 'the arbitrary symbol for the new line', output 'the corresponding ascii value'".

> I read the quote as "if you see 'a', output 'a' in ascii code." which is not mysterious in any kind of way.

Only, it's not like that.

It's like:

> If you see a backslash followed by n, output a newline.

There's no 'newline character' in the input we are parsing here.
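That restatement can be written almost word for word as a scanner over the raw bytes of a literal's contents. In this hedged Rust sketch, `unescape` is a hypothetical name and only `\n` and `\\` are supported; the point is that the input holds the two bytes 92 and 110, never a 10, and the 10 appears only in the output.

```rust
/// Hypothetical unescaper: turns the raw bytes between a literal's quotes into the
/// bytes the literal denotes. Supports only \n and \\ for brevity.
fn unescape(raw: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut bytes = raw.iter().copied();
    while let Some(b) = bytes.next() {
        if b != 92 {
            // Ordinary byte (e.g. 97 for 'a'): copied through unchanged.
            out.push(b);
            continue;
        }
        // Saw a backslash (92); the meaning depends on the next byte.
        match bytes.next() {
            Some(110) => out.push(10), // backslash + 'n' (110) => newline (10)
            Some(92) => out.push(92),  // backslash + backslash => one backslash
            Some(other) => return Err(format!("unknown escape byte {}", other)),
            None => return Err("trailing backslash".to_string()),
        }
    }
    Ok(out)
}

fn main() {
    let raw = [92u8, 110]; // the two source bytes `\` and `n`
    println!("{:?}", unescape(&raw)); // Ok([10])
}
```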

pharrington · on Oct 7, 2024

Here's another way to think of the inspiration for the article. You're creating a file to use as input to another computer program (in this case, rustc). Your text file contains the ASCII strings 'a' and '\n'. When rustc reads the text file, it sees the corresponding byte sequences: 39, 97, 39 and 39, 92, 110, 39, respectively. The first byte sequence contains the 97 that you want, but the second does not contain a 10. Yet rustc somehow knows to generate a 10 from 39, 92, 110, 39. How?
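The byte values above are easy to check in Rust. In this short sketch, `as_bytes` returns a string's raw UTF-8 bytes, and a raw string literal (`r"..."`) performs no escape processing, so it holds the same bytes rustc would read from a text file containing those literals.

```rust
fn main() {
    // Raw strings keep the backslash as-is, matching the source file's bytes.
    let char_a = r"'a'";
    let char_newline = r"'\n'";

    assert_eq!(char_a.as_bytes(), &[39, 97, 39]);
    assert_eq!(char_newline.as_bytes(), &[39, 92, 110, 39]);

    // The 10 only appears once the escaped literal itself is compiled.
    assert_eq!('\n' as u8, 10);
    println!("{:?} {:?}", char_a.as_bytes(), char_newline.as_bytes());
}
```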

antonvs · on Oct 7, 2024

> You can look into the ascii table then.

I suggest reading the article, to find out just how badly you’re missing the point.