Rust is not about memory safety (o-santi.github.io)
57 points by todsacerdoti on June 2, 2024 | hide | past | favorite | 64 comments


What Rust really shines in is in making you handle things that need to be handled. For example: your user might input valid UTF-8 text as a filename for saving. But not all filesystems allow all UTF-8 characters in filenames. Rust makes that explicit because it makes you convert the str/String to another special string type and forces you to handle errors in that conversion explicitly.

Before I encountered that in Rust I had never even given it a thought, and I have been programming for years. And this is just one example, which is also why I still recommend that anybody learn Rust even if they will never use it. There are many valuable lessons in that language that will make you a better programmer in all other languages as well.


I don't want to be writing conversion code between special string types with error handling in 2024. It's too braindamaged for words.

And, first of all, burn that god forsaken filesystem.

If the filesystem doesn't like a certain name, the OS should let you know. Handle that error and send it back to the user to choose a different name.

You're not going to string-convert-and-error-handle your way out of the problem that the user cannot have the name that they specified.


If you don't want to do that, you don't have to. You could just not interact with filesystems you dislike. Or you could interact with them but purposefully ignore any error handling.

The point I made is: this is an existing problem that could fail your mission critical code. Rust just tells you this could be a problem before it becomes one, it does not force you to care or pick a specific solution to solve the problem. With most other languages you would only find out about this error once it happens in production, if you ever find out about it.

Going back to the user and asking them for a valid filename is also a form of error handling in my book.

Of course ideally file systems would support all filenames and ideally they wouldn't suck, but we write software within a world that is not ideal.


> Going back to the user and asking them for a valid filename is also a form of error handling in my book.

You cannot get away from error handling because there are multiple reasons why a name cannot be used, not all of them having to do with its form.

So, on Unix-like systems, do you use a special Rust string type which doesn't allow slashes or nulls? So that if the regular string fails to convert to that string, you know that it's not a valid name, and you get brownie points for having found that without passing it to the OS?

Also, whenever you divide numbers a / b, always be sure to use a special numeric type for b which doesn't have zero in its domain. Then if the input type doesn't convert to that type, you know at compile time you could have a division by zero.


> and you get brownie points for having found that without passing it to the OS?

You should probably reconsider your security model. Did you really just suggest passing arbitrary input to your OS filesystem to see if it fails? You can probably (hopefully?) figure out yourself why this is a bad idea.

Ever heard of defense in depth? Sanitizing your inputs early and knowing why things failed is a good thing. Having an additional redundant layer is not a bug but a feature, especially if the underlying layer has been shown to be exploitable.

Also consider that if you process user-provided filenames you have to sanitize them anyway, unless you want users handing you filenames like /root/.ssh/authorized_keys


> Did you really just suggest passing arbitrary input to your OS filesystem to see if it fails?

Yes? This is what I expect from programs as a user. Whatever input I give them that is intended for the OS, should be passed through to the OS verbatim. There may be some very specific exceptions to that, but that's the general principle.

> You can probably (hopefully?) figure out yourself why this is a bad idea.

I've not seen this after 42 years of coding, sorry.

> Having an additional redundant layer is not a bug,

It can easily turn into a bug where second-guessing the user's OS turns into some kind of hindrance where they are not able to do something they should be able to do.

It's a waste of time coding it.

It erodes consistency. One program in one language/runtime restricts things one way; another one has a slightly different idea.

> Sanitizing your inputs early and knowing why things failed is a good thing.

You don't seem to have a fully developed intuition for when this concept applies.

> unless you want them to give you filenames like /root/.ssh/authorized_keys

I've never seen an application that actually has a list of banned filenames like /root/.ssh/authorized_keys and checks every user input against it. It's a complete nonstarter.

If the user wants to tell a program to overwrite /root/.ssh/authorized_keys, the program should just do it.

If the user is not root, that path should not be accessible. If the user is root, they may be allowed to overwrite it even if the file has no write permission set. The system security model and configuration has to decide that; it's not the application's business to second-guess that policy.


> you handle things that need to be handled. For example: your user might input valid UTF-8 text as a filename for saving. But not all filesystems allow all UTF-8 characters in filenames.

Wait... isn't the usual problem that some filesystems allow invalid UTF sequences, rather than disallowing some valid ones? Or are you talking about ASCII characters like asterisks and colons? Also, don't all the idiosyncrasies differ per-filesystem, which... you won't even know until run time? How is converting to a different data type handling the things that need to be handled, when you have no idea what filesystem that data will be used for?


> isn't the usual problem that some filesystems allow invalid UTF sequences, rather than disallowing some valid ones?

It's both.

> How is converting to a different data type handling the things that need to be handled, when you have no idea what filesystem that data will be used for?

Because you make the translation once, early on, and handle failure then. For the rest of the program, you no longer have to perform those checks because the resulting value can only exist if the check was performed.

> Also, don't all the idiosyncrasies differ per-filesystem, which... you won't even know until run time?

Yes, the handling is at runtime: when making the transform from a string to a path, the result is a Result that needs to be handled in some way.


Are you saying you believe this conversion from a string to a path should involve I/O? Because it's literally impossible to validate a path's syntax without FS calls.


Rust's PathBuf uses an OsString as backing, which is merely a sparkling Vec<u8>. Displaying a PathBuf or OsString requires an explicit request to either validate that it is valid Unicode which can be handled by the caller, or for a "lossy string" where invalid Unicode codepoints get replaced with �. That's on the presentation side, going path -> user visible string.

On the user-provided string -> path case, if you start by accepting String then the conversion is merely a transmute, but some paths can't be represented by your users. If you allow any bytes to be passed in, taking a Vec<u8> from the user, then you can create an OsString and PathBuf without allocation while also allowing any legal (and illegal) path to be represented.

The verification of the legality of a path occurs when accessing a file, which is already a fallible operation that you must handle. The path might contain invalid characters for the FS, but the file might also just be missing, or inaccessible, so error handling at that point is not really optional.

Note that Rust's PathBuf/Path don't do canonicalization by default, to account for symlinks and the like. If you want to canonicalize, you have fs::canonicalize which does FS calls to get the right path, and path::canonicalize which doesn't access the FS, assuming that foo/.. will be equivalent to .


I feel you missed the point I was getting at.

> The verification of the legality of a path occurs when accessing a file, which is already a fallible operation that you must handle. The path might contain invalid characters for the FS, but the file might also just be missing, or inaccessible, so error handling at that point is not really optional.

That... completely destroys your earlier point, no?

Let me go back to your original comment, where you wrote:

> your user might input valid UTF-8 text as a filename for saving. But not all filesystems allow all UTF-8 characters in filenames. Rust makes that explicit because it makes you convert the str/String to another special string type and forces you to handle errors in that conversion explicitly.

What I'm saying is, this conversion is not saving you from invalid path characters. You're still going to have to deal with invalid file names (like colons, which are FS-dependent) anyway. Or, to put it another way, I find it quite baffling that you think catching a stray U+2022 in the string earlier than a stray U+003A is somehow a win; if anything, it's just as likely to be misleading. The fact that UTF-8 encoding errors are caught during the conversion really, really doesn't prevent invalid characters from creeping in.

Stepping back: the problem you're trying to solve is literally impossible to solve at compile time. If you feel the early UTF-8 conversion is buying you something meaningful for filesystems, the only thing that proves is that you've been given a false sense of security... by Rust, ironically enough. There are lots of things Rust shines at, but this is not only not one of them, but it may in fact be the least impressive thing about it!


I feel you two are talking past each other a bit.

> Stepping back: the problem you're trying to solve is literally impossible to solve at compile time.

This is not being solved at compile time. OsString/OsStr (and PathBuf/Path) are basically there for ergonomics _and_ safety. They're not just used for filesystem paths either, but for other things too.

All the checks happen at runtime (and only the checks relevant for the target platform, since there are different implementations per platform) when converting between String and OsString.

For path manipulation and filename encoding quirks (I don't know much about those) I think some of it is done in-memory, but a lot of it happens when you access the disk, that's true.

However you can still isolate those checks early and pass file references, or do a preflight check.

You can do this in other languages too, but Rust surfaces the potential issues nicely with Result in combination with other type system features, and makes the handling of those cases more ergonomic than most things I've worked with.


Give programmers enough rope and they'll always hang themselves.


My point was: it really depends on how you give them the rope. Is it a selection of ropes next to a cliff, where 95% of them will kill you without warning if you use them to climb down?

Or is it a rope that prevents you from accidentally walking off a cliff you didn't know was there?

Having a table saw that doesn't stop when it cuts wet material can be beneficial, but it is better to have one that stops when it hits your finger, and where you can specifically deactivate the stopping mechanism for the use case of cutting slightly wet material.


A bit morbid, but very true. You have to be exceptionally disciplined to keep a codebase tidy.


Agree with this take. I think about Rust's memory safety about as much as I think about other language's garbage collectors. Which is to say occasionally, but certainly not every day. It's the underlying design that makes the language ergonomic, but not a daily consideration for average users.

What I appreciate day-to-day about Rust is that it encourages (coerces you at compile time) to get things right. If you did not consider all possible code paths, you will hear about it immediately - not in a bug report 6 months from now. Where things are not right, they're obviously not so (ie unwrap). This gives me confidence that the rust code I ship will run in prod without failing due to some unforeseen error. Every error condition, aside from catastrophic hardware failure, has been accounted for in the AST of the program. That's not "proven correct" but it's close enough for production work.


> Every error condition, aside from catastrophic hardware failure, has been accounted for in the AST of the program. That's not "proven correct" but it's close enough for production work.

But not all possible program inputs... Different space size, but still relevant for some production scenarios.


Yes, but I can be assured that any input will follow one of the code paths that I explicitly wrote. There's no hidden runtime path that will bite me later, meaning that I had to at least reason about every non-happy path at some point in the dev process. It doesn't make the output "correct" but it's more likely to be correct than programming with happy-path-only optimism.


i don't disagree, but will register a bit of counter argument. i think it's great that the software engineering discipline is maturing. i think it's great that we study formal semantics and work towards making it easier to create correct programs and harder to create incorrect ones. new-thing-better, gotcha.

but old-thing-bad-shouldn't-they-have-ism i do not like at all.

i think it is a misrepresentation of what "C" was, and is, to falsely compare it with a modern "safe" language (we saw this before with java. oops i mean pascal. oops i mean ada, oops lisp, etc.). it was never meant to be a "safe" language. it was, and is, basically a portable high-level assembler. it allows, but does not enforce, "structured programming" and user defined types, and was portable. that was at the absolute forefront in the 1970s and 1980s. and the productivity was "off-the-chart". sure, your UNIX might crash now and then, but you "just yell down the hall and reboot it" instead of waiting on a multi-decade project to "do it right" to materialize.

so what now? 50 years hence? is it an appropriate language for string parsing in a security sensitive application? probably not. is it a "bad" language, or inappropriate for some other task? and why aren't we wailing and screeching about the lack of memory safety in assembler, for example?

the problem to me is not the language, but the application of it. say you have an application A where language X is a better fit than language Y, but they used X. you don't call out the designers of language X for not anticipating A, you should instead build A' using language Y and let it prove itself.


This is a fair defense of C, but C++ is what deserves the heat. C is portable assembly, C++ isn’t.


> runtime exceptions are not the solution

They are part of the solution.

Good programming needs to handle errors. Monadic error-handling gets you most of the way, combining the local reasoning of return values with the terseness of unchecked exceptions.

But there are simply too many places where something can go wrong to try to enforce deliberate error-handling everywhere. TFA mentioned indexing into arrays. Integer operations resulting in overflow are another example. To stay sane, these failures just have to come out exceptionally.

People writing about Java got it right for a long time: "Exceptions should be used exceptionally". But just like "null is the billion dollar mistake" it hasn't made it into the way people program yet.


> But there are too many places where something can go wrong to try and enforce deliberate error-handling everywhere

Rust actually defines APIs to do just that. Integer overflows can be handled with the `checked_add` family of functions. Indexing can be handled with `get` and `get_mut` on slices. Some choose to develop Rust libraries/projects using only APIs that explicitly cannot panic.

It's certainly not for everyone, but I find it incredible that Rust allows that option to develop code that is not only safe, but also robust against crashing the process (1).

1 - barring compiler/standard library bugs and OOM, which more so depends on the runtime


"Exceptions should be used exceptionally" - I think golang `panic-recover` does exactly this.

They made using `panic` intentionally hard.

Motto: "Don't panic in your code"


Please reconsider your "a lower case only blog, purely for aesthetics." decision. It's a gimmick and makes it unnecessarily harder for readers.


It definitely screamed "I'm low-value surface-level content" to me even though the article was genuinely insightful.


ngl, I didn't even notice until you pointed it out.


I find it quite nice. Everyone has their own tastes I suppose.


For me the most distinguishing trait of Rust is that it's value oriented (not handle oriented like pretty much all other popular languages).

Once you start thinking about values (even for simple variables), the space they occupy (how much, Sized vs ?Sized, stack, heap), and moving them (which would invalidate borrows), it all starts to make sense. Assignment means "move" by default, not "copy" or "copy handle".

The borrow checker is just a sidenote to that, created so you don't have to move and clone stuff too much, and the expressive types are just nice but not unique.

If you approach it as any other language and think in terms of handles you just get annoyed why your handles (pointers) are so clunky and restricted.

Maybe it's just a bit like Git that its interface is easier to understand if you go from inside out. Interface follows what it's doing. It might be frustrating if you come from the opposite direction. From what you want it to do. But all other popular languages do that instead.


I mean, this is true of any low-level language like C++ or C. Manual memory management. I think it does drive a lot of people nuts who have only ever worked with GC languages. I think your reaction to Rust depends on where you approach it from:

Abstract languages: "Ahhh! String, &str, Cow<str>!?"

Low-level languages: "How come this isn't compiling? Oh, wow, I wouldn't have caught that. Cool!"

I think at this time, there are a lot of people who are getting into Rust from the "top of the stack", so it is their first foray into manual memory management.

Many critiques I have seen of Rust are that it is "too complex" for "no good reason." I think those seem superficially valid, especially if you compare Rust with C. "Just don't use after free bro, skill issue" is easy to say if you've only ever worked on a small, short-lived C codebase with one developer on it. Once you start working on a very large codebase with many contributors, people coming and going, "skill issue" is absolutely nonsensical. The borrow checker is about memory safety; but looking at it more broadly, at the sort of ecosystem-level view, it is about constraining individual programmers. Limiting what an individual programmer can do with memory, and guiding them towards using memory safely and well, with good compiler warnings.


Disclaimer: my shop develops non-trivial low-level software with machine-checked correctness proofs.

Rice's theorem implies that automatic static analysis (like Rust's borrow checker) will only get you that far. To go further, you must equip the program with a proof, which can be an order of magnitude more complex than the program.

Don't get me wrong, Rust is excellent for developing security-critical applications. Its borrow checker solves many memory safety and concurrency issues, and its type system more generally forces error paths to be explicit. The next step would be to use deductive methods to check that the error paths cannot be taken.

> make invalid states unrepresentable

That is a nice ideal, but not very practical. You need dependent types for that (for example, a sigma-type that bundles a value in a carrier type with a proof that it satisfies an invariant). They have their uses (e.g. at the interface of a library), but they are a pain to work with. You must build the proof terms together with the runtime values, thus entangling program and proofs into an unmaintainable mess. Oh, and forget about modular proof development: the dependent type encodes every constraint on the value, therefore most types must change across the program when you decide to prove an additional property of the program. You can't just keep the existing proof around and prove something else on top of it.

By all means, please keep some invalid states representable, and cleanly separate the program from its various proofs.


Proof terms disappear when you compile the program; they have no representation at runtime. They amount to compile-time-checked capabilities at most. So there's no real maintenance issue with entangling programs and proofs.


What does source code/proof maintenance have to do with what remains at runtime? Sure, you can always extract the computational part of a program by erasing the logical parts, but that does not imply that code intermixed with logical parts does not create maintenance issues.

Dependency management is one of those issues. Proofs require an acyclic dependency graph, in order to prevent circular arguments. By intermixing code and proofs, you force everything to fit in a single global dependency order. The point is not that it is impossible (it is possible), but that it is a maintenance nightmare. Sometimes you have to break an otherwise perfectly coherent module into smaller modules, just because you need to insert some other modules between them in the dependency order, that are only relevant to some minor proof. You have to constantly refactor your code as you reason on more aspects of your program. Have you already tested and certified your code and existing proofs? Too bad, your latest proof will require lots of handwaving and recertification to convince everyone that the big refactoring does not threaten the existing properties.

Runnable code stands on its own, and should not be polluted with logical parts for every aspect of the program one may want to reason about.

You can establish any provable property of a program using only basic logical tools: a single precondition and a single postcondition per function, a single invariant per loop, maybe dependent types, a few ghost values here and there, etc, and stuff all your reasoning in them in huge messy conjunctions mixing all the concerns together. That would be the equivalent of using nothing but a Turing machine to write a program. It is possible, it is theoretically appealing, but it does not scale.

Software engineering is a thing for a reason. So is proof engineering.


Maintenance as in, when you change the program you have to update the proof. This is easier if you can keep them modular.


If you don't mind, can I ask you a few questions about your tooling? I work in rust verification and would be interested in hearing about how industry uses verification techniques.


rust is just too ugly for me - you can't even properly use its functions to build capital-A abstractions because its compiler is too dumb to optimize them properly.


This doesn't match my experience at all. Please elaborate.


Could you expand on this? Sounds like an interesting opinion/experience that is worth sharing in more detail.


It doesn't. Most people complaining about "ugliness" of Rust are either too lazy to read documentation (which is great btw), or just don't like the syntax, because it doesn't resemble their favorite language[s]. Some have valid criticism, but they usually lay it out straight, sometimes in the form of a blog post, instead of vague "Rust is bad" statements.


I like Rust a great deal and really enjoy using it, but I also think it's kind of ugly. Like I'm talking surface level aesthetics, purely subjective. I can't quite put my finger on why. I think it's to do with the heavy use of special characters?


i mean..don't act like it's impossible for someone to have a well-informed opinion that Rust is bad. It is an opinion and for some people, Rust is bad! I've given it multiple shakes and can't see myself using it. Don't like the values, the syntax, or the design.

it's just a fact that i can't imagine a situation where i'd use rust over haskell. I'd sooner generate a C program via Haskell eDSL than use Rust if I were to do embedded work, for instance.

And aside from embedded, Haskell clears Rust kind of comically if I lay out a matrix of what I care about in a PL.


It's not impossible. But if one says "Rust is bad", they better elaborate, unless their goal was to post a meaningless comment.


well yeah I wouldn't make that tier of low effort comment I'd at least have a single sentence of color


Nobody is saying you can’t have an informed opinion, you just have yet to explain what your opinion actually means.


You don't have to exhaustively defend your opinion though. "I think X is bad/ugly/etc. $HIGH_LEVEL_COLOR" is plenty imo. You don't have to prove your opinions.


> You don't have to prove your opinions.

You're jumping a few steps ahead, and that's not what we're asking for.

> you can't even properly use its functions to build capital-A abstractions because it's compiler is too dumb to optimize them properly

This line in particular is what (multiple) people are indicating makes no sense. You don't have to exhaustively defend your opinion, but you could write a more insightful opinion. There's a difference at play here.


Capital-A Abstraction means lambda calculus. Compiler too dumb means you can't just program with functions and write said Abstractions (which map cleanly to proofs via Curry-Howard) in Rust, because it does not handle them well.

There we go :)



One thing I don’t understand there is why you say it’s ok for a function to return a value, but bad for it to return code (async or thread). What makes it bad? How does it make harder to prove properties of the program?


Because it can make an assumption that the data will do nothing once it is returned.

Sure, code is data (I've written Lisp before), but code isn't just Plain Old Data because it can do stuff.


Would a function pointer be equally bad in your book?


Yes.


What property can’t we prove if we have function pointers?

The article mentions trying to prove local properties of functions, and that’s all well and good, and similarly we could prove local properties of the function being pointed to.

I guess what I’m saying is I think I’m missing something about why you are advocating removing those. It’s not just so we can statically prove that all destructors are called is it?


Function pointers are not the problem. Returning function pointers is the problem.

And the problem is that we don't know if the caller will run that code or not, but if they do, that code is not in the subprogram anymore.

You can pass function pointers to callees, no problem.


Ok, the problem is we don’t know whether that code will be called.

So if we added some way to require that value to be used, something similar to `std::hint::must_use`, then we may be able to return function pointers?


No, because we also don't know what the program does if it is called because it is Turing-complete.


I’m confused. That also holds for functions that don’t return a function pointer, so this cannot be a good argument for banning returning function pointers.


By program, I mean the function pointer.

The caller is not part of the subprogram, but the returned function pointer is, and if the caller runs it, it leaks out, in the same way async and other things do.


https://en.wikipedia.org/wiki/Ada_(programming_language) has been focusing on correctness for 40 years.


Rust is just a popular ML like OCaml, only with a more sane package manager and a more imperative style than OCaml's recursive style.


It has a much more C-like syntax than MLs


Rust is not an ML. All languages in the ML family are garbage-collected. It also doesn't have ML syntax.

Whereas I can take an SML program from my ML textbook and run it almost completely unaltered in OCaml or F#, that is absolutely not the case for Rust - it literally won't work at all.


Another way to look at Rust is that it’s a programming language in which the goal is to garbage collect at compile time.

I don’t know if it’s the first one to do that. Maybe ATS fits this description too with its linear types, and ATS itself is considered an ML.

With this view, Rust does not seem that far off from the various ML dialects.


I've been saying this for a while, glad I'm not alone


Completely based take. Thank you for sharing!



