
Rust's PathBuf uses an OsString as backing, which is merely a sparkling Vec<u8> (on Unix, at least; on Windows the bytes are WTF-8 rather than arbitrary). Displaying a PathBuf or OsString requires an explicit request: either check that it is valid Unicode, with the failure handled by the caller, or ask for a "lossy string" where invalid sequences get replaced with �. That's on the presentation side, going path -> user visible string.
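To make the two display options concrete, here's a minimal sketch. The helper names strict and lossy are mine, purely for illustration; to_str, to_string_lossy and display are the std methods they wrap.

```rust
use std::path::{Path, PathBuf};

/// Strict conversion: `None` when the path is not valid Unicode.
/// (`strict` is a hypothetical helper name, not a std function.)
fn strict(path: &Path) -> Option<&str> {
    path.to_str()
}

/// Lossy conversion: invalid sequences are replaced with U+FFFD.
/// (`lossy` is likewise a hypothetical helper name.)
fn lossy(path: &Path) -> String {
    path.to_string_lossy().into_owned()
}

fn main() {
    let path = PathBuf::from("notes/résumé.txt");
    match strict(&path) {
        Some(s) => println!("valid Unicode: {s}"),
        None => println!("not valid Unicode, lossy form: {}", lossy(&path)),
    }
    // `display()` is the explicit opt-in for lossy formatting in println!/format!.
    println!("{}", path.display());
}
```

Path doesn't implement Display directly, so you can't print one without picking one of these.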

On the user provided string -> path case, if you start by accepting a String then the conversion is little more than a transmute, but there are paths your users won't be able to express. If you instead accept arbitrary bytes from the user as a Vec<u8>, you can create an OsString and PathBuf without allocating (on Unix, via OsStringExt::from_vec) while still being able to represent any legal (and illegal) path.
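A sketch of the bytes -> path direction, assuming a Unix target (OsStringExt lives under std::os::unix and doesn't exist on Windows). path_from_bytes is a name I made up for the example.

```rust
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt; // Unix-only extension trait
use std::path::PathBuf;

/// Wraps raw bytes as a path by moving the Vec's buffer; nothing is validated here.
/// (`path_from_bytes` is a hypothetical helper name.)
fn path_from_bytes(bytes: Vec<u8>) -> PathBuf {
    PathBuf::from(OsString::from_vec(bytes))
}

fn main() {
    // 0xFF never appears in valid UTF-8, yet the path is constructed without complaint.
    let path = path_from_bytes(vec![b'f', b'o', 0xFF, b'o']);
    assert!(path.to_str().is_none());
    // Lossy display, with the invalid byte replaced.
    println!("{}", path.display());
}
```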

The verification of the legality of a path occurs when accessing a file, which is already a fallible operation that you must handle. The path might contain invalid characters for the FS, but the file might also just be missing, or inaccessible, so error handling at that point is not really optional.
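Roughly what that looks like in practice: building the path never fails, opening it can. try_open is a hypothetical helper, and the exact error kinds you get back depend on the platform, so the sketch only prints them.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Tries to open `path` and reports the failure kind instead of panicking.
/// (`try_open` is a hypothetical helper name.)
fn try_open(path: &Path) -> Result<File, io::ErrorKind> {
    File::open(path).map_err(|e| e.kind())
}

fn main() {
    // Both paths are constructed without any error...
    let missing = Path::new("definitely/not/here.txt");
    let with_nul = Path::new("bad\0name.txt");

    // ...and both fail only once the file system is asked about them.
    println!("{:?}", try_open(missing).err());
    println!("{:?}", try_open(with_nul).err());
}
```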

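Note that Rust's PathBuf/Path don't do canonicalization by default, to account for symlinks and the like. If you want to canonicalize, you have fs::canonicalize (also reachable as Path::canonicalize), which does FS calls to get the right path. A purely lexical cleanup that doesn't access the FS, assuming that foo/.. will be equivalent to ., is something you write yourself or get from a crate as far as I know; the closest stable std function, path::absolute, avoids the FS but leaves .. components alone on Unix.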
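A sketch of the difference between the two approaches. fs::canonicalize is the real std function; lexical_normalize is a hand-written, hypothetical helper (not a std API) that collapses . and .. textually and is therefore wrong whenever a component is a symlink.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: collapses `.` and `..` textually, without touching the FS.
/// It assumes `foo/..` equals `.`, which does not hold if `foo` is a symlink.
fn lexical_normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::CurDir => {}
            Component::ParentDir => match out.components().next_back() {
                // Drop the previous normal component.
                Some(Component::Normal(_)) => {
                    out.pop();
                }
                // `..` at the root (or a Windows prefix) stays at the root.
                Some(Component::RootDir) | Some(Component::Prefix(_)) => {}
                // Nothing to pop: keep the leading `..`.
                _ => out.push(".."),
            },
            other => out.push(other.as_os_str()),
        }
    }
    if out.as_os_str().is_empty() {
        out.push(".");
    }
    out
}

fn main() {
    println!("{}", lexical_normalize(Path::new("a/b/../c")).display());
    // The FS-backed version resolves symlinks and fails if the path does not exist.
    match std::fs::canonicalize(".") {
        Ok(p) => println!("{}", p.display()),
        Err(e) => println!("canonicalize failed: {e}"),
    }
}
```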



I feel you missed the point I was getting at.

> The verification of the legality of a path occurs when accessing a file, which is already a fallible operation that you must handle. The path might contain invalid characters for the FS, but the file might also just be missing, or inaccessible, so error handling at that point is not really optional.

That... completely destroys your earlier point, no?

Let me go back to your original comment, where you wrote:

> your user might input valid UTF-8 text as a filename for saving. But not all filesystems allow all UTF-8 characters in filenames. Rust makes that explicit because it makes you convert the str/String to another special string type and forces you to handle errors in that conversion explicitly.

What I'm saying is, this conversion is not saving you from invalid path characters. You're still going to have to deal with invalid file names (like colons, which are FS-dependent) anyway. Or, to put it another way, I find it quite baffling that you think catching a stray U+2022 in the string earlier than a stray U+003A is somehow a win; if anything, it's just as likely to be misleading. The fact that UTF-8 encoding errors are caught during the conversion really, really doesn't prevent invalid characters from creeping in.

Stepping back: the problem you're trying to solve is literally impossible to solve at compile time. If you feel the early UTF-8 conversion is buying you something meaningful for filesystems, the only thing that proves is that you've been given a false sense of security... by Rust, ironically enough. There are lots of things Rust shines at, but this is not only not one of them, but it may in fact be the least impressive thing about it!


I feel you two are talking past each other a bit.

> Stepping back: the problem you're trying to solve is literally impossible to solve at compile time.

This is not being solved at compile time. OsString/OsStr (and PathBuf/Path) are basically there for ergonomics _and_ safety. They're not just used for filesystem paths either, but for other things too.

All the checks happen at runtime (but only the checks relevant for the target platform, since there are different implementations per platform) when converting between String and OsString.
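For reference, a small sketch of the two directions. As far as std goes, only OsString -> String is the checked one; String -> OsString always succeeds. The function names are hypothetical wrappers for illustration.

```rust
use std::ffi::OsString;

/// String -> OsString cannot fail: every Rust String is representable.
/// (`to_os` is a hypothetical wrapper name.)
fn to_os(s: String) -> OsString {
    OsString::from(s)
}

/// OsString -> String is the checked direction; on failure the original value is handed back.
/// (`to_string_checked` is a hypothetical wrapper name.)
fn to_string_checked(os: OsString) -> Result<String, OsString> {
    os.into_string()
}

fn main() {
    let os = to_os(String::from("héllo.txt"));
    match to_string_checked(os) {
        Ok(s) => println!("round-tripped: {s}"),
        Err(original) => println!("not valid Unicode: {original:?}"),
    }
}
```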

For path manipulation and filename encoding quirks (I don't know much about those) I think some of it is done in-memory, but a lot of it happens when you access the disk, that's true.

However you can still isolate those checks early and pass file references, or do a preflight check.

You can do this in other languages too, but Rust surfaces the potential issues nicely with Result in combination with other type system features, and makes the handling of those cases more ergonomic than most things I've worked with.



