The result of this licensing: in the future there's going to be a shitty free Linux variant and a SaaS premium variant. Of course, for complete distros this has always been the case, but now we'll get it for the core components.
Indeed. They knew there was risk associated with this, which is why they didn't just plop it into the LTS release. If it isn't working acceptably by the 26.04 release window, it'll just get reverted.
"Interim releases will introduce new capabilities from Canonical and upstream open source projects, they serve as a proving ground for these new capabilities." https://ubuntu.com/about/release-cycle
Seriously, topics like this get commented on as either:
1. This is an inevitable problem that is being handled in a sensible manner by competent engineers.
2. X company is stupid and their engineers are stupid; only someone as smart as I am would be capable of doing it right
It says a lot about the mental maturity of each participant. Not a single comment is "Maybe I don't know enough about this to voice an informed opinion", although that's probably a good indicator.
It seems like text-based forums using upvotes/likes or reactions encourage those who are less inquisitive and/or humble to take up a lot of the atmosphere.
It got me thinking that the internet today has more people on it but fewer forums to engage with technical topics in depth.
No, the same thing can and does happen on any API. As an obvious example, there are an annoying number of programs that depend on GNU's libc in particular, and which therefore break when someone tries to compile against e.g. musl.
This isn't true; libc is orders of magnitude better defined than binary names in shell scripting, but it's still yet another case of an approximate API with multiple competing implementations.
There are plenty of ecosystems where programs declare a specific library implementation they expect to call into (Rust, Python, npm, Ruby, Perl, ...), often even constrained by version. But also, if you depend on libcurl, you are only going to have to deal with multiple versions of the same implementation (which you can still constrain in e.g. pkg-config).
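For instance, a build can pin both the implementation and a minimum version up front (the version number here is just for illustration):

pkg-config --cflags --libs 'libcurl >= 7.68.0'
# fails with a clear error at build time if the installed libcurl is older,
# instead of breaking later on a missing symbol or changed behavior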
In shell scripting you have to deal with stuff like "in nc, EOF on stdin is undefined behavior and implementation specific".
I don't see the difference? There's a POSIX spec for coreutils (and some other shell-usable executables), there's a POSIX spec for libc (and some other C libs), and both are a somewhat painful subset of what most programs/scripts actually use. And yes, in both cases often the solution is to explicitly depend on a particular implementation or even version thereof; systemd explicitly only supports glibc, shell scripts targeting macOS may require that you install a newer version of bash than the OS ships, and yes, if you need nc to behave a particular way then you need to depend on the right one (I've actually personally hit that; IIRC I installed netcat-openbsd but it didn't include a feature I needed). In all cases, there may be a portable standard, but it doesn't cover everything, so if you're not inside what's covered by the standard then you should specify your actual dependencies. It still doesn't matter whether the API you're touching is mediated by the link phase of a compiler chain or *sh calling particular binaries.
Can someone post details about why md5sum from the Rust Coreutils is producing different results from GNU Coreutils? The post does not claim this is a bug. (Surprisingly.)
> Out of the box, script will pass bs= option to dd for it to be aware of how much to skip from the beginning of input data (and on later while loop iterations). This seem to have handled by dd either improperly or at least in a different way than it was in the past (with GNU core utils). However, once bs= is replaced with ibs=, all seems to go back to normal.
The bs/ibs/obs options don't "skip" anything by themselves; they set the block sizes used for reading and writing (and, in turn, the units that skip= and count= are measured in). Regardless, it's hard to fathom how something this simple got messed up, especially considering that the suite supposedly has good test coverage and has been getting close to a full green bar.
bs=BYTES
    read and write up to BYTES bytes at a time (default: 512); overrides ibs and obs
As described, the script should have worked as is, and the problem is in the handling of the dd options. (But I didn't verify the accuracy of the description.)
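For anyone who wants to poke at it, here's a minimal way to compare the two spellings (file name and sizes are made up for illustration). On GNU dd, bs=1024 sets both ibs and obs, and skip=1 skips one input block either way, so both commands should print the same sum:

# 1 KiB of 'A' followed by 1 KiB of 'B'
{ head -c 1024 /dev/zero | tr '\0' A; head -c 1024 /dev/zero | tr '\0' B; } > sample.bin
dd if=sample.bin bs=1024 skip=1 2>/dev/null | md5sum    # skip one 1024-byte block, hash the 'B' half
dd if=sample.bin ibs=1024 skip=1 2>/dev/null | md5sum   # the ibs= spelling from the workaround; same result on GNU dd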
Kind of off-topic, but those commands also add a newline character to the md5sums, giving unexpected results. I was trying it in a php interpreter and getting different values.
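For example (the string is just for illustration; PHP's md5() hashes exactly the bytes you give it, while echo appends a newline by default):

echo 'hello' | md5sum            # hashes "hello\n" because echo appends a newline
printf '%s' 'hello' | md5sum     # hashes just "hello"; this should match PHP's md5('hello')
echo -n 'hello' | md5sum         # -n suppresses the newline in most shells, same sum as printf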
> Wonder if `\00` is handled different between them.
`dd` is for copying all the bytes of a source (unless you explicitly set a limit with the `count` option), regardless of whether they're zero. It's fundamentally not for null-terminated strings but arbitrary binary I/O. In fact, "copying from" /dev/zero is a common use case. It seems frankly implausible that the `dd` implementation is just stopping at a null byte; that would break a lot of tests and demonstrate a complete, fundamental misunderstanding of what's supposed to be implemented.
> Not sure how to run the rust version but my md5sum seems to care how many null bytes there are.
Yes, the md5 algorithm also fundamentally operates on and produces binary data; `md5sum` just converts the bytes to a hex dump at the end. The result you get is expected (edit: modulo hiccuphippo's correct observation), because the correct md5 sum changes with every byte of input, even if that byte has a zero value.
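A quick way to convince yourself of both points from a shell:

dd if=/dev/zero bs=1 count=16 2>/dev/null | od -An -tx1   # dd happily copies 16 zero bytes
head -c 1 /dev/zero | md5sum                              # one zero byte
head -c 2 /dev/zero | md5sum                              # two zero bytes: a completely different sum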
With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviors of your system
will be depended on by somebody.
Hmm, this plus the performance regressions makes me wonder if it's too soon to move to the rust version of Coreutils. And makes me wonder if this is gonna cause more pushback regarding the rust in the kernel movement.
> this is gonna cause more pushback regarding the rust in the kernel movement.
Only among those who don't understand that, if this is a problem, it is a Canonical problem, not a Rust problem.
To give another example, Canonical includes ZFS in Ubuntu too. And, for a while, Canonical shipped a broken snapshot mechanism called zsys with Ubuntu too. Canonical ultimately ripped zsys out because it didn't work very well. zsys would choke on more than 4000 snapshots, etc. zsys was developed in Go, while other snapshot systems developed in Perl and Python did a little less and worked better.
Now, is zsys a Go problem? Of course not. It wasn't ready because Canonical sometimes ships broken stuff.
> Only among those who don't understand that, if this is a problem, it is a Canonical problem, not a Rust problem.
(This is hard to express in a careful way where I'm confident of not offending anyone. Please take me at my word that I'm not trying to take sides in this at all.)
The dominant narrative behind this pushback, as far as I can tell, is nothing to do with the Rust language itself (aside perhaps from a few fringe people who see the adoption of Rust as some kind of signal of non-programming-related politics, and who are counter-signaling). Rather, the opposition is to re-implementing "working" software (including in the sense that nobody seems to have noticed any memory-handling faults all this time) for the sake of seemingly nebulous benefits (like compiler-checked memory safety).
The Rust code will probably also be more maintainable by Rust developers than the C code currently is by C developers given the advantages of Rust's language design. (Unless it turns out that the C developers are just intrinsically better at programming and/or software engineering; I'm pretty skeptical of that.) But most long-time C users are probably not going to want to abandon their C expertise and learn Rust. And they seem to outnumber the new Rust developers by quite a bit, at least for now.
> Rather, the opposition is to re-implementing "working" software
I understand the argument, and it sounds good as far as most things go, but it misses an important fact: in OSS, you can and should find your own bliss. If you want to learn Rust, as I did, you can do it by reimplementing uutils' sort and ls, and fixing bugs in cp and mv, as I did. That was my bliss. OSS doesn't need to be useful to anyone. OSS can be a learning exercise, or it can be simply for love of the game.
The fact that Canonical wants to ship it, right now, simply makes them a little silly. It doesn't say a thing about me, or Rust, or Rust culture.
> Some would really prefer to at least be able to get some attention (and perhaps a paid job) this way.
Not that I agree, but people seem to be giving uutils lots of attention right now? A. HN front page vs. B. obscure JS framework? I'll take door "A"?
I had someone contact me for a job simply because my Rust personal project had lots of stars on Github. You really don't know what people will find interesting.
> The dominant narrative behind this pushback, as far as I can tell, is nothing to do with the Rust language itself (aside perhaps from a few fringe people who see the adoption of Rust as some kind of signal of non-programming-related politics, and who are counter-signaling).
Difficult to say with certainty, because it's easy to dress "political" resistance in respectable preference for stability. (Scare quotes because it's an amalgam in which politics is just a part.) Besides, TFA is Phoronix, whose commentariat is not known for subtlety on this topic.
Replacing coreutils is risky because of the decades of refinement/stagnation (depending on your viewpoint), which will inevitably produce snags when component utilities interact in ways unforeseen by tests -- as has happened here. But without risk there's no reward. Of course, what the reward is here is subject to debate. IMO the self-evident advantage of a rewrite is that it's prima facie evidence of interest in using the language, which is significant if there's a dearth of maintainers for the originals. (The very vocal traditionalists are usually not in a hurry to contribute.)
Is there really a dearth of maintainers for the originals? They already work fine, no? To me it sounds a bit like: "Addition has become stagnant, so we need to re-implement it in higher category theory. Sure, 99% of even research mathematicians don't benefit from that re-implementation. But no risk no reward! If vocal traditionalists refuse to contribute to reinventing the wheel, maybe they're working on something that hasn't been refined/stagnated decades ago, but I won't take their perspective (that addition already works fine) seriously, unless they start re-implementing addition as well."
So why create Wayland when we had X? Why create another Linux distro when there are so many already? Why create C if we already had assembly? Why create new model cars every year? Why architect new homes every year? What you are proposing is that we stop making changes or progress.
Because X11 had a lot of issues that got papered over with half-baked extensions and weird interfaces to the kernel.
The problem is that Wayland didn't feel like doing the work to make fundamental things like screen sharing, IMEs, copy-paste, and pointer warping actually ... you know ... work.
The problem Wayland now has is that they're finally reaching something usable, but they took so long that the assumptions they made nearly 20 years ago are becoming as big a problem as the issues that were plaguing X11 when Wayland started. However, the sunk cost fallacy means everybody is going to keep pounding on Wayland rather than throwing it out and talking to graphics cards directly.
And client-rendered decorations were always just a mind-bogglingly stupid decision, but that's a GNOME problem rather than a Wayland issue.
Rust is trying to systemically improve safety and reliability of programs, so the degree to which it succeeds is Rust's problem.
OTOH we also have people interpreting it as if Rust was supposed to miraculously prevent all bugs, and they take any bug in any Rust program as a proof by contradiction that Rust doesn't work.
> Rust is trying to systemically improve safety and reliability of programs, so the degree to which it succeeds is Rust's problem.
GNU coreutils first shipped in what, the 1980s? It's so old that it would be very hard to find the first commit. uutils, by contrast, is still beta software that never asked to be representative of "Rust" at all. Moreover, GNU coreutils are themselves still sometimes not compatible with their UNIX forebears. Even by that first, more modest standard, it is ridiculous to hold this particular software to it.
You would not be able to find the first commit. The repositories for Fileutils, Shellutils, and Textutils do not exist, at least anywhere that I can find. They were merged as Coreutils in 2003 in a CVS repository. A few years later, it was migrated to git.
If anyone has original Fileutils, Shellutils, or Textutils archives (released before the ones currently on GNU's ftp server), I would be interested in looking at them. I looked into this recently for a commit [1].
In this case I agree. Small, short-running programs that don't need to change much are the easy case for C, and they had plenty of time to iron out bugs and handle edge cases. Any difficulties that C may have caused are a sunk cost. Rust's advantages on top of that get reduced to mostly nice-to-haves rather than fixing burning issues.
I don't mean to tell Rust uutils authors not to write a project they wanted, but I don't see why Canonical was so eager to switch, given that there are non-zero switching costs for others.
> OTOH we also have people interpreting it as if Rust was supposed to miraculously prevent all bugs, and they take any bug in any Rust program as a proof by contradiction that Rust doesn't work.
Yeah, that's such a tired take. If anything, this shows how good Rust's guarantees are. We had a bunch of non-experts rewrite a sizable number of tools that had 40 years of bugfixes applied. And Canonical just pulled the rewritten versions in all at once, and there are mostly just a few performance regressions on edge cases.
I find this such a great confirmation of the Rust language design. I've seen a few rewrites in my career, and it rarely goes this smoothly.
It might be a bit of bad publicity for those who want to rewrite as much as possible in Rust. While Rust is not to blame, it shows that just rewriting something in Rust doesn't magically make it better (as some Rust hype might suggest). Maybe Ubuntu was a bit too eager in adopting the Rust Coreutils, caring more about that hype than about stability.
> OTOH we also have people interpreting it as if Rust was supposed to miraculously prevent all bugs
That is the narrative that Rust fanboys promote. AFAIK Rust could be useful for a particular kind of bug (memory safety). Rust programs can also have coding errors or other bugs.
It's not about rust specifically, it's about replacing working software with rewrites and going from a code base written in a single language to one written in multiple.
Hopefully they got reported as bugs. I spent some time in June making sure a basic Arch Linux system can compile itself using uutils, and things mostly worked; the only build failures, it could be argued, were shell scripts depending on undefined behavior, like calling install(1) with multiple -m arguments.
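The problematic invocations look roughly like this (the package and paths are made up for illustration); GNU install appears to quietly take the last -m it sees, which is what those scripts were leaning on:

# a build script overriding an earlier default mode later on the same command line
install -m 755 -m 644 README /usr/share/doc/somepkg/README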
As an aside on the GNU *sum tools, I found they're quite slow. A few months ago I wrote a simple replacement in Go for UX reasons and somewhat to my surprise, the Go implementation of most hash algorithms seems about 2 to 4 times as fast when using a simple naïve "obvious" single-threaded implementation. It can be sped up even more by using more than one core. Go has assembly implementations for most hash functions. I didn't really look at the coreutils implementation but I'm guessing it's "just" in C.
At any rate, small teething issues aside, long-term things should be better and faster.
GNU Coreutils uses the OpenSSL implementation of hashes by default, but some distributions have disabled it using './configure --with-openssl=no'. Debian used to do this, but now links to OpenSSL.
This is on Void. It doesn't have --with-openssl configure args in the package, although the binary also doesn't link to lib{ssl,crypto}. It probably gets auto-detected to "no"(?)
For context, I am a committer to GNU Coreutils. We have used OpenSSL by default for a few years now [1]. Previously it was disabled by default because the OpenSSL license is not compatible with the GPL license [2]. This was resolved when they switched to the Apache License, Version 2.0 in OpenSSL 3.0.0.
If the Void people wanted to enable OpenSSL, they would probably just need to add openssl (or whatever they package it as) to the package dependencies.
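Something like this should confirm it once the dependency is there (the exact package name on Void is a guess on my part):

# in the coreutils build tree, with OpenSSL's headers and libs installed:
./configure --with-openssl=yes
make
ldd src/sha256sum | grep libcrypto   # should list libcrypto.so once detection succeeds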
Cheers; I guess I should have checked the coreutils implementation; I kind of just assumed it has one implementation instead of being a compile option :embarrassed-emoji:
I also have an Arch machine where it does link to libcrypto, and it seems roughly identical (or close enough that I don't care, this is a live server doing tons of $stuff so has big error bars):
md5sum 1.58s user 0.31s system 98% cpu 1.908 total
~/verify -a md5 1.59s user 0.13s system 99% cpu 1.719 total
sha256sum 0.71s user 0.12s system 99% cpu 0.840 total
~/verify -a sha256 0.74s user 0.12s system 99% cpu 0.862 total
Still wish it could do multi-core, though; one reason I looked into this is because I wanted to check 400G of files and had 15 cores doing nothing (I know GNU parallel exists, but I find it hard to use and am never quite sure I'm using it correctly, so it's faster to write my own little Go program – especially for verifying files).
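In case it helps anyone else with idle cores, plain xargs can fan the hashing out without GNU parallel (the path and core count here are illustrative):

find /data -type f -print0 | xargs -0 -n 32 -P 15 md5sum > sums.txt
md5sum -c sums.txt   # verification is single-threaded again, but the same xargs trick works for it too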
Interesting, there must be something wrong here. Here is a benchmark using the same commit and default options other than adjusting '--with-openssl=[yes|no]':
$ dd if=/dev/random of=input bs=1000 count=$(($(echo 10G | numfmt --from=iec) / 1000))
10737418+0 records in
10737418+0 records out
10737418000 bytes (11 GB, 10 GiB) copied, 86.3693 s, 124 MB/s
$ time ./src/sha256sum-libcrypto input
b3e702bb55a109bc73d7ce03c6b4d260c8f2b7f404c8979480c68bc704b64255 input
real 0m16.022s
$ time ./src/sha256sum-nolibcrypto input
b3e702bb55a109bc73d7ce03c6b4d260c8f2b7f404c8979480c68bc704b64255 input
real 0m39.339s
Perhaps there is something wrong with the detection on your system? As in, you do not have this at the end of './configure':
Sorry, I meant "roughly identical [to my Go program]", not "roughly identical [to the version without OpenSSL]". The ~/verify binary is my little Go program that is ~4 times faster on my Void system, but is of roughly equal performance on the Arch system, to check that coreutils is not slower than Go (when using OpenSSL). Sorry, I probably didn't make that too clear.
Ah, I should have guessed from the program name. :)
I thought I remembered Go having pretty optimized assembly for crypto routines. But I have not used the language much. If you have your source code uploaded somewhere, I'd be interested to have a look to see what I am missing out on.
That is not a quote from that post. I am very much not pedantic about only using quotation marks for quotes as long as it reasonably accurately gets the gist right, but in this case it very much doesn't.
You are leaving out the qualified language of "generally", which completely changes what was said. And worse, the post explicitly acknowledges that it doesn't solve all bugs in the next sentence.
And even if you can dig deep and find someone using unqualified language somewhere, I'm willing to bet a lot of money that this is an oversight and when pressed they will immediately admit so (on account of this being an internet forum and not a scientific paper, and people are careless sometimes). "I like coffee" rarely means "I always like coffee, all the time, without exception".
That’s a far more nuanced comment than you’re portraying it as, especially as it applies to exactly this scenario: the new dd is working as designed, it’s not segfaulting or corrupting data, but its design isn’t identical to the GNU version, and that kind of logic error is the kind of thing Rust can’t prevent short of AGI.
Writing bugs in Rust is trivial and happens all the time. "do_stuff(sysv[1], sysv[2])" is a bug if you reversed the sysv arguments by accident. You can easily create a more complex version of that with a few "if" conditionals and interaction with other flags.
There are many such silly things people can – and do – trivially get wrong all the time. Most bugs I've written are of the "I'm a bloody idiot"-type. The only way to have a fool-proof compiler is to know intent.
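To stay with the shell theme, the same holds outside Rust: this is perfectly valid as far as any checker is concerned, and completely wrong if the intent was the opposite direction (the names are made up):

# intent: copy the live config into the backup directory
# bug: the operands are swapped, so the stale backup overwrites the live file
cp "$backup_dir/app.conf" "$live_dir/app.conf"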
What people may say is something like "if it compiles, then it runs", which is generally true, but doesn't mean it does the right thing (i.e. is free of bugs).
Consider that “works without crashing” and “works the way I had in mind” are not the same thing. Rust makes it easier to avoid logic bugs, but if you think bs= should do X when the expected behavior (which a spec really should have pinned down) is Y, that’s not something a language can prevent.