Hacker News | new | past | comments | ask | show | jobs | submit | mtklein's comments

I completely agree that technology in the last couple years has genuinely been fulfilling the promise established in my childhood sci-fi.

The other day, alone in a city I'd never been to before, I snapped a photo of a bistro's daily specials hand-written on a blackboard in Chinese, copied the text right out of the photo, translated it into English, learned how to pronounce the menu item I wanted, and ordered some dinner.

Two years ago this story would have been: notice the special board, realize I don't quite understand all the characters well enough to choose or order, and turn wistfully to the menu to hopefully find something familiar instead. Or skip the bistro and grab a pre-packaged sandwich at a convenience store.


> I snapped a photo of a bistro's daily specials hand-written on a blackboard in Chinese, copied the text right out of the photo, translated it into English, learned how to pronounce the menu item I wanted, and ordered some dinner.

> Two years ago

This functionality was available in 2014, on either iPhone or Android. I ordered specials in Taipei way before Covid. Here's the blog post celebrating it:

https://blog.google/products/translate/one-billion-installs/

This is, after all, a post about AI, hype, and skepticism. In my childhood sci-fi, the idea of people working multiple jobs and still not being able to afford rent was written as shocking, or seen as dystopian. All this incredible technology is a double-edged sword, but it doesn't solve the problems of the day, only the problems of business efficiency, which exacerbates the problems of the day.


The part of that Google Translate announcement that covered translating handwritten Chinese must have gone missing.


It was available as early as 2012, probably earlier as IIRC Microsoft was copying:

https://www.pcworld.com/article/470008/bing_translator_app_g...


>The other day, alone in a city I'd never been to before, I snapped a photo of a bistro's daily specials hand-written on a blackboard in Chinese, copied the text right out of the photo, translated it into English, learned how to pronounce the menu item I wanted, and ordered some dinner.

To be fair, dedicated apps like Pleco have supported things like this for 6+ years, but the spread of modern language models has made it more accessible.


My preferred way to compare floats as being interchangeably equivalent in unit tests is

    bool equiv(float x, float y) {
        return (x <= y && y <= x)
            || (x != x && y != y);
    }
This handles things like ±0 and NaNs (while NaNs can't be IEEE-754-equal per se, they're almost always interchangeable), and convinces -Wfloat-equal you kinda know what you're doing. Also everything visually lines up real neat and tidy, which I find makes it easy to remember.

Outside unit tests... I haven't really encountered many places where float equality is actually what I want to test. It's usually some < or <= condition instead.


I have always thought that punning through a union was legal in C but UB in C++, and that punning through incompatible pointer casting was UB in both.

I am basing this entirely on memory and the wikipedia article on type punning. I welcome extremely pedantic feedback.


> punning through a union was legal in C

In C89, it was implementation-defined. In C99, it was made expressly legal, but it was erroneously included in the list of undefined behavior annex. From C11 on, the annex was fixed.

> but UB in C++

C++11 adopted "unrestricted unions", which introduced the notion of an active member; reading a member other than the active one is UB unless you make it active first. Except making a member active relies on constructors and destructors, which primitive types don't have, so the standard isn't particularly clear on what happens here. The current consensus is that it's UB.

C++20 added std::bit_cast which is a much safer interface to type punning than unions.

> punning through incompatible pointer casting was UB in both

There is a general rule that accessing an object through an 'incompatible' lvalue is illegal in both languages. In general, changing the const or volatile qualifier on the object is legal, as is reading via a different signed or unsigned variant, and char pointers can read anything.


> In C99, it was made expressly legal, but it was erroneously included in the list of undefined behavior annex.

In C99, union type punning was put under Annex J.1, which is unspecified behavior, not undefined behavior. Unspecified behavior is basically implementation-defined behavior, except that the implementor is not required to document the behavior.


We can use UB to refer to both. :)


> We can use UB to refer to both. :)

You can, but in the context of the standard, you'd be wrong to do so. Undefined behavior and unspecified behavior have specific, different, meanings in context of the C and C++ standards.

Conflate them at your own peril.


> > We can use UB to refer to both. :)

> You can, but in the context of the standard, you'd be wrong to do so. Undefined behavior and unspecified behavior have specific, different, meanings in context of the C and C++ standards.

> Conflate them at your own peril.

I think that ryao was not conflating them, but literally just pointing out, as a joke, that "UB" can stand for "undefined behavior" or "unspecified behavior." Taking advantage of this invites dangerous ambiguity, which is why ryao's suggestion ended with ":)," but I think that calling it wrong is an overstatement.


Maybe, but we were talking about "undefined behavior," not "UB," so the point is moot.


The GCC developers disagree as of last December:

> Type punning via unions is undefined behavior in both c and c++.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13


I think they're wrong about C.


Saw this recently and thought it was good: https://www.youtube.com/watch?v=NRV_bgN92DI


There has been plenty of misinformation spread on that. One of the GCC developers told me explicitly that type punning through a union was UB in C, but defined by GCC when I asked (after I had a bug report closed due to UB). I could find the bug report if I look for it, but I would rather not do the search.


From a draft of the C23 standard, this is what it has to say about union type punning:

> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.

In past standards, it said "trap representation" rather than "non-value representation," but in none of them did it say that union type punning was undefined behavior. If you have a PDF of any standard or draft standard, just doing a search for "type punning" should direct you to this footnote quickly.

So I'm going to say that if the GCC developer explicitly said that union type punning was undefined behavior in C, then they were wrong, because that's not what the C standard says.


Section J.1 _Unspecified_ behavior says

> (11) The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).

So it's a little more constrained in the ramifications, but the outcomes may still be surprising. It's a bit unfortunate that "UB" aliases to both "Undefined behavior" and "Unspecified behavior" given they have subtly different definitions.

From section 4 we have:

> A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 5.1.2.4.


Here is what was said:

> Type punning via unions is undefined behavior in both c and c++.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13

Feel free to start a discussion on the GCC mailing list.


I actually might, although not now. Thanks for the link. I'm surprised he directly contradicted the C standard, rather than it just being a misunderstanding.


According to another comment, the C standard contradicts the C standard on this:

https://news.ycombinator.com/item?id=43794268

Taking snippets of the C standard out of context of the whole seems to result in misunderstandings on this.


It doesn't. That commenter is saying that in C99, it was unspecified behavior. Since C11 onward, it's been removed from the unspecified behavior annex and type punning is allowed, though it may generate a trap/non-value representation. It was never undefined behavior, which is different.

Edit: no, it's still in the unspecified behavior annex, that's my mistake. It's still not undefined, though.


Most of the C code I write is C99 code, so it is undefined behavior either way for me (if I care about compilers other than GCC and Clang).

That said, I am going to defer to the GCC developers on this since I do not have time to make sense of all versions of the C standard.


That's fair. In the end, what matters is how C is implemented in practice on the platforms your code targets, not what the C standard says.


Union type punning is allowed and supported by GCC: https://godbolt.org/z/vd7h6vf5q


I said that GCC defines type punning via unions. It is an extension that GCC made to the C standard.

That said, using “the code compiles in godbolt” as proof that it is not relying on what the standard specifies to be UB is fallacious.


I am a member of the standards committee and a GCC maintainer. The C standard supports union punning. (You are right though that relying on godbolt examples can be misleading.)



What is your point? I already said that GCC defines it even though the C standard does not. As per the GCC developers:

> Type punning via unions is undefined behavior in both c and c++.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13


> One of the GCC developers told me explicitly that type punning through a union was UB in C, but defined by GCC when I asked

I just was citing the source of this for reference.


I see. Carry on then. :)


This was my instinct too, until I got this little tickle in the back of my head that maybe I remembered that Clang was already acting like this, so maybe it won't be so bad. Notice 32-bit wzr vs 64-bit xzr:

    $ cat union.c && clang -O1 -c union.c -o union.o && objdump -d union.o
    union foo {
        float  f;
        double d;
    };

    void create_f(union foo *u) {
        *u = (union foo){0};
    }

    void create_d(union foo *u) {
        *u = (union foo){.d=0};
    }

    union.o: file format mach-o arm64

    Disassembly of section __TEXT,__text:

    0000000000000000 <ltmp0>:
           0: b900001f      str wzr, [x0]
           4: d65f03c0      ret

    0000000000000008 <_create_d>:
           8: f900001f      str xzr, [x0]
           c: d65f03c0      ret


Ah, I can confirm what I see elsewhere in the thread, this is no longer true in Clang. That first clang was Apple Clang 17---who knows what version that actually is---and here is Clang 20:

    $ /opt/homebrew/opt/llvm/bin/clang-20 -O1 -c union.c -o union.o && objdump -d union.o

    union.o: file format mach-o arm64

    Disassembly of section __TEXT,__text:

    0000000000000000 <ltmp0>:
           0: f900001f      str xzr, [x0]
           4: d65f03c0      ret

    0000000000000008 <_create_d>:
           8: f900001f      str xzr, [x0]
           c: d65f03c0      ret


Looks like that change is clang ≤19 to clang 20: https://godbolt.org/z/7zrocxGaq


"The peer malus" - Frederic Beal


It hadn't yet been at the time this program was in practice. I wager enthusiasm for nudging was around its peak at the time we're talking about, somewhere in the early 2010s?


The idiom I am familiar with here is to make the constructor private and provide a public static factory method that allocates on the heap.


It can be fun to explore the interactions of unorm and float bit representations even when you have float instructions. E.g. if you bit-or a unorm8 into 0x47000000 (32768.0f) then subtract 32768.0f, you'll get a number very close to right, just a float multiply of (256/255.0f) away. Reordering the math so that the subtraction and multiply can become a single FMA is a fun homework exercise.

        union {
            int   bits;
            float f;
        } pun = {x}, scale = {0x47000000};
        pun.bits |= scale.bits;
        pun.f    -= scale.f;
        pun.f    *= (256/255.0f);
This basically amounts to a software implementation of int->float conversion instructions; sadly I have never found a spot where it's actually worth doing when you have those int->float instructions available already, even with the FMA as a single instruction.

It's also worth considering whether your application can handle approximate conversion. If you have a [0,255] unorm in x, x + (x>>7) or equivalently x + (x>0x7f) will round it to a [0,256] fixed-point value. Crucially, this rounding does handle 0x00 and 0xff inputs correctly. Once in fixed-point with a nice power-of-two divisor, you can play all sorts of tricks, either again making use of the bit representation of floats, using ARM fixed-point instructions, etc. If you've ever looked longingly at the pmulhrsw family of instructions, this is a ripe area to explore.


Looks a little more like std::bit_cast? (Of course, there's a large overlap between the two.)


TIL about bit_cast. You may be correct, checking out its docs.

The StackOverflow answer I just read comparing the two suggests bit_cast is a library function, transmute is an intrinsic. But transmute is const, and reinterpret_cast isn’t. So on some level it’s a mix of the two. Most important thing is that it’s closer to this kind of cast than a normal one.


I'd suggest working incrementally from areas of your existing strength. Tweak whatever code base you are most familiar with, starting with a tiny change, and see how the assembly changes. I use objdump -d and git diff --no-index for this all the time.

