> The panel is pretty weak on correlation, but it's quite clearly also not the only thing that supports that particular claim, nor does it contradict it.
It very clearly contradicts it: There is no correlation between the predicted truth value and the actual truth value. That is the essence of the claim. If you had read and understood the paper you would be able to specifically detail why that isn't so rather than say vaguely that "it is not the only thing that supports that particular claim".
To be fair, I'm not sure the people writing the papers understand what they're writing either. Much of the ML community seems to have fully embraced the "black box" nature of these models rather than seeing it as something to overcome. I routinely hear both readers and writers tout that you don't need much math. And yet mistakes and misunderstandings are commonplace, so in a sense they're right: they don't need much math. How much do you need to understand the difference between entropy and perplexity? Is that more or less than what's required to know the difference between probability and likelihood? I would hope we could at least get to a level where we understand the linear nature of PCA.
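To make the entropy/perplexity point concrete, here's a toy sketch (the textbook relationship, nothing from any particular paper): perplexity is just the exponentiated entropy, so conflating the two, or mixing log bases, quietly changes the number you report.

```python
import math

# A toy next-token distribution; any proper distribution works here.
probs = [0.5, 0.25, 0.125, 0.125]

entropy_bits = -sum(p * math.log2(p) for p in probs)  # Shannon entropy in bits
entropy_nats = -sum(p * math.log(p) for p in probs)   # same quantity in nats

# Perplexity is the exponentiated entropy; the base has to match the log base.
perplexity = 2 ** entropy_bits        # equivalently math.exp(entropy_nats)

print(entropy_bits, entropy_nats, perplexity)  # 1.75 bits, ~1.213 nats, ~3.36
```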
I'm not so sure that's the reason. I'm in the field, and trust me, I'm VERY frustrated[0]. But isn't the saying to not attribute to malice what can be attributed to stupidity? I think the problem is that they're blinded by the hype but don't use that passion to drive a deeper understanding. It's a belief that the black box can't be opened, so why bother?
I think it comes from the ad hoc nature of evaluation in young fields. It's like you need an elephant but obviously you can't afford one, so you put a dog in an elephant costume and call it an elephant, just to get moving in the right direction. It takes a long time to get that working, and progress can still be made by upgrading the dog costume. But at some point people forget that we need an elephant, so everyone is focused on the intricacies of the costume and some will try dressing up the "elephant" as another animal. Eventually the dog costume isn't "good enough" and leads us in the wrong direction. I think that's where we are now.
I mean, do we really think we can measure language with entropy? Fidelity and coherence with FID? We have no mathematical description of language, artistic value, aesthetics, and so on. The biggest improvement has been RLHF, where we just use Justice Potter Stewart's metric: "I know it when I see it."
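And to be concrete about the FID point: all it does is compare two Gaussians fit to Inception features. A rough sketch of the standard formula (my own illustration, not any specific repo's implementation):

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians (mean, covariance) fit to feature activations."""
    covmean = linalg.sqrtm(sigma1 @ sigma2)   # matrix square root of the covariance product
    if np.iscomplexobj(covmean):              # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2 * covmean))
```

That's the whole metric: two means and two covariance matrices. Nothing in it knows anything about aesthetics.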
I don't think it's malice. I think it's just easy to lose sight of the original goal. ML certainly isn't the only field to have done this, but it's also hard to bring rigor in, and I think the hype makes it harder. Frankly, I think we still aren't ready for a real elephant yet, but I'd just be happy if we openly acknowledged the difference between a dog in a costume proxying as an elephant and an actual fucking elephant.
[0] Seriously, how do we live in a world where I have to explain what covariance means to people publishing work on diffusion models and working for top companies or at top universities‽
>If you had read and understood the paper you would be able to specifically detail why that isn't so rather than say vaguely that "it is not the only thing that supports that particular claim".
Not every internet conversation needs to end in a big debate. You've been pretty rude and I'd just rather not bother.
You also seem to have a lot to say about how much people actually read papers, but your first response also took like 5 minutes. I'm sorry, but you can't say you've read even one of those in that time. Why would I engage with someone being intellectually dishonest?
> I guess I understand, seeing as you couldn't have read the paper in the 5 minutes it took for your response.
You've posted the papers multiple times over the last few months, so no, I did not read them in the last five minutes, though you could in fact find both of the very basic problems I cited in that amount of time.
Because it's pointless to reply to a comment days after it was made or after engagement with the post has died down. All of this is a convenient misdirection for not having read and understood the papers you keep posting because you like the headlines.
> you can't say you've read even one of those in that time.
I'm not sure if you're aware, but most of those papers are well known. All the arXiv papers are from 2022 or 2023. So I think your 5 minutes is pretty far off. I, for one, have spent hours, but the majority of that was prior to this comment.
You're claiming intellectual dishonesty too soon.
That said, @foobarqux, I think you could expand on your point to clarify it. @og_kalu, focus on the topic and claims (even if not obvious) rather than the time.
>I'm not sure if you're aware, but most of those papers are well known. All the arXiv papers are from 2022 or 2023. So I think your 5 minutes is pretty far off. I, for one, have spent hours, but the majority of that was prior to this comment.
>You're claiming intellectual dishonesty too soon.
Fair enough. With the "I'm not going to bother with the rest", it seemed like the reading had happened just now.
>focus on the topic and claims (even if not obvious) rather than the time
I should have just done that, yes. Zero correlation is obviously false given how much denser the plot is at the extremes, and depending on how many questions are in the test set, the correlation could even be pretty strong.
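Rough sketch with made-up numbers (not the paper's data) of why density at the extremes matters: if most questions get confident, mostly-correct predictions and only a minority sit in a noisy middle band, the overall correlation between the predicted P(True) and the actual truth value comes out well above zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up test set: 800 questions get confident predictions near 0 or 1 that
# are usually right; 200 get mid-range predictions carrying no signal at all.
truth_extreme = rng.integers(0, 2, 800)
pred_extreme = np.clip(truth_extreme + rng.normal(0, 0.15, 800), 0, 1)

truth_middle = rng.integers(0, 2, 200)
pred_middle = rng.uniform(0.3, 0.7, 200)

truth = np.concatenate([truth_extreme, truth_middle])
pred = np.concatenate([pred_extreme, pred_middle])

# Pearson (point-biserial) correlation between the binary truth and predicted P(True)
r = np.corrcoef(truth, pred)[0, 1]
print(f"{r:.2f}")  # well above zero despite the flat middle band
```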
> Zero correlation is obviously false given how much denser the plot is at the extremes, and depending on how many questions are in the test set, the correlation could even be pretty strong.
I took it as hyperbole. And honestly, I don't find that plot or much of the paper convincing. Though I have a general frustration in that it seems many researchers (especially in NLP) willfully do not look for data spoilage. I know they do deduplication, but I do question how many try to vet this by manual inspection. Sure, you can't inspect everything, but we have statistics for that. And any inspection I've done leaves me very unconvinced that there is no spoilage. There's quite a lot in most datasets I've seen, which can hugely change the interpretation of results. After all, we're elephant fitting.
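For what it's worth, the kind of check I mean isn't expensive. Something like the sketch below (the n-gram length, threshold, and names are all illustrative, not any paper's actual protocol): sample test items, flag long verbatim n-gram overlap with the training corpus, then actually read the hits.

```python
import random

def ngrams(text, n=8):
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def max_overlap(test_item, train_docs, n=8):
    """Fraction of the test item's n-grams found verbatim in the worst-offending training doc."""
    grams = ngrams(test_item, n)
    if not grams:
        return 0.0
    return max(len(grams & ngrams(doc, n)) for doc in train_docs) / len(grams)

def spot_check(test_set, train_docs, sample_size=50, threshold=0.2, seed=0):
    """Sample test items and return the ones that look contaminated, for manual review."""
    random.seed(seed)
    sample = random.sample(test_set, min(sample_size, len(test_set)))
    flagged = [(max_overlap(item, train_docs), item) for item in sample]
    return sorted([f for f in flagged if f[0] >= threshold], reverse=True)
```

Even a 50-item sample that a human actually reads tells you more than a dedup pass nobody inspects.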
I explicitly wrote "~0", and anyone who looks at that graph can say that there is no relationship at all in the data, except possibly at the extremes, where it doesn't matter that much (it "knows" sure things), and I'm not even sure of that. One of the reasons to plot data is so that this type of thing jumps out at you and you aren't misled by some statistic.