Hacker News | jessfyi's comments

Getting a compound incorrect is not an "unimportant" error (for example, the difference between sodium nitrate & sodium nitrite is small but critical), and seeing "small but blatant" errors actively propagated is the entire reason why the record should be corrected. The only upside of little artifacts like "vegetative electron microscopy" [0] is that they're a leading indicator that the entire paper and team deserve more scrutiny--as well as anyone who cites it.

[0] https://www.sciencealert.com/a-strange-phrase-keeps-turning-...


I believe they meant that it's "unimportant" because (to use your example) sodium nitrate and sodium nitrite actually exist, whereas there's no element with the chemical symbol "Gr".


The error in the OP is a typo that could never seriously confuse anyone, as the element Gr does not exist.

An interesting perspective is Terry Tao's on local vs. global errors (https://terrytao.wordpress.com/advice-on-writing-papers/on-l...). A typo like this, even if propagated, is a local error which at worst makes it very annoying to Ctrl-F papers or do literature review. Local errors deserve to be corrected, but in practice their importance to science as a field is small.


Gonna be completely honest, if you want to draw in people with anything more than a casual interest in literature, your examples are an immediate turn-off. I suggest you spend time on the subreddits of major genres + booktok + see what's trending on apps like Fable if you want insight into what books people outside of the Silicon Valley/techbro bubble consume and enjoy.


Will do!


lmarena/lmsys is beyond useless when you compare prior model rankings against formal benchmarks or against testing for accuracy + correctness on batches of real-world data. It's a bit like using a poll of Fox News to discern the opinions of every American; the audience voting is consistently found wanting. Not even getting into how easily a bad actor with means + motivation (in this "hypothetical" instance wanting to show that a certain model is capable of running the entire US government) can manipulate votes, which has been brought up in the past (yes, I'm aware of the lmsys publication on how they defend against attacks using Cloudflare + reCAPTCHA; there are ways around that).


So you're saying that either A: users interacting with models can't objectively rate what responses seem better to humans, B: xAI as a newcomer has somehow managed to game the leaderboard better than all those other companies, or C: all those other companies are not doing it. By those standards every test ever devised for anything is beyond useless. But simply not having the model creator running the evaluation already goes a long way.


No, I'm saying that some companies are doing it (OpenAI at the very least), the company in question has the motive and capability to game the system (kudos to them for pushing the boundaries there), AND the userbase's rankings have been historically, statistically misaligned with data from evals (flawed though those are), especially when it comes to testing for accuracy + precision on real-world data (outside of their known or presumed dataset). Take a look at how well Qwen or Deepseek actually performed vs the counterparts that were out at the same time vs their corresponding rankings.

In the nicest way possible I'm saying this form of preference testing is ultimately useless, primarily due to a base of dilettantes with more free time than knowledge parading around as subject matter experts and secondarily due to presumed malfeasance. The latter is more apparent to more of the masses (that don't blindly believe any leaderboard they see) now that access to the model itself is more widespread and people are seeing the performance doesn't match the "revolution" promised [0]. If you're still confused why selecting a model based on a glorified Hot or Not application is flawed, perhaps ask yourself why other evals exist in the first place (hint: some tests are harder than others.)

[0](One such instance of someone competent testing it and realizing it's not even close to the "best" model out) https://www.youtube.com/watch?v=WVpaBTqm-Zo


At work, we developed our own suite of benchmarks. Every company with a serious investment in AI-powered platforms needs to do the same. Comparing our results to the Arena turns up some pleasant surprises, like DBRX punching way above its weight for some reason.


You say no, but then go on to explain why you believe in a combination of both option A and option B. That's fine I guess, I just don't consider it particularly likely given the currently available information.


The number of NAION cases observed post-Ozempic usage in Denmark is 150 (up from 60-75) out of 424,152 patients, for a rare ailment that already specifically affects patients with diabetes. Sorry to say, those taking it as a "shortcut" in your words are even less susceptible.

As someone who's been fortunate enough to be fit and able to work out their entire life, not sure how there are people like you who shun and shame those trying to gain a semblance of control over their weight in a world where it does have a real impact whether they get serious medical attention or not. Your likely skewed thoughts on vanity be damned, bigger people are treated worse across the board and GLP-1 is a genuine salve.
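To put the counts quoted above on a common scale, here is a rough sketch that converts them to cases per 100,000 patients. It assumes the earlier 60-75 figure is measured against the same 424,152-patient denominator, which the comment does not actually state; the function and variable names are just illustrative.

```python
def cases_per_100k(cases: int, population: int) -> float:
    """Convert a raw case count into a rate per 100,000 people."""
    return cases / population * 100_000

# Figures quoted in the comment above.
PATIENTS = 424_152

observed = cases_per_100k(150, PATIENTS)
# ASSUMPTION: the baseline range uses the same denominator as the observed count.
baseline_low = cases_per_100k(60, PATIENTS)
baseline_high = cases_per_100k(75, PATIENTS)

print(f"observed: {observed:.1f} per 100k")
print(f"baseline: {baseline_low:.1f} to {baseline_high:.1f} per 100k")
```

Under that assumption the observed rate is roughly double the baseline, while still being a few dozen cases per 100,000 patients in absolute terms.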


Any part of "saving" Intel should include a mechanism barring them from putting any more money that should be spent on R&D towards stock buybacks ($152B since 1990 as of September). That said, quoting the former Intel CEO (who still owns 3,245,986 shares) as "[one of the] expert[s] who says breaking up Intel won't do any good" seems like journalistic malpractice--and makes me all the more certain it should be subsumed by a company with executives hungry to actually win again.


Intel stopped buybacks a few years ago, and are now stopping dividends.


Most companies should be banned from doing this.


Buybacks are just dividends with better tax implications that somehow make people angry.

Companies historically are expected to pay dividends, at least when their business is doing well. Business at Intel was doing well for most of 1990-2017. There was some time after the Pentium 4 stopped scaling before the Pentium M offered a recovery, and the Itanium mess; but overall pretty good until 2017.


Buybacks aren't exactly like dividends because they directly affect the pricing of the stock by interfering with the supply and demand. That said I think people are mostly angry about the conflict of interest where CEOs that have current and future shares are making decisions by what will maximize their personal returns rather than what's best for the company and shareholders.

When a company with growth prospects does well it should invest those $$$'s into things like R&D and expansion. Companies that pay out their profits as dividends are generally not expected to grow as much, and their stock prices (P/E) tend to reflect that.

That said the taxation aspect is maybe a problem and should be addressed if it's not working as intended.


> Buybacks are just dividends with better tax implications that somehow make people angry.

Yeah, buybacks are the new bogeyman. Let me decide when to take the tax hit on an investment instead of being forced to.


Ok so I prefer dividends as I think that they encourage better behavior on the part of both investors and companies.

I think that buybacks definitely create a massive conflict of interest for C-levels remunerated based on share price or EPS, and I dislike that I must sell to realize any gains. But perhaps this is a niche position.


What is wrong with giving the profit to shareholders? That is why the shareholders bought the shares.


It creates a massive conflict of interest for the company executives and favors short-term profits vs long-term sustainability of the company.


>that should be spent on R&D towards stock buybacks ($152B since 1990 as of September.)

Starting from 1990 seems like a weird starting point, because it includes much of Intel's heyday when their profits were arguably well deserved. Is the implication that every business shouldn't have profits and should plow every cent back to R&D?


I used that period (vs saying they spent $110B between 2005-2021) to establish the fact that it's a known, expected pattern of behavior regardless of Intel's performance, roadmap, or market conditions to lead the reader to recognize that if bailed out they'll likely continue in the near future instead of utilizing that money for its intended purpose.

Instead of assuming my comment is a generalized view on how businesses should operate as a whole (and not about the subject of the piece), perhaps take a moment to consider how the magnitude of buybacks--in the face of stiff competitors that have now leapfrogged them--is directly correlated to the mismanagement and dysfunction within Intel that leaves them unable to rise to the challenge the country demands.


Most companies do stock buybacks as a way to pay out bonuses to employees and execs with a lower tax rate. Since RSUs are taxed lower (I think), companies pay employees with those. But those grants are given by creating new shares. To not dilute the value of these shares, the company needs to keep buying back shares.
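The dilution argument in the comment above can be sketched with a toy calculation. All numbers here are hypothetical and chosen only for illustration; the sketch covers the share-count mechanics, not the tax treatment of RSUs.

```python
def ownership(held: float, shares_outstanding: float) -> float:
    """Fraction of the company owned by a holder of `held` shares."""
    return held / shares_outstanding

# HYPOTHETICAL figures, not taken from any real company.
shares_outstanding = 1_000.0
held = 10.0
new_rsu_shares = 50.0  # newly issued shares granted to employees

before = ownership(held, shares_outstanding)

# Issuing new shares for grants shrinks every existing holder's stake.
after_grants = ownership(held, shares_outstanding + new_rsu_shares)

# Buying back the same number of shares restores the original share count.
after_buyback = ownership(
    held, shares_outstanding + new_rsu_shares - new_rsu_shares
)

print(f"before: {before:.4%}, after grants: {after_grants:.4%}, "
      f"after buyback: {after_buyback:.4%}")
```

In this simplified picture the buyback only offsets the newly issued shares; it says nothing about the price paid for the repurchase or where that cash might otherwise have gone.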


RSUs are treated as cash income at vest and taxed at the same rates.

Stock buybacks benefit general shareholders (i.e. beyond employees) since they push up stock value without causing a taxable event. The alternative is dividends which are immediately taxed.
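A minimal sketch of the timing difference described above, using an idealized model where the company's value falls by exactly the cash it pays out and the buyback happens at the current price. The share counts, price, and 20% tax rate are hypothetical placeholders, not real rates or Intel figures.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    share_value: float     # value of the holder's remaining shares
    cash_after_tax: float  # cash received now, net of tax
    tax_paid_now: float    # tax triggered immediately

def dividend(shares_out, price, payout, held, tax_rate):
    """Company pays `payout` in cash, split evenly across all shares."""
    per_share = payout / shares_out
    gross = per_share * held
    tax = gross * tax_rate
    new_price = price - per_share  # idealized: price drops by the dividend
    return Outcome(held * new_price, gross - tax, tax)

def buyback(shares_out, price, payout, held):
    """Company spends `payout` repurchasing shares; this holder does not sell."""
    repurchased = payout / price
    remaining = shares_out - repurchased
    new_price = (shares_out * price - payout) / remaining
    return Outcome(held * new_price, 0.0, 0.0)

# HYPOTHETICAL inputs: 1,000 shares at $100, $10,000 returned, holder owns 100.
div = dividend(1_000, 100.0, 10_000.0, 100, tax_rate=0.20)
bb = buyback(1_000, 100.0, 10_000.0, 100)

print(div)
print(bb)
```

In this toy model the non-selling holder ends up with the same pre-tax wealth either way; the difference is that the dividend triggers tax immediately, while the buyback leaves the gain unrealized until the holder chooses to sell.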


The conclusions reached in the paper and the headline differ significantly. Not sure why you took a line from the abstract when even further down it notes that some elements of "truthfulness" are encoded and that "truth" as a concept is multifaceted. It further notes that LLMs can encode the correct answer and still consistently output the incorrect one; the text mentions strategies to potentially reconcile the two, but as of yet there is no real concrete solution.


If something that sells 100 million+ devices isn't "super popular", I don't know what is. And even without counting the millions of TVs that have it built in (Hisense, TCL, Samsung), the brand is pretty ubiquitous.


The brand has been "Google Cast" for a long time, though. None of the TVs with this stuff built in have mentioned "Chromecast" in a very long time.


I was being generous and said "not even counting," but no, despite the internal name change, most still maintain the "Chromecast Built-In" designation in their branding and on their sites, which takes a mere second to Google and see.


The phones were released in October, while Gemini Nano's announcement happened in December. I, like other developers and consumers reaching for the smaller version, might've bought the device for the ability to run the ML features advertised in their keynote/based on the research they released the week prior to that (in the case of the former.)

During Gemini's initial release the language surrounding Nano was that it would be only on the Pro initially, and I was happy to wait. The complete inability to run it, when the new Samsung phones can (including the model with 8GB as reported above), feels not only like a bait-and-switch/false advertising, but like a constraint based solely on driving sales. It does demand a clear explanation.

I care less about another potential Pixel class action, and more that I have to get another phone to test my apps on and deploy them to a smaller audience.


The two phones might have similar or even identical memory chips, but that doesn't mean the carving of the address space is the same. A more meaningful comparison would be looking at how much of those 8GB are pinned by the various components (kernel, graphic buffers, sound, camera, telephony, radios, etc.), how much is left to user and system apps, how the system is tuned for active/background processes. 8GB is just a single data point, too simplistic to draw any even remotely plausible conclusions.


The point is that, excluding the RAM difference, the hardware on the devices is the same. They launched at the exact same time.


We’re talking about LLMs. In that context it makes no sense to “exclude the ram difference.”


The same model, utilizing the same amount of RAM the Pixel 8 has, runs on Samsung's latest phone.


> excluding the ram difference

Well, you can't really exclude the RAM difference, not for an LLM.


Not surprised this was flagged despite a civil discussion and how relevant it is to the late-stage limbo social networks currently occupy. To be frank, it runs counter to the narrative of the dedicated cohort of "free speech" absolutists here, who don't want an example of why there absolutely need to be limits to what can be posted and disseminated.

