I switched to Gemini with my new phone and I literally couldn't tell a difference. It is actually crazy how small the cost of switching is for LLMs. It feels like AI is more like a commodity than a service.
> I switched to Gemini with my new phone and I literally couldn't tell a difference. It is actually crazy how small the cost of switching is for LLMs. It feels like AI is more like a commodity than a service.
It is. It's wild to me that all these VCs pouring money into AI companies don't know what a value-chain is.
Tokens are the bottom of the value-chain; it's where the lowest margins exist because the product at that level is a widely available commodity.
On top of that, the on-device models have gotten stronger and stronger as the base models + RL have improved. You can do on your laptop now what was state of the art two years ago.
Which dimensions do you see Google lagging on? They seem broadly comparable on the usual leaderboard (https://lmarena.ai/leaderboard) and anecdotally I can't tell the difference in quality.
I tend personally to stick with ChatGPT most of the time, but only because I prefer the "tone" of the thing somehow. If you forced me to move to Gemini tomorrow I wouldn't be particularly upset.
> Which dimensions do you see Google lagging on? They seem broadly comparable on the usual leaderboard (https://lmarena.ai/leaderboard) and anecdotally I can't tell the difference in quality.
Gemini does indeed hold the top spot, but I feel you framed your response quite well: they are all broadly comparable. The difference in the synthetic benchmark between the top spot and the 20th spot was something like 57 points on a scale of 0-1500.
" in many dimensions they lag behind GPT-5 class " - such as?
Outside of compute, "the moat" is also data to train on. That's an even wider moat. Now, Google has all the data. Data no one else has or ever will have. If anything, I'd expect them to outclass everyone by a fat margin. I think we're already seeing that with video, though.
Do you want to model the world accurately or not? That person is part of our authentic reality. The most sophisticated AI in the world will always include that person (or persons).
a bit weird to think about it since google has literally internet.zip in multiple versions over the years, all of email, all of usenet, all of the videos, all of the music, all of the user's search interest, ads, everything..
> a bit weird to think about it since google has literally internet.zip in multiple versions over the years, all of email, all of usenet, all of the videos, all of the music, all of the user's search interest, ads, everything..
Yeah, Google totally has a moat. Them saying that they have no moat doesn't magically make that moat go away.
They also own the entire vertical, which none of their competitors do: all their competitors have to buy compute from someone who makes a profit just on compute (Nvidia, for example), while Google owns everything from silicon to end user.
Given Apple’s moat is their devices, their particular spin on AI is very much edge-focused, which isn’t as spectacular as the current wave of cloud-based LLMs. Apple’s cloud stuff is laughably poor.
Depends on how you look at it, I suppose, but I believe Gemini surpasses OpenAI on many levels now: better photo and video models, and the leaderboards for text and embeddings also put Google ahead of OpenAI.
gemini-2.5-pro is ranked number 1 on LMArena (https://lmarena.ai/leaderboard), ahead of gpt-5-high. In Text-to-Video and Image-to-Video, Google also holds the top places; OpenAI is nowhere.
Yes, but they're also slower. As LLMs start to be used for more general-purpose things, they are becoming a productivity bottleneck. If I get a mostly right answer in a few seconds, that's much better than a perfect answer in 5 minutes.
Right now the delay for Google's AI coding assistant is high enough for humans to context-switch and do something else while waiting, which hurts particularly because one of the main features of AI code assistants is rapid iteration.
Meanwhile, he allowed desktop development to become a mess, he was the one who killed mobile, and now Microsoft is dependent on Google and Apple for mobile endpoints other than laptops.
The Office Suite is doing better than ever. I have no problem paying $129 a year for O365 with five users and each user can use it across Macs, Windows, iPhone, iPad and web. The iPad version with a Bluetooth keyboard and mouse is actually pretty good.
From a computer, you attach a USB drive and move files like you would with any other drive.
From an iPhone or iPad, you connect a USB drive using the USB-C port; the drive shows up in the Files app alongside your OneDrive storage location, and you move files over to the drive.
The same way you would with GDrive, iCloud Drive or Dropbox
... by becoming OSS friendly, and especially Linux friendly. Prior to that Microsoft and Azure were irrelevant because nobody wanted to run their backends on Windows.
Azure always had a very large and devoted following of corps who were all-in on Windows, even in the early days when everything on Azure ran on Windows. They had a very deep fanbase, which kept them from seeing things from the Silicon Valley perspective.
Yes, the same nerds who would balk at using Windows would have balked at using Azure, but when it was time to choose clouds, that foot that Microsoft had in the door with corporate paid off big time. Many people have the privilege of working detached from the corporate world, but that also leads to warped perceptions like that of Paul in 2007.
Yep. I do think Paul was presciently observing a powerful class of people who would end up making the decisions in big companies and would not end up being all-in MS customers, but I also remember being shocked at the demand for Azure when it first released (we wrote add-on software for cloud deployments for the big 3 clouds, and some smaller ones).
Solving the strawberry problem will probably require a model that just works with bytes of text. There have been a few attempts at building this [1] but it just does not work as well as models that consume pre-tokenized strings.
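To make the tokenization issue concrete, here's a minimal sketch using OpenAI's tiktoken library (the exact token boundaries depend on the vocabulary, so the split you see is illustrative, not guaranteed):

    # Show how a subword tokenizer chops up "strawberry".
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("strawberry")
    print([enc.decode([t]) for t in tokens])
    # Prints a few multi-character chunks rather than single letters,
    # so the model never "sees" each 'r' as a separate unit.

A byte- or character-level model would instead receive each letter individually, which is why it's a candidate fix for this class of question.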
Or just a way to compel the model to do more work without needing to ask (isn't that what o1 is all about?). If you do ask for the extra effort it works fine.
+ How many "r"s are found in the word strawberry? Enumerate each character.
- The word "strawberry" contains 3 "r"s. Here's the enumeration of each character in the word:
-
- [omitted characters for brevity]
-
- The "r"s are in positions 3, 8, and 9.
I tried that with another model not that long ago and it didn't help. It listed the right letters, then turned "strawberry" into "strawbbery", and then listed two r's.
Even if these models did have a concept of the letters that make up their tokens, the problem would still exist. We catch these mistakes and can work around them by altering the question until the model answers correctly, because we can easily see how wrong the output is. But even if we fix this particular problem, we don't know whether the models are correct in the more complex use cases.
In scenarios where people use these models for actual useful work, we don't alter our queries to make sure we get the correct answer. If they can't answer the question when asked normally, the models can't be trusted.
I think o1 is a pretty big step in this direction, but the really tricky part is going to be to get models to figure out what they’re bad at and what they’re good at. They already know how to break problems into smaller steps, but they need to know what problems need to be broken up, and what kind of steps to break into.
One of the things that makes that problem interesting is that during training, “what the model is good at” is a moving target.
Perhaps. LLMs are trained to be as human-like as possible, and you most definitely need to know how the individual human you are asking works if you want a reliable answer. It stands to reason that you would need to understand how an LLM works as well.
The good news is that if you don't have that understanding, at least you'll laugh it off with "Boy, that LLM technology just isn't ready for prime time, is it?" In contrast, when you don't understand how the human works, that leads to, at the very least, name-calling (e.g. "How can you be so stupid?!"), a grander fight, or even all-out war at the extreme end of the spectrum.
You're right that I need to know how humans work to ask them a question: if I were to ask my dad how many Rs are in "strawberry", he would say "I don't have a clue", because he doesn't speak English. But he wouldn't hallucinate an answer; he would admit that he doesn't know what I'm asking him about. I gather that here the LLM is convinced the answer is 2, which means LLMs are being trained to be alien, or at least that I need to be precise about what I'm asking (which isn't any better). Or maybe humans also hallucinate 2, depending on the human.
It seems your dad has more self-awareness than most.
A better example is right there on HN. 90% of the content found on this site is just silly back-and-forths around trying to figure out what each other is saying, because the parties never took the time to stop and figure out how each other works and tailor the communication to what is needed for the actors involved.
In fact, I suspect I'm doing that to you right now! But I didn't bother trying to understand how you work, so who knows?
It's interesting how all focus is now primarily on decoder-only next-token-prediction models. Encoders (BERT, the encoder of T5) are still useful for generating embeddings for tasks like retrieval or classification. While there is a lot of work on fine-tuning BERT and T5 for such tasks, it would be nice to see more research on better pre-training architectures for embedding use cases.
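As a minimal sketch of the embedding use case (the model name and mean pooling are illustrative choices, not the best-known recipe):

    # Sentence embeddings from a BERT encoder via mean pooling.
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    batch = tokenizer(["how to grow tomatoes", "tomato planting guide"],
                      padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state   # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)    # zero out padding
    emb = (hidden * mask).sum(1) / mask.sum(1)      # mean pooling
    emb = torch.nn.functional.normalize(emb, dim=-1)
    print(emb @ emb.T)                              # cosine similarities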
I believe RWKV is actually an architecture that can be used for encoding: given an LSTM/GRU, you can simply take the last state as an encoding of your sequence. The same should be possible with RWKV, right?
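In PyTorch terms, the "last state as encoding" trick looks roughly like this (a sketch with a plain GRU; that it carries over to RWKV's recurrent state is the assumption here):

    # Use the final hidden state of a GRU as a fixed-size sequence encoding.
    import torch
    import torch.nn as nn

    gru = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
    x = torch.randn(4, 10, 16)   # (batch, seq_len, features)
    _, h_n = gru(x)              # h_n: (num_layers, batch, hidden)
    encoding = h_n[-1]           # (batch, hidden), one vector per sequence
    print(encoding.shape)        # torch.Size([4, 32])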
Let's be fair and acknowledge the difference between a small local farm and the larger "industrial"-like ones. I grew up on a farm, and I know others who did too; people are kind to their animals.
Of course, the larger ones don't care. The same happens even when producing vegetables; there's no regard for nature in those cases either.
For JIT-ing you need to know the sizes upfront. There was an experimental branch for introducing jagged tensors, but as far as I know, it has been abandoned.
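To illustrate the static-shape constraint (a sketch with JAX's jit, assuming a tracing JIT of that kind is what's meant here; the function is hypothetical):

    # Each new input shape triggers a fresh trace/compile, which is why
    # ragged ("jagged") batches can't be compiled once and reused.
    import jax
    import jax.numpy as jnp

    @jax.jit
    def double(x):
        print("tracing for shape", x.shape)  # runs only during tracing
        return x * 2

    double(jnp.ones((3,)))  # traces and compiles for shape (3,)
    double(jnp.ones((3,)))  # cache hit, no retrace
    double(jnp.ones((5,)))  # new shape, so it traces again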
Back when GST was introduced, they asked all exporters of software services to pay the 18% GST and then claim a refund, because exports are exempt. This was before they introduced a process to apply for an "NOC" (which must be renewed periodically, I think). So I duly paid GST for the first couple of months.
Five years later, I'm yet to receive the refunds. First, they said that they were authorized to sanction refunds only for exporters of physical goods; then they said that this is handled by the State government, who said it was handled by the Central government. Finally the Central government accepted that they are in charge of the refund, but by then COVID happened and I could not follow up for 2 years. In May, when I tried to claim the refund again, they said that this refund pertains to 2016, that it's difficult to "reopen" the accounts for that year, and that it's handled by yet another department. So they have written to them to ask about the process. No replies after two emails. Story continues...
My family’s liquor shop was forced to close because our accountants could not keep up with the constant GST changes. They quit and we could not find another on short notice. The worst part was that the shop was on the hook for the unpaid taxes of upstream suppliers - money that is yet to be fully refunded after years.
India’s legal system and the enforcement of its laws are among the biggest drags on a country with immense potential.
Yes, you can get a waiver if you apply for something called the "Letter of Undertaking", which allows you to not pay anything on exports. This was not available at the start.