I am fascinated by this and similar research (RotorQuant, etc). It seems by next year we will be able to run this year's largest models on last year's hardware. :)
Maybe we won't need as many data centers and as much power as we thought. Maybe we can run more powerful models locally.
Just look at DeepSeek V4: this preview model uses only 8 GB for a 1M token KV cache (the context). It's insanely efficient already. It's just that most models coming out are barely catching up with technical breakthroughs.
Deepseek are pioneers.
Unfortunately V4 is not trained for most real-world usage; it is mainly for general world knowledge.
I thought the principal consequence of these KV cache optimisations was letting you run more simultaneous inferences on the same model with the same memory. It doesn't let you load a larger model. In some sense that puts local LLM usage at a further disadvantage to inference done in a hyperscaler's data center.
The size of the KV cache (the stored context) is proportional to the number of layers in the model and the number of "hidden dimensions". For a 400B model it could be 30-60GB for just an 8K context window (depends on the model, etc; just a ballpark).

So shrinking that by 6x (from fp16) would be a big win for larger models.
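The ballpark above can be sketched with a back-of-the-envelope calculation. This is a simplified sizing formula (it ignores grouped-query attention, which shrinks the cache further on many real models), and the 400B-class figures (120 layers, 16384 hidden dim) are illustrative assumptions, not any specific model's config:

```python
def kv_cache_bytes(num_layers, hidden_dim, seq_len, bytes_per_elem=2):
    """Naive KV cache size: K and V each store seq_len x hidden_dim per layer."""
    return 2 * num_layers * hidden_dim * seq_len * bytes_per_elem

# Hypothetical dense 400B-class model at an 8K context, fp16 (2 bytes/element)
fp16 = kv_cache_bytes(num_layers=120, hidden_dim=16384, seq_len=8192)
print(f"fp16 cache:   {fp16 / 2**30:.1f} GiB")   # ~60 GiB, top of the ballpark
print(f"6x smaller:   {fp16 / 6 / 2**30:.1f} GiB")
```

which lands right at the upper end of the 30-60GB range quoted, and shows why a 6x compression brings a previously multi-GPU cache down to a single consumer card's worth of memory.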
True, while TurboQuant can also be applied to model weights, it won't save size over q4 compression, but will have better accuracy.
That's my hope as well as I tend to use low end GPUs (e.g. NVIDIA GeForce RTX 2060 @ 6GB). Been looking for an image generation model that can fit that vid card, for use with Ollama + GUI in Linux. No luck yet, since money's tight and jobs are tighter :(
An Arc B580 will just about fit Flux.2 Klein (at FP8). However, you can also easily get much larger GPUs on RunPod or Vast at $0.25/hr.
I would strongly recommend exploring that option: renting an RTX 5090 for an evening of image generation for a dollar or two is way more fun than trying to jam big models on little cards. Just take some time to create a reasonable, scripted deployment workflow for when you create a fresh instance.
We're only a few years into this new tech getting serious research man-hours thrown at it. Already some incredible optimizations have been found in a short amount of time. Not only has the efficiency of inference been increasing dramatically, the quality of tiny models has been significantly improving too.
Hmm... Sure, if you do not need a database then do not use a database.
Don't use a sports-car to haul furniture or a garbage truck as an ambulance.
For the use case and scale mentioned in the article it's obvious not to use a database.
Am I missing something? I guess many people are using the tools they are familiar with and rarely question whether they are really applicable. Is that the message?
I think a more interesting question is whether you will need a single source of truth. If you don't you can scale on many small data sets without a database.
I will say this before I shut up with my rant: if you start with a design that scales, you will have an easier time scaling when it is time, without re-engineering your stack. Whether you think you will need to scale depends on your projected growth and the nature of your problem (do you need a single source of truth, etc.).
Open source was always open to "many eyes", which in theory exposes zero-day vulnerabilities. But the "many eyes" belong to both good and bad actors.
As far as I am concerned... Way to go Cal.com, and a good reminder to never use your services.
Good. Now leave TikTok and Facebook as well. People who care will find out what you are up to, and people who don't won't see you on social media anyway.
I left Twitter, Facebook, et al about a decade ago. And I can assure you: You will never miss any important development.
The notion that we need to be plugged into Twitter, X, whatever, to stay up to date is simply false.
Personally I don’t use it for anything I can find pretty much everywhere else as well, but there are still a few people whose posts I consider interesting that only post on X.
How many more wars over gas or oil do we need before we finally just take the energy that (for the most part) is available locally and renewably?!
The petrol era is coming to an end. Our current administration might desperately want to remain a petrol state (for reasons that escape me), but it will only delay the inevitable. The EU is not much better either. The writing has been on the wall, and even since the Russian invasion of Ukraine not much has happened.
What is going on? Are we all insane, or is it just intense lobbying of yesterday's petrol industry?
EVs were on track to being mainstream 20 years ago. See the other story about the 90s GM EV1 and associated documentary. All the technology was there in 1999, but every EV in development simultaneously shut down once they started becoming usable.
We're not going to pretend that energy density in batteries was anywhere near ready for prime time in 1999. Maybe niche around town cars owned by those willing to sacrifice to have something cool/unique.
The EV1 was a $70k car (2026 dollars) with economy car size and fittings, with a 75mi range and 3 hour charge time. There is no conspiracy, the tech wasn't there yet.
The share of renewables in the EU, at ~55% of net energy generation, is almost twice as high as China's or America's; only Latin America fares better. Germany essentially front-ran this industry 20 years ago. Although, as usual, it turns out to be better to be second than first, a bit of credit here please.
Seems current admin wants less choice. No imports of EVs from countries that do it better and no support for local EV makers.
As for tax credits, sure, push policy requires spending. But then this admin has spent approximately $100B trying to reduce spending and instead increased it, so this seems penny-wise, pound-foolish.
Citation needed that "the people" wanted to end tax credits on EVs. EVs were steadily increasing in popularity and capturing a growing market. In a very real sense consumers had "choice" before, in that the market supports made EVs accessible.
That's not entirely unreasonable. As long as there is a way to enable this in perpetuity for my device(s) and it works for all Android devices it's a compromise I could live with.
Again, can we please stop calling it side-loading? I'm not sliding anything in "from the side" on the sly; I am simply installing an app of my choice on my damn phone.
Supply-chain risk means "the potential for adversaries to sabotage, subvert, or disrupt the integrity and delivery of defense systems, including software, hardware, and services, to degrade national security".
So now Anthropic is an adversary, because it does not want "fully autonomous weapons" or automated mass surveillance? Sure thing, DoD. Go use Grok or whatever, I'm sure that will go great.
In that sense, at least for me, it was a third place where we could roam to get inspired and connect. We lost that. I was in Akihabara last weekend, and it's the same in a way. While there are still a few, most tech stores are now phone/laptop stores that don't sell parts, making the hunt for tech really boring.
There are a few stores left that sell parts in Akihabara, but only a few and they're not that easy to find. Akihabara now is mostly a place to go to maid cafes.
It’s pretty much for the same reasons. All those stores and types of stores that used to be in SF, Cambridge, and Tokyo are all found in Shenzhen now. That’s where the critical mass is.
In fact we should not even call it "sideloading", as if we are sneaking anything in "from the side". It is simply installing something I like on a device that I own.
My device can warn me about security consequences and let me be the one who decides what to do (with my device).
With 23,623 (as of today) signatures, I doubt anybody really cares, and we'd all rather be sheeple doing the tech companies' bidding as long as we can flop on our couches and consume.
Clearly Google wants to make money off their monopoly (created in part from initial openness) and they are disguising it as some security/safety enhancement bullsh*t. Shameful!
My main question: I chose Android over Apple because of the extra freedoms it affords me. When that goes away, what reason do I have to continue with Android?