It seems that self-distillation is the way to go for LLMs.
Self-distillation was shown to be very efficient and effective back in January this year by an MIT and ETH team in their Self-Distillation Fine-Tuning (SDFT) system for LLMs [1],[2].
That earlier work also appears as this paper's closest competitor, listed as On-Policy Self-Distillation in the comparison table.
I hope they keep the original work's real name, Self-Distillation Fine-Tuning (SDFT). Imagine a later paper citing this very paper as cross-entropy self-distillation instead of its own given name, Simple Self-Distillation (SSD). Although I'll admit it's a lousy name that collides with the common SSD nomenclature for solid-state drives, as others have rightly pointed out.
I think they should have given proper credit to this earlier seminal work on SDFT, but apparently they just list it as one of the systems in their benchmark without explaining much of the connection and lineage, which is a big deal in research publication.
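To make the shared lineage concrete, the core idea behind these methods can be sketched as a loss mixing the usual target term with a KL pull toward the model's own earlier predictions. This is a conceptual toy of my own, not the actual SDFT or SSD recipe; the function names and the alpha weighting are illustrative assumptions.

```python
import math

def kl(p, q):
    # KL(p || q) for two discrete distributions given as probability lists.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def self_distill_loss(student_probs, teacher_probs, target_idx, alpha=0.5):
    # Cross-entropy to the hard label, plus a KL term pulling the student
    # toward the frozen "teacher" -- here, the same model's earlier output.
    ce = -math.log(student_probs[target_idx])
    return (1 - alpha) * ce + alpha * kl(teacher_probs, student_probs)
```

When the student already matches both the label and its earlier self, the loss is zero; alpha trades off fitting new data against staying close to old behavior, which is the continual-learning angle.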
>The first of these is its locked-in ecosystem, which keeps its users buying Apple.
Personally, this is why I wouldn't touch Apple's products with a ten-foot pole.
Anyway, kudos to them: their vision (read: Steve's vision) and tenacity have put them in the upper echelon of consumer tech companies. Making products that people find desirable is very hard. That's probably why, according to the article, the second most important thing about Apple is:
>Another major factor is its marketing, which has made it the only luxury technology brand.
Fun fact: the original developer of TK Solver (TK!Solver) is Milos Konopasek, a textile engineer from Czechoslovakia.
TK Solver is a software cousin of the famous VisiCalc, developed by the same company, Software Arts.
VisiCalc has been discontinued, but TK Solver is still sold today by Universal Technical Systems (UTS) [1].
Milos also developed the Question Answering System (QAS), which ran on a PDP-10. It operated on equations relating input yarn, cloth area, fiber strengths, etc. For a desired cloth strength you could solve for fiber strength, or, given fiber strength, you could solve for cloth strength. You can still perform the same operations in TK Solver.
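The interesting part is that the same equation can be solved in either direction. A minimal sketch of that bidirectional solving, with a hypothetical relation and made-up numbers (not the actual QAS or TK Solver equations): the forward direction is a plain function call, and the backward direction inverts it numerically by bisection.

```python
def cloth_strength(fiber_strength, yarn_count):
    # Hypothetical relation: cloth strength grows with both inputs.
    return 0.8 * fiber_strength * yarn_count

def solve_fiber(target_cloth, yarn_count, lo=0.0, hi=1e6):
    # Invert the monotone relation numerically by bisection.
    for _ in range(100):
        mid = (lo + hi) / 2
        if cloth_strength(mid, yarn_count) < target_cloth:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Forward: given fiber strength, compute cloth strength.
print(cloth_strength(50, 2))             # 80.0
# Backward: given a desired cloth strength, recover fiber strength.
print(round(solve_fiber(80.0, 2), 3))    # 50.0
```

A real constraint solver like TK Solver uses more general iteration over systems of equations, but the user-facing idea is the same: any variable can be the unknown.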
[1] Comprehensive Mathematical Software Tool for Engineers:
There is human-to-human (H2H), human-to-machine (H2M, and vice versa), and machine-to-machine (M2M) data communication.
If you perform a simple extrapolation, M2M data only surpasses the others around 2029.
Coincidentally, in the original timeline of the Transformer movie, 2029 is the year that the Resistance, led by John Connor, destroyed Skynet and ended the war against the machines.
> Coincidentally, in the original timeline of the Transformer movie, 2029 is the year that the Resistance, led by John Connor, destroyed Skynet and ended the war against the machines.
I'd love to see that Terminator-Transformers crossover movie. Optimus Prime vs. T-800, anyone?
I've got a strong feeling that AI models and agents require a different operating system (OS) paradigm, one that is data-centric rather than file-system-centric, for more efficient, effective, and trustworthy operation. This new OS should work natively and seamlessly with data across different processors, for example CPUs, GPUs, TPUs, NPUs, accelerators, etc.
For a working example, check TabulaROSA (Tabular Operating System Architecture), proposed by an MIT team. Instead of normal OS system calls, it uses data-based operations with D4M, which works mathematically via associative arrays on structured or unstructured data [1],[2].
With the advent of CPU acceleration for fully homomorphic encryption, as demonstrated by Intel, AI models and agents could even analyze data without ever decrypting it [3],[4].
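FHE libraries are heavyweight, but the underlying principle, computing on ciphertexts without decrypting them, can be shown with textbook RSA, which happens to be multiplicatively homomorphic. This is a toy sketch with tiny, insecure parameters; it is not FHE and has nothing to do with Intel's actual stack.

```python
# Toy demonstration of homomorphic computation (NOT FHE, NOT secure):
# in textbook RSA, multiplying two ciphertexts yields a ciphertext of
# the product of the plaintexts.
p, q = 61, 53                  # tiny primes for illustration only
n, phi = p * q, (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (Python 3.8+)

def enc(m): return pow(m, e, n)
def dec(c): return pow(c, d, n)

c = enc(6) * enc(7) % n        # multiply while still encrypted
print(dec(c))                  # recovers 6 * 7 = 42
```

Real FHE schemes additionally support addition and bootstrapping, which is what makes arbitrary computation on encrypted data possible, at a cost the new hardware acceleration is meant to bring down.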
[1] TabulaROSA: Tabular Operating System Architecture for Massively Parallel Heterogeneous Compute Engines
>As an example, North America is a huge area to make a book for!
I think an AI/LLM with RAG is the ideal solution for this mushroom hunting/foraging problem.
With multi-modal app capability from on-device fine-tuning on your mobile phone and an NTN (non-terrestrial network) satellite connection to the cloud, you're loaded for bear, err, mushrooms.
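As a toy illustration of the retrieval step such a RAG foraging assistant might use, here is a sketch with hypothetical field-guide snippets and bag-of-words cosine similarity standing in for real embeddings:

```python
import math
from collections import Counter

snippets = [  # hypothetical field-guide entries
    "Golden chanterelle: egg-yellow, false gills, fruity apricot smell",
    "Jack-o-lantern: orange, true gills, grows in clusters on wood, toxic",
    "Death cap: greenish cap, white gills, sac-like volva at base, deadly",
]

def vec(text):
    # Bag-of-words term counts as a crude stand-in for an embedding.
    return Counter(text.lower().replace(",", "").split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    return dot / (math.sqrt(sum(v * v for v in a.values())) *
                  math.sqrt(sum(v * v for v in b.values())))

def retrieve(query):
    # Return the best-matching snippet to ground the LLM's answer.
    q = vec(query)
    return max(snippets, key=lambda s: cosine(q, vec(s)))

print(retrieve("yellow mushroom with false gills and apricot smell"))
```

A real app would use image embeddings plus text, but the grounding idea is the same: the LLM answers from retrieved guide entries rather than from memory alone, which matters when a misidentification can be deadly.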
Check out this video of Golden Chanterelle mushroom hunting in Santa Barbara, Southern California, where one lb can cost around USD 20-30. The guys managed to gather around 80 lb on the trip, worth about USD 1,600 minimum if sold [1].
[1] Mushroom Hunting Catch and Cook (80lbs Found!):
[1] Self-Distillation Enables Continual Learning:
https://arxiv.org/abs/2601.19897
[2] Self-Distillation Enables Continual Learning:
https://self-distillation.github.io/SDFT.html