Nice and provocative read! Is it fair to restate the argument as follows? - New ...

Dorialexander · 2025-03-18T15:01:09 1742310069

Hi. Yes this is wholly correct.

On the second points:

* Well I'm very much involved in making open more models, pretrained the first model on free and open data without copyrigh issues, released the first version fo GRPO that can run on Google Colab (based on Will Brown). Yet, even then I have to be realistic: open source RL has a data issue. We don't have the action sequence data nor the recipes (emulators) that could make it possible to replicate even on a very small scale what big labs are currently working on.

* Agreed on this and I'm seeing this dynamic already in a few areas. Now it's still going to be uphill as some of the data can be bought and advanced pipelines can shortcut some of the need for it, as models can be trained directly on simulated environments.

evrydayhustling · 2025-03-18T17:57:52 1742320672

Thanks for the reply - and for the open AI work!

> We don't have the action sequence data nor the recipes (emulators) that could make it possible to replicate even on a very small scale what big labs are currently working on.

Sounds like an interesting opportunity for application-layer incumbents that want to enable OSS model advancement...

ankit219 · 2025-03-18T15:42:03 1742312523

answering the first question if i understand it correctly.

The missing piece is data obviously. With search and code, it's easier to get the data so you get such specialized products. What is likely to happen is: 1/ Many large companies work with some early design partners to develop solutions. They have the data + subject matter expertise, and the design partners bring in the skill. This way we see a new wave of RL agent startups grow. My guess is that this engagement would look different compared to a typical saas engagement. Some companies might do it inhouse, some wont because maintaining such systems is a task. 2/ These companies open source part of their dataset which can be consumed by oss devs to create better agents. This is more common in tech where a path to monopoly is to commoditize the immediately previous layer. Might play out elsewhere too, though I do not have a high degree of confidence here.

mannymanman · 2025-03-18T16:51:58 1742316718

Why will application layer value props be squeezed out? And if so, where does value accrue going forward in an RL first world?