We are excited about the Tinker API: it exposes the primitives forward_backward, optim_step, sample, and checkpoints of an LLM as a REST API. These can be used to implement pretty much arbitrary training recipes while hiding all the infra challenges of running the model, and they also abstract the underlying accelerator behind a very small surface area. We hope it can emerge as a standard. If you have any feedback on our open-source implementation, we would love to hear it!
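For a feel of what a recipe built on those four primitives looks like, here is a minimal sketch of a training loop. This is not the actual Tinker client: the class, its method names, and the checkpoint policy are hypothetical stand-ins that only mirror the primitives named above.

```python
# Hypothetical sketch of a training loop over the four primitives
# (forward_backward, optim_step, sample, checkpoints). StubTrainingClient
# is an in-memory stand-in for a REST client; everything about it beyond
# the primitive names is an assumption for illustration.

class StubTrainingClient:
    def __init__(self):
        self.steps = 0
        self.checkpoints = []

    def forward_backward(self, batch):
        # Would send the batch and accumulate gradients server-side;
        # here we just return a fake decreasing loss.
        return 1.0 / (self.steps + 1)

    def optim_step(self):
        # Would apply the accumulated gradients with the optimizer.
        self.steps += 1

    def sample(self, prompt):
        # Would sample a completion from the current weights.
        return prompt + " ..."

    def save_checkpoint(self):
        # Would persist the current weights; we record the step count.
        self.checkpoints.append(self.steps)

def train(client, batches, checkpoint_every=2):
    losses = []
    for i, batch in enumerate(batches, start=1):
        losses.append(client.forward_backward(batch))
        client.optim_step()
        if i % checkpoint_every == 0:
            client.save_checkpoint()
    return losses

client = StubTrainingClient()
losses = train(client, batches=["b1", "b2", "b3", "b4"])
completion = client.sample("Hello")
print(client.steps, client.checkpoints)  # 4 [2, 4]
```

The point of the small surface area is exactly this: the loop above is the whole recipe, and everything about placement, accelerators, and failure handling stays behind the four calls.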
The project is indeed driven by the authors listed on the paper, together with the knowledge and experience accumulated in the AMPLab (the predecessor of the RISELab, see https://amplab.cs.berkeley.edu/). If you look at the GitHub history, we've been working on it for more than a year, and we had various prototypes before that, so it doesn't come out of "thin air" ;)
The lab's sponsors are also helpful: some of them have been experimenting with the system internally and giving us feedback.
Thank you for responding. I did indeed put time and effort into my reply because the work caught my eye. I haven't fully parsed the paper, but I covered a number of pages, which colored the nature of my inquiry. I in no way intended to take anything away from the authors of the paper; I wanted to get at what you yourself declared:
"was accumulated in the AMPLab (the predecessor of the RISELab, see https://amplab.cs.berkeley.edu/)" as the backstory behind the paper, since I clearly surmised there was one, given its historical nature. I also wanted to understand how long this had been worked on, due to the language used in the paper and how its concepts and language fit familiarly with other things I've seen. And this right here:
"The lab's sponsors are also helpful, some of them have been experimenting with the system internally and giving us feedback." Yes, I understand that the nature of this is more for corporate use cases than for academic ones and for furthering work therein. I was in search of names, but I can already surmise a number of them and will derive a handful more. I think what is being done here is interesting, but there were choice words in the paper that limit it. At this juncture in the state of AI development, I will reserve any other commentary beyond stating that there are an incredible number of fundamental limits in approaching things this way, which fall on deaf ears due to the closed-off nature/sponsorship of such developments. I wish you guys the best and am sure there will be traction as it relates to RL.
For Ray, the main use case at the moment is parallel/distributed machine learning algorithms. People have been using it for parallelizing MC(MC)-style applications, doing hyperparameter search, and (pre-)processing data, and we are using it for reinforcement learning (we have a library for that, see http://ray.readthedocs.io/en/latest/rllib.html).
More broadly, it is useful for many parallel/distributed Python applications where low latency (~1 ms) and high task throughput are requirements.
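The task-parallel pattern described above can be illustrated with the standard library alone. This toy hyperparameter search is only an analogy: Ray's `@ray.remote` / `.remote()` / `ray.get()` API expresses the same pattern, but across processes and machines with roughly millisecond per-task overhead, while this sketch uses threads purely to stay self-contained.

```python
# Toy stdlib analogy of the pattern Ray generalizes: fan a grid of
# hyperparameters out to parallel workers and collect the results.
from concurrent.futures import ThreadPoolExecutor

def evaluate(lr):
    # Hypothetical objective standing in for "train a model with this
    # learning rate and return a validation score".
    return -(lr - 0.1) ** 2

grid = [0.001, 0.01, 0.1, 1.0]
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(evaluate, grid))

# Pick the hyperparameter with the best score.
best = max(zip(scores, grid))[1]
print(best)  # 0.1 maximizes the toy objective
```

With Ray, `evaluate` would be decorated with `@ray.remote` and each call submitted as `evaluate.remote(lr)`, which is what makes the same loop scale beyond one machine.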
Python is almost synonymous with shitty performance in my mind; am I just wrong about typical python performance, or are you doing something special to make this of a non-issue (e.g. the way numpy essentially shoves all the work into C), or is the flexibility Ray affords worth the performance penalty for your users?
(1) Python single-threaded performance: here, most of the libraries we use are implemented in C++ (like numpy and TensorFlow) or use Cython to speed up the code. Ray is orthogonal to that.
(2) Python parallel performance: here, Python is mostly problematic because of its lack of support for threading (the GIL is one problem); we handle this by using multiple processes and shared memory throughout. Efficient serialization makes this feasible.
The core of Ray is implemented in C++, so performance is not an issue there; all of the serialization is also implemented in C++.
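The shared-memory idea behind this answer can be shown with the standard library: a block of memory is written once and then mapped by name into another handle without copying the payload. This is only a minimal sketch of the mechanism; Ray's object store builds zero-copy reads of serialized objects on top of the same idea.

```python
# Minimal illustration of sharing memory between handles (and, in the
# real multi-process case, between worker processes) without copying.
from multiprocessing import shared_memory

payload = b"large array bytes ..."
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[: len(payload)] = payload  # write the data once

# A second handle (think: another worker process) attaches by name and
# reads the same physical memory; the payload is never copied.
reader = shared_memory.SharedMemory(name=shm.name)
view = bytes(reader.buf[: len(payload)])
assert view == payload

# Clean up: close both handles, then free the underlying block.
reader.close()
shm.close()
shm.unlink()
```

Combined with an efficient serialization format, workers can deserialize large objects directly out of such a region instead of receiving a copy over a pipe, which is what makes the multi-process approach feasible despite the GIL.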
Author here. For CapnProto see the answer to the other comment above.
Concerning dill: we used it for serializing functions and classes, and then switched to cloudpickle because it supports some Python functionality better and the community around it is very active and responsive. cloudpickle/dill are great in that they support a very wide variety of Python objects, especially code (functions, lambdas, classes); they are less ideal for large data, because there is no zero-copy mechanism, the format is not standardized, and serialization/deserialization can be slow, sometimes slower than pickle. We fall back to cloudpickle for objects we don't support natively, like Python lambdas. We also use it to serialize class definitions, so the data associated with a class is serialized using the solution presented above while the code/methods are serialized using cloudpickle. This combines the advantages of both solutions.
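The gap that cloudpickle/dill fill is easy to see with the standard library alone: plain pickle serializes functions by reference (module plus name), so anything without an importable name fails. A quick stdlib-only demonstration; `cloudpickle.dumps` succeeds on the same lambda because it serializes the code object by value.

```python
# Plain pickle stores a reference to a function, not its code, so a
# lambda -- which has no importable name -- cannot be pickled. This is
# exactly the case where a by-value serializer like cloudpickle is used.
import pickle

def try_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

module_level_ok = try_pickle(len)        # named, importable: works
lambda_ok = try_pickle(lambda x: x + 1)  # anonymous: plain pickle fails
print(module_level_ok, lambda_ok)        # True False
```

This also shows why the hybrid described above makes sense: use the fast zero-copy path for bulk data, and reserve the slower by-value code serializer for the objects that actually need it.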
Author here. From our perspective, CapnProto has similar characteristics to FlatBuffers, and the reasons to prefer Arrow over it are the same: we would need to develop a mapping from Python types to CapnProto from scratch, whereas Arrow already has many facilities that are useful for us (Tensor serialization, code to deal with some Python types like datetimes, zero-copy DataFrames, a larger ecosystem for interfacing with other formats like Parquet, reading from HDFS, etc.). And it is designed for big data. So Arrow was a very natural choice (it also supports Windows). Wes is doing some amazing work here!
It looks like Arrow utilizes FlatBuffers internally [1]. It seems that building on Arrow provides a lot of scaffolding that would otherwise need to be built for this particular use case.