Would you accept the argument that compiling is modifying the bytes in the memory space reserved for an executable?
I can edit the executable at the byte level if I so desire, and that is also what compilers do, but a developer who wants to change the program would instead modify the source code and feed it through a compiler.
Similarly, I can edit the weights of a neural network myself (using any tool I want), but the developers of the network would instead alter the training dataset and the training code to make their changes.
The big difference an Open Source license makes is that, regardless of the tool I use to make the edits, if I rewrite the bytes of the Linux kernel, I can freely release my version under the same license. If I rewrite the bytes of Super Mario Odyssey and try to release the modified version, I'll soon be having a very fun time in bankruptcy court.
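To make the analogy concrete, here's a minimal sketch (PyTorch used purely for illustration; the file and key names are made up):

    import torch

    # Someone's released weights are just bytes on disk, like an executable.
    weights = torch.load("released_model.pt")

    # I can edit them with whatever tool I want, no training data needed.
    weights["layers.0.weight"][0, 0] = 0.0

    # Whether I may release the result is the license's job, not the tool's.
    torch.save(weights, "my_modified_model.pt")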
You can, should you wish, totally release a model right after initialisation. It would be a useless model, but, again, the license does not deal with that. You would have the rights to run, modify and release the model, even if it were a random one.
tl;dr: Licenses deal with what you can do with a model. You can run it, modify it, redistribute it. They do not deal with how you modify it (i.e. what data you used to arrive at the "optimal" hardcoded values). See also my other reply with a simplified code example.
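For the random-model case, a minimal sketch (again PyTorch, purely for illustration):

    import torch
    import torch.nn as nn

    # A freshly initialised (i.e. random, useless) model...
    model = nn.Linear(10, 2)

    # ...is still something you can run, modify, and redistribute;
    # no training data was involved at all.
    torch.save(model.state_dict(), "useless_but_releasable.pt")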
I suppose 80% means you don't give them a zero just because the software says it's AI; you only do so if you have other evidence reinforcing the possibility.
For the first SAM model, you needed to encode the input image, which took about 2 seconds (on a consumer GPU), but any detection you then ran on that image was on the order of milliseconds. The blog post doesn't seem too clear on this, but I'm assuming the 30ms is for the encoder plus 100 runs of the detector.
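That's the usual promptable-segmentation pattern: pay the encoder once, then run the cheap decoder per prompt. A toy sketch with stand-in functions (the timings are the ones assumed above, not measured):

    import time

    def image_encoder(image):
        # Stand-in for the heavy image encoder (~2 s on a consumer GPU
        # for the first SAM).
        time.sleep(2.0)
        return "embedding"

    def mask_decoder(embedding, prompt):
        # Stand-in for the light promptable decoder (milliseconds per run).
        return f"mask for prompt {prompt}"

    embedding = image_encoder("image.png")  # paid once per image
    masks = [mask_decoder(embedding, p) for p in range(100)]  # cheap per prompt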
I occasionally use TRAMP to edit code deployed on robots. One advantage over VSCode is that it doesn't require installing anything on the computer you're connecting to, since it works through the usual Linux tools. It can freeze up once in a while, though.
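Opening a remote file is just the normal find-file with a TRAMP path (host and path made up here):

    C-x C-f /ssh:user@robot:/opt/robot/src/controller.py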
Not teleoperating can have certain disadvantages, though, due to mismatches between how humans move and how robots move. See here: https://evjang.com/2024/08/31/motors.html
Intuitively, yes. But is it really true in practice?
Thinking about it, I'm reminded of various "additive training" tricks. Teach an AI to do A, and then to do B, and it might just generalize that to doing A+B with no extra training. Works often enough on things like LLMs.
In this case, we use non-robot data to teach an AI how to do diverse tasks, and robot-specific data (real or simulated) to teach it how to operate a robot body. That might generalize well enough to "doing diverse tasks through a robot body"; a rough sketch of the co-training idea is below.
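All names here are hypothetical; the point is just the data mixture:

    import random

    def sample_batch(source):
        # Stand-in for a real data loader.
        return {"source": source}

    def train_step(batch):
        # Stand-in for one gradient update on the shared model.
        pass

    for step in range(10_000):
        # Co-train on a mixture: non-robot data teaches "what to do",
        # robot data (real or sim) teaches "how to move this body",
        # and we hope the two compose.
        source = random.choice(["web_data", "robot_data"])
        train_step(sample_batch(source))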
The exoskeletons are instrumented to match the kinematics and sensor suite of the actual robot gripper, so you can trivially train a model on human-collected gripper data and replay it on the robot.
You mentioned UMI, which to my knowledge runs VSLAM on camera+IMU data to estimate the gripper pose and no exoskeletons are involved. See here: https://umi-gripper.github.io/
Calling UMI an "exoskeleton" might be a stretch, but the principle is the same: humans use a kinematically matched, instrumented end effector to collect data that can be trivially replayed on the robot.
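In code terms, the replay step amounts to something like this (recording format and robot interface are made up):

    # Each frame: end-effector pose (position + quaternion) and gripper width,
    # already in the robot's frame because the hardware is kinematically matched.
    recording = [
        {"ee_pose": [0.4, 0.0, 0.3, 0.0, 1.0, 0.0, 0.0], "width": 0.06},
        {"ee_pose": [0.4, 0.1, 0.2, 0.0, 1.0, 0.0, 0.0], "width": 0.02},
    ]

    def send_to_robot(pose, width):
        # Stand-in for the controller's IK + command interface.
        print(f"move to {pose}, gripper to {width}")

    for frame in recording:
        send_to_robot(frame["ee_pose"], frame["width"])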
At the very least it differs greatly from "world model" as understood in earlier robotics and AI research, wherein it referred to a model describing all the details of the world outside the system relevant to the problem at hand.