Urgh. And it gets worse from there. The bugs list on the repo has a closed and locked bug report from someone claiming that their code is using teacher forcing!
In a normal recurrent neural network, the model predicts one token at a time. It predicts a token, and that token is appended to the prediction so far, which is then fed back into the model to generate the next token. In other words, the network generates all the predictions itself based on its own previous outputs and the other inputs (brainwaves in this case), meaning that a single bad prediction can send the entire thing off track.
With teacher forcing that isn't the case. At each step, all the tokens up to the one being predicted are taken from the ground-truth sequence, not from the model's own outputs. That means the model is never exposed to its own previous errors. But of course in a real system you don't have access to the ground truth, so this is not feasible to do in reality.
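To make the difference concrete, here's a toy sketch (nothing to do with the paper's actual code, and `predict_next` is a made-up stand-in for a real model) showing how a single wrong prediction plays out under each evaluation mode:

```python
# Toy illustration: a "model" that predicts the next token from the
# sequence so far. We hard-code one misfire (on token 3) to show how
# the two evaluation modes treat that error differently.

def predict_next(prefix):
    # Stand-in for a trained model: normally returns last token + 1,
    # but deliberately misfires when the last token is 3.
    last = prefix[-1]
    return 99 if last == 3 else last + 1

ground_truth = [0, 1, 2, 3, 4, 5, 6]

# Autoregressive decoding (what model.generate does): each prediction
# is fed back in as input, so the one mistake derails every step after it.
free_running = [ground_truth[0]]
for _ in range(len(ground_truth) - 1):
    free_running.append(predict_next(free_running))

# Teacher forcing (what model.forward on the full target does): every
# step is conditioned on the correct ground-truth prefix, so the mistake
# costs exactly one token and the cascade never happens.
teacher_forced = [ground_truth[0]]
for i in range(len(ground_truth) - 1):
    teacher_forced.append(predict_next(ground_truth[:i + 1]))

print(free_running)    # error at token 3 poisons everything after it
print(teacher_forced)  # only the single step after token 3 is wrong
```

Under teacher forcing the toy model looks 6/7 accurate; generating freely, it goes completely off the rails after the first mistake. That gap is exactly why evaluating with the wrong mode inflates the reported results.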
The other repo says:
"We have written a corrected version to use model.generate to evaluate the model, the result is not so good"
but they don't give examples.
This problem completely invalidates the paper's results. It is awful that they have effectively hidden and locked the thread in which the issue was reported. It's also kind of nonsensical that people doing such advanced ML work claim not to have known the difference between model.forward() and model.generate(). I'm not an ML researcher and may have mangled the description of teacher forcing, but even I know these aren't the same thing at all.
You’d be shocked how common this is in academia. Most of the time it goes undetected because the people writing the checks can’t be bothered to understand.
So instead of generating the next token from its own previous predictions (which is what it would do in real life), the code they used for the evaluation actually predicts from the ground truth?
https://github.com/duanyiqun/DeWave/issues/1