To summarize one of the constructions they provide: if you have an existing neural network, you can add a parallel network of just a few layers that performs cryptographic signature verification on something hidden in the input (e.g. the signs of a few input values). A final layer then, depending on the verification result, either passes through the original model's output (signature invalid) or an output of the attacker's choosing (signature valid). The construction is even (or can be made) robust to additional training by the victim if you exploit vanishing gradients cleverly.
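To make the verify-then-mux idea concrete, here's a minimal sketch in plain Python. Everything here is illustrative, not the paper's actual construction: the "signature" is a toy hash-based check on input signs rather than a real public-key signature scheme, and `original_model`, `SECRET`, and the trigger width are all made-up placeholders.

```python
import hashlib

SECRET = b"attacker-secret"  # hypothetical key; the paper uses real digital signatures

def trigger_bits(x, k=8):
    # assumption: the trigger is hidden in the signs of the first k inputs
    return bytes(int(v >= 0) for v in x[:k])

def verify(x):
    # toy stand-in for cryptographic verification: accept only the one
    # sign pattern derived from the secret key
    expected = bytes(b % 2 for b in hashlib.sha256(SECRET).digest()[:8])
    return 1.0 if trigger_bits(x) == expected else 0.0

def original_model(x):
    # placeholder for the victim's network
    return sum(x)

def backdoored_model(x, attacker_output=42.0):
    s = verify(x)  # parallel verification branch: 0.0 or 1.0
    # final "mux" layer: signature invalid -> original output,
    # signature valid -> attacker-chosen output
    return (1 - s) * original_model(x) + s * attacker_output
```

On any input without the secret sign pattern, the model behaves exactly like `original_model`; only an input carrying the pattern flips the mux. Note the mux is just a convex combination, so it is expressible with ordinary network layers.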
Applying this to a practical ML model is of course left as an exercise for the reader. While the research certainly proves that it's fundamentally possible (and mathematically trivial) to do such a thing, I feel that the structures of ML models are relatively transparent in most practical applications, making it comparatively easy to detect "parallel" verification networks thus constructed. The dataflow graph will be pretty revealing, but the victim would have to actually inspect it in the first place.
Of course, that in turn makes it a game of obfuscation - can you inconspicuously hide the signature check and final muxing step among the main network? I have no doubts that you can find a way if you're so determined.
But I think the most salient points of the paper are that 1. it is impossible to determine whether a model is backdoored based only on input-output queries (unless you already know the backdoor), and 2. this means that people working on adversarially robust ML methods are in for a tough time.
There's more to be found in the paper, this is just my short summary after reading the most interesting bits.