Data poisoning and backdoor attacks manipulate training data to induce
security breaches in a victim model. These attacks can be provably deflected
using differentially private (DP) training methods, although this comes with a
sharp decrease in model performance. The InstaHide method has recently been
proposed as an alternative to DP training that leverages supposed privacy
properties of the mixup augmentation, although without rigorous guarantees. In
this work, we show that strong data augmentations, such as mixup and random
additive noise, nullify poison attacks while incurring only a small accuracy
trade-off. To explain these findings, we propose a training method,
DP-InstaHide, which combines the mixup regularizer with additive noise. A
rigorous analysis of DP-InstaHide shows that mixup does indeed have privacy
advantages, and that training with k-way mixup provably yields at least k times
stronger DP guarantees than a naive DP mechanism. Because mixup (as opposed to
noise) is beneficial to model performance, DP-InstaHide provides a mechanism
for achieving stronger empirical performance against poisoning attacks than
other known DP methods.
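
The combination of k-way mixup and additive noise described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mixup_with_noise`, the uniform 1/k mixing weights, the Laplace noise distribution, and the default values of `k` and `noise_scale` are all assumptions chosen for illustration, since the abstract specifies only that mixup is combined with additive noise.

```python
import numpy as np


def mixup_with_noise(x, y, k=4, noise_scale=0.05, rng=None):
    """Hypothetical sketch: k-way mixup followed by additive noise.

    x: float array of shape (n, ...) holding n training inputs.
    y: float array of shape (n, c) holding one-hot (or soft) labels.
    Requires n >= k so that k distinct samples can be drawn.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]

    # For every output sample, pick k distinct source samples.
    idx = np.stack([rng.choice(n, size=k, replace=False) for _ in range(n)])

    # Assumption: uniform mixing weights, i.e. a plain average of the
    # k inputs and of their labels.
    x_mix = x[idx].mean(axis=1)
    y_mix = y[idx].mean(axis=1)

    # Assumption: Laplace noise is added to the mixed inputs only.
    noise = rng.laplace(loc=0.0, scale=noise_scale, size=x_mix.shape)
    return x_mix + noise, y_mix
```

In a training loop, a function like this would be applied to each batch before the forward pass, with the loss computed against the mixed labels. Larger `k` and larger `noise_scale` perturb the data more strongly, which is the knob the abstract associates with the trade-off between robustness and accuracy.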

Authors: Eitan Borgnia, Jonas Geiping, Valeriia Cherepanova, Liam Fowl, Arjun Gupta, Amin Ghiasi, Furong Huang, Micah Goldblum, Tom Goldstein
