Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser

A review of the CVPR 2018 paper "Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser".


The paper proposes a defense against adversarial attacks, which induce misclassification in a target classifier.

The goal is to train a denoiser that removes the malicious noise (adversarial perturbations) added to the input image, so that the classifier recovers the correct prediction.


The paper proposes the high-level representation guided denoiser (HGD), a defense against adversarial attacks on image classification.

An ordinary denoising model cannot remove all of the adversarial perturbation in an image. To address this, instead of the pixel-level reconstruction loss used in standard denoisers, the authors introduce a new loss function: the difference between the target model's outputs on the original sample and on the (denoised) adversarial example.

Figure 1: The idea of high-level representation guided denoiser. The difference between the original image and adversarial image is tiny, but the difference is amplified in high-level representation (logits for example) of a CNN. We use the distance over high-level representations to guide the training of an image denoiser to suppress the influence of adversarial perturbation.

High-Level Representation Guided Denoiser


A standard denoiser is trained with a pixel-level reconstruction loss between the clean image x and the denoised output \hat{x}:

 L = |x - \hat{x}|



HGD instead minimizes the distance between the target CNN's layer-l activations f_l on the denoised image and on the original image:

 L = |f_l(\hat{x}) - f_l(x)|
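The contrast between the two losses can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `feature_extractor` is a hypothetical stand-in for the fixed target CNN's layer-l activations f_l, and all shapes and data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_extractor(x, w):
    """Toy stand-in for the target CNN's layer-l activations f_l
    (assumption: one linear map + ReLU; the paper uses a fixed pretrained CNN)."""
    return np.maximum(w @ x, 0.0)

# Toy data: clean image x, adversarial example, and a denoiser output x_hat.
x = rng.random(16)                           # clean image (flattened)
x_adv = x + 0.01 * rng.standard_normal(16)   # adversarial example
x_hat = x_adv - 0.005                        # placeholder denoised image

w = rng.standard_normal((8, 16))             # toy weights of the target CNN layer

# Pixel-level reconstruction loss: L = |x - x_hat| (L1 distance in image space)
pixel_loss = np.abs(x - x_hat).sum()

# High-level guided loss: L = |f_l(x_hat) - f_l(x)| (L1 distance in feature space)
hgd_loss = np.abs(feature_extractor(x_hat, w) - feature_extractor(x, w)).sum()
```

In training, only the denoiser producing `x_hat` would be updated; the feature extractor (the CNN being defended) stays fixed.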

Figure 3: The detail of DUNET. The numbers inside each cube stand for width × height, and the number outside the cube stands for the number of channels. In all the C3 of the feedforward path, the stride of the first C is 2 × 2.


  •  The feature guided denoiser (FGD), which uses the layer at l = -2 and takes the difference of the feature maps as the loss
  •  The logits guided denoiser (LGD), which uses the layer at l = -1 and takes the difference of the model's classification outputs (logits) as the loss

Figure 4: Three different training methods for HGD. The square boxes stand for data blobs, the circles and ovals stand for networks. D stands for denoiser. CNN is the model to be defended. The parameters of the CNN are shared and fixed.