
Adversarial attack

Attacking neural networks

Fast Gradient Method (Explaining & Harnessing Adversarial Examples)

linear model:

$$
y' = w^T x' = w^T x + w^T\sigma
$$

where $\sigma$ is the perturbation added to the original input $x$, and we have:
$$
\sigma = \varepsilon\,\mathrm{sign}(w)
$$
Intuition: with $\|\sigma\|_\infty$ held fixed at $\varepsilon$, the change in $y'$ is largest when $w^T\sigma = \varepsilon\|w\|_1$, which is exactly what this choice of $\sigma$ achieves.
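A quick numerical check of this claim (toy weights, not from the paper): under the constraint $\|\sigma\|_\infty \le \varepsilon$, no perturbation beats $\sigma = \varepsilon\,\mathrm{sign}(w)$.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(5)   # toy weight vector
eps = 0.1

# Optimal perturbation under the L-infinity constraint |sigma_i| <= eps
sigma_opt = eps * np.sign(w)

# w^T sigma_opt equals eps * ||w||_1, the maximum achievable value
assert np.isclose(w @ sigma_opt, eps * np.abs(w).sum())

# Any other sigma with the same L-infinity bound gives a smaller inner product
for _ in range(1000):
    sigma = rng.uniform(-eps, eps, size=5)
    assert w @ sigma <= w @ sigma_opt + 1e-12
```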

DNN
$$
\sigma = \varepsilon\,\mathrm{sign}(\nabla_x J(\theta, x, y_{true}))
$$

where $J$ is the loss function, in most cases cross-entropy with a softmax classifier layer.

Intuition: to change $y'$ the most, $x$ should move in the direction that increases the loss. For example, if $\nabla_x J(\theta, x, y)$ (the gradient of the loss with respect to $x$) is negative, then $x$ should decrease, i.e. $\sigma$ should be negative.
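A minimal FGSM sketch, using logistic regression as the "network" so the input gradient has a closed form (all values here are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y_true, w, b, eps):
    """One FGSM step for a logistic-regression model.

    For cross-entropy loss J, the input gradient is (p - y) * w,
    where p = sigmoid(w.x + b).
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y_true) * w          # dJ/dx
    return x + eps * np.sign(grad_x)   # move in the direction that increases J

# toy model and input
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.2, -0.1, 0.3])
y_true = 1.0

x_adv = fgsm(x, y_true, w, b, eps=0.3)

# the adversarial example has strictly higher loss than the clean input
loss = lambda v: -np.log(sigmoid(w @ v + b))   # cross-entropy, y_true = 1
assert loss(x_adv) > loss(x)
```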

Iterative method

Basic iterative method

update: $x_{t+1}' = x_t' + \sigma$ (the FGSM step applied repeatedly with a small $\varepsilon$)
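The iterative update can be sketched as follows, again with a logistic-regression stand-in for the network; the standard trick of clipping back into the $\varepsilon$-ball around $x$ is assumed here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bim(x, y_true, w, b, eps, alpha, steps):
    """Basic iterative method: repeat small FGSM steps of size alpha,
    clipping the total perturbation into the eps L-infinity ball."""
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_adv + b)
        grad_x = (p - y_true) * w
        x_adv = x_adv + alpha * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within the eps-ball
    return x_adv

w = np.array([1.0, -2.0, 0.5]); b = 0.0
x = np.array([0.2, -0.1, 0.3]); y_true = 1.0

x_adv = bim(x, y_true, w, b, eps=0.3, alpha=0.1, steps=10)
assert np.max(np.abs(x_adv - x)) <= 0.3 + 1e-12
```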

Targeted

For a targeted attack, the FGSM perturbation becomes:
$$
\sigma = -\varepsilon\,\mathrm{sign}(\nabla_x J(\theta, x, y_{targeted}))
$$
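The sign flip turns the attack into gradient descent on the loss measured against the target label, pushing the prediction toward it. A sketch with the same toy logistic model (illustrative values only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def targeted_fgsm(x, y_target, w, b, eps):
    """Targeted step: sigma = -eps * sign(grad_x J(theta, x, y_target)),
    i.e. descend the loss with respect to the *target* label."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y_target) * w
    return x - eps * np.sign(grad_x)

w = np.array([1.0, -2.0, 0.5]); b = 0.0
x = np.array([0.2, -0.1, 0.3])
y_target = 0.0   # push the prediction toward class 0

x_adv = targeted_fgsm(x, y_target, w, b, eps=0.3)

# probability of class 1 drops, so the target class 0 gains probability
assert sigmoid(w @ x_adv + b) < sigmoid(w @ x + b)
```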

Our Method

What do we already know, and what can we obtain now?
  • Printable image size:

    $P(i,j)$: the indexed (black/white) boxes, each of size $P_h \times P_w$.

  • Actual image ($100\times 100$) per frame.

    $S_I$: starting index of the billboard for each frame.

    $R_I$: range of the billboard for each frame.

    $G_I$: gradient of each pixel in the billboard.

What we need to calculate

$G_P$: gradient of each box in $P$.

How?

Transform:

  • Nearest

  • SPP:

  • Binary
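For the nearest-neighbor transform, the chain rule gives a simple recipe for $G_P$: if each box $P(i,j)$ is mapped onto a $P_h \times P_w$ block of billboard pixels, its gradient is the sum of the pixel gradients $G_I$ in that block. A sketch under that assumption (shapes and names are illustrative):

```python
import numpy as np

def box_gradient(G_I, P_h, P_w):
    """Aggregate per-pixel billboard gradients G_I into per-box
    gradients G_P for the printable image P.

    Assumes a nearest-neighbor transform: each box covers a
    P_h x P_w block of pixels, so by the chain rule its gradient
    is the sum of the pixel gradients in that block.
    """
    H, W = G_I.shape
    n_h, n_w = H // P_h, W // P_w
    # reshape into (boxes_h, P_h, boxes_w, P_w) and sum within each block
    G_P = G_I[:n_h * P_h, :n_w * P_w].reshape(n_h, P_h, n_w, P_w).sum(axis=(1, 3))
    return G_P

G_I = np.ones((100, 100))            # per-pixel gradient of one frame's billboard
G_P = box_gradient(G_I, P_h=10, P_w=10)
assert G_P.shape == (10, 10)
assert np.allclose(G_P, 100.0)       # each box sums 10*10 unit gradients
```

The SPP and binary transforms would need their own backward mappings, but the same block-wise aggregation structure applies whenever the transform is piecewise constant.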