In adversarial.js, all attacks have essentially the same API.
Useful notes: the model must be a tf.LayersModel (the last layer must be a separate tf.layers.softmax layer – the softmax cannot be in a {activation: 'softmax'} parameter!). A sketch of a compatible model is shown below, followed by the JSDoc signatures (or, just read the source):
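For instance, a minimal MNIST-style classifier that satisfies this requirement might look like the following (the architecture and layer sizes here are only illustrative, not part of the library):

import * as tf from '@tensorflow/tfjs';

// The final dense layer outputs raw logits; the softmax is its own layer at the end.
const model = tf.sequential({
  layers: [
    tf.layers.flatten({inputShape: [28, 28, 1]}),
    tf.layers.dense({units: 128, activation: 'relu'}),
    tf.layers.dense({units: 10}),  // logits, no activation: 'softmax' here
    tf.layers.softmax(),           // separate softmax layer, as required
  ],
});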
/**
* Fast Gradient Sign Method (FGSM)
*
* This is an L_infinity attack (every pixel can change up to a maximum amount).
*
* Sources:
* - [Goodfellow 15] Explaining and harnessing adversarial examples
*
* @param {tf.LayersModel} model - The model to construct an adversarial example for.
* @param {tf.Tensor} img - The input image to construct an adversarial example for.
* @param {tf.Tensor} lbl - The correct label of the image (must have shape [1, NUM_CLASSES]).
* @param {Object} config - Optional configuration for this attack.
* @param {number} config.ε - Max L_inf distance (each pixel can change up to this amount).
*
* @returns {tf.Tensor} The adversarial image.
*/
export function fgsm(model, img, lbl, {ε = 0.1} = {}) { ... }
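Here is a rough usage sketch (assuming the model above, an ES-module import from adversarial.js, a 28×28 grayscale input, and NUM_CLASSES = 10 – all illustrative, not prescribed by the library):

import * as tf from '@tensorflow/tfjs';
import {fgsm} from './adversarial.js';

// img: one image with a leading batch dimension; lbl: its correct label, one-hot encoded.
const img = tf.randomUniform([1, 28, 28, 1]);    // stand-in for a real image
const lbl = tf.oneHot([7], 10).cast('float32');  // shape [1, 10]

const advImg = fgsm(model, img, lbl, {ε: 0.1});
const advPred = model.predict(advImg).argMax(-1).dataSync()[0];
console.log('prediction on the adversarial image:', advPred);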
/**
* Targeted Variant of the Fast Gradient Sign Method (FGSM)
*
* This is an L_infinity attack (every pixel can change up to a maximum amount).
*
* Sources:
* - [Kurakin 16] Adversarial examples in the physical world (original paper)
* - [Kurakin 16] Adversarial Machine Learning at Scale (best description)
*
* @param {tf.LayersModel} model - The model to construct an adversarial example for.
* @param {tf.Tensor} img - The input image to construct an adversarial example for.
* @param {tf.Tensor} lbl - The correct label of the image (must have shape [1, NUM_CLASSES]).
* @param {tf.Tensor} targetLbl - The desired adversarial label of the image (must have shape [1, NUM_CLASSES]).
* @param {Object} config - Optional configuration for this attack.
* @param {number} config.ε - Max L_inf distance (each pixel can change up to this amount).
* @param {number} config.loss - The loss function to use (must be 0, 1, or 2).
*
* @returns {tf.Tensor} The adversarial image.
*/
export function fgsmTargeted(model, img, lbl, targetLbl, {ε = 0.1, loss = 2} = {}) { ... }
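The targeted variant is called the same way, plus a one-hot target label (reusing model, img, and lbl from the sketch above, and assuming fgsmTargeted is imported alongside fgsm):

// Try to make the model predict class 3 instead of the true class.
const targetLbl = tf.oneHot([3], 10).cast('float32');  // shape [1, 10]
const advImg = fgsmTargeted(model, img, lbl, targetLbl, {ε: 0.1, loss: 2});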
/**
* Basic Iterative Method (BIM / I-FGSM / PGD)
*
* This is an L_infinity attack (every pixel can change up to a maximum amount).
*
* Sources:
* - BIM: [Kurakin 16] Adversarial examples in the physical world
* - I-FGSM: [Tramer 17] Ensemble Adversarial Training: Attacks and Defenses
* - PGD: [Madry 19] Towards Deep Learning Models Resistant to Adversarial Attacks
*
* @param {tf.LayersModel} model - The model to construct an adversarial example for.
* @param {tf.Tensor} img - The input image to construct an adversarial example for.
* @param {tf.Tensor} lbl - The correct label of the image (must have shape [1, NUM_CLASSES]).
* @param {Object} config - Optional configuration for this attack.
* @param {number} config.ε - Max L_inf distance (each pixel can change up to this amount).
* @param {number} config.α - Learning rate for gradient descent.
* @param {number} config.iters - Number of iterations of gradient descent.
*
* @returns {tf.Tensor} The adversarial image.
*/
export function bim(model, img, lbl, {ε = 0.1, α = 0.01, iters = 10} = {}) { ... }
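The iterative attack adds a step size and iteration count on top of the same interface (again reusing the tensors from above):

// Take 10 gradient steps of size α = 0.01, keeping the result within
// ε = 0.1 (L_inf) of the original image.
const advImg = bim(model, img, lbl, {ε: 0.1, α: 0.01, iters: 10});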
/**
* Targeted Variant of the Basic Iterative Method (BIM / I-FGSM / PGD)
*
* This is an L_infinity attack (every pixel can change up to a maximum amount).
*
* Sources:
* - [Kurakin 16] Adversarial examples in the physical world (original paper)
* - [Kurakin 16] Adversarial Machine Learning at Scale (best description)
*
* @param {tf.LayersModel} model - The model to construct an adversarial example for.
* @param {tf.Tensor} img - The input image to construct an adversarial example for.
* @param {tf.Tensor} lbl - The correct label of the image (must have shape [1, NUM_CLASSES]).
* @param {tf.Tensor} targetLbl - The desired adversarial label of the image (must have shape [1, NUM_CLASSES]).
* @param {Object} config - Optional configuration for this attack.
* @param {number} config.ε - Max L_inf distance (each pixel can change up to this amount).
* @param {number} config.α - Learning rate for gradient descent.
* @param {number} config.iters - Number of iterations of gradient descent.
* @param {number} config.loss - The loss function to use (must be 0 or 1). Note: loss2 from fgsmTargeted theoretically works, but it's too slow in practice.
*
* @returns {tf.Tensor} The adversarial image.
*/
export function bimTargeted(model, img, lbl, targetLbl, {ε = 0.1, α = 0.01, iters = 10, loss = 1} = {}) { ... }
/**
* One-Pixel Variant of the Jacobian-based Saliency Map Attack (JSMA / JSMA-F)
*
* This is an L0 attack (we can change a limited number of pixels as much as we want).
*
* This is a much simplified version of the normal JSMA attack, where we only
* consider single pixels at a time, rather than pairs of pixels. Additionally,
* instead of computing the full saliency, we rely only on the gradient of the
* target class wrt the image. This is much faster and more scalable than JSMA, and
* has similar performance on MNIST and CIFAR-10.
*
* Sources:
* - JSMA: [Papernot 15] The Limitations of Deep Learning in Adversarial Settings
* - JSMA-F: [Carlini 17] Towards Evaluating the Robustness of Neural Networks
*
* @param {tf.LayersModel} model - The model to construct an adversarial example for.
* @param {tf.Tensor} img - The input image to construct an adversarial example for.
* @param {tf.Tensor} lbl - The correct label of the image (must have shape [1, NUM_CLASSES]).
* @param {tf.Tensor} targetLbl - The desired adversarial label of the image (must have shape [1, NUM_CLASSES]).
* @param {Object} config - Optional configuration for this attack.
* @param {number} config.ε - Max L0 distance (we can change up to this many pixels).
*
* @returns {tf.Tensor} The adversarial image.
*/
export function jsmaOnePixel(model, img, lbl, targetLbl, {ε = 28} = {}) { ... }
/**
* Jacobian-based Saliency Map Attack (JSMA / JSMA-F)
*
* This is an L0 attack (we can change a limited number of pixels as much as we want).
*
* (Note: I tried JSMA-Z as well, which uses logits instead of softmax probabilities.
* This results in much worse performance for this attack, even though JSMA-Z was
* the original variant of this attack (see Carlini 17). I'm not sure why there's
* a huge discrepancy.)
*
* Sources:
* - JSMA: [Papernot 15] The Limitations of Deep Learning in Adversarial Settings
* - JSMA-F: [Carlini 17] Towards Evaluating the Robustness of Neural Networks
*
* @param {tf.LayersModel} model - The model to construct an adversarial example for.
* @param {tf.Tensor} img - The input image to construct an adversarial example for.
* @param {tf.Tensor} lbl - The correct label of the image (must have shape [1, NUM_CLASSES]).
* @param {tf.Tensor} targetLbl - The desired adversarial label of the image (must have shape [1, NUM_CLASSES]).
* @param {Object} config - Optional configuration for this attack.
* @param {number} config.ε - Max L0 distance (we can change up to this many pixels).
*
* @returns {tf.Tensor} The adversarial image.
*/
export function jsma(model, img, lbl, targetLbl, {ε = 28} = {}) { ... }
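Both JSMA variants are targeted, and here ε is a pixel budget rather than a distance (reusing the tensors from above; the budget of 28 is just the default):

// Change at most 28 pixels to push the prediction toward class 3.
const targetLbl = tf.oneHot([3], 10).cast('float32');
const advFast = jsmaOnePixel(model, img, lbl, targetLbl, {ε: 28});
const advFull = jsma(model, img, lbl, targetLbl, {ε: 28});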
/**
* Carlini & Wagner (C&W)
*
* This is an L2 attack (we are incentivized to change many pixels by very small amounts).
*
* Note that this attack does NOT allow us to set a maximum L2 perturbation.
*
* Sources:
* - [Carlini 17] Towards Evaluating the Robustness of Neural Networks
* - [Carlini 17] Adversarial Examples Are Not Easily Detected - Bypassing Ten Detection Methods
*
* @param {tf.LayersModel} model - The model to construct an adversarial example for.
* @param {tf.Tensor} img - The input image to construct an adversarial example for.
* @param {tf.Tensor} lbl - The correct label of the image (must have shape [1, NUM_CLASSES]).
* @param {tf.Tensor} targetLbl - The desired adversarial label of the image (must have shape [1, NUM_CLASSES]).
* @param {Object} config - Optional configuration for this attack.
* @param {number} config.c - Higher = higher success rate, but higher distortion.
* @param {number} config.κ - Higher = more confident adv example.
* @param {number} config.λ - Higher learning rate = faster convergence, but higher distortion.
* @param {number} config.iters - Number of iterations of gradient descent (Adam).
*
* @returns {tf.Tensor} The adversarial image.
*/
export function cw(model, img, lbl, targetLbl, {c = 5, κ = 1, λ = 0.1, iters = 100} = {}) { ... }
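Finally, C&W follows the same pattern, except that distortion is controlled indirectly through c and the learning rate rather than by a hard bound (reusing the tensors from above):

// No hard L2 cap: lower c and λ generally mean less distortion,
// at the cost of success rate and convergence speed.
const targetLbl = tf.oneHot([3], 10).cast('float32');
const advImg = cw(model, img, lbl, targetLbl, {c: 5, κ: 1, λ: 0.1, iters: 100});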