Adversarial images in TensorFlow

I am reading an article that explains how to trick a neural network into predicting whatever class you want for a given image. I am using the MNIST dataset.

The article provides a relatively detailed walkthrough, but the author uses Caffe.

Anyway, my first step was to create a logistic regression model in TensorFlow and train it on the MNIST dataset. Then, if I restore the model, I can use it to classify any image. For example, I feed an image of the digit 7 to the following model...

with tf.Session() as sess:
    # restore the trained weights
    saver.restore(sess, "/tmp/model.ckpt")
    # mnist.test.images[0] is an image of the digit 7
    x_in = np.expand_dims(mnist.test.images[0], axis=0)
    classification = sess.run(tf.argmax(pred, 1), feed_dict={x: x_in})
    print(classification)

>>>[7]

This prints out [7], which is correct.

Now the article explains that in order to break the neural network, we need to calculate its gradient, that is, the derivative of the network's output with respect to the input image.

The article states that to calculate the gradient, we first need to pick an intended outcome to move towards, and set the output probability list to 0 everywhere and 1 for the intended outcome. Backpropagation is the algorithm used to calculate that gradient.

The article then provides Caffe code for calculating the gradient...

def compute_gradient(image, intended_outcome):
    # Put the image into the network and make the prediction
    predict(image)
    # Get an empty set of probabilities
    probs = np.zeros_like(net.blobs['prob'].data)
    # Set the probability for our intended outcome to 1
    probs[0][intended_outcome] = 1
    # Do backpropagation to calculate the gradient for that outcome
    # and the image we put in
    gradient = net.backward(prob=probs)
    return gradient['data'].copy()

Now, my issue is that I'm having a hard time understanding how this function is able to get the gradient just by feeding the image and the probabilities to the network. Because I do not fully understand this code, I am having a hard time translating the logic to TensorFlow.

I think I am confused as to how the Caffe framework works, because I've never seen or used it before. If someone could explain how this logic works step by step, that would be great.

I already know the basics of backpropagation, so you may assume I know how it works.

Here is a link to the article itself: https://codewords.recurse.com/issues/five/why-do-neural-networks-think-a-panda-is-a-vulture

asked Mar 18 '17 by buydadip


1 Answer

I'm going to show you how to do the basics of generating an adversarial image in TF; to apply this to an already trained model you might need some adaptations.

The code blocks work well as cells in a Jupyter notebook if you want to try this out interactively. If you don't use a notebook, you'll need to add plt.show() calls for the plots to appear and remove the %matplotlib inline statement. The code is basically the simple MNIST tutorial from the TF documentation; I'll point out the important differences.

The first block is just setup, nothing special...

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# if you're not using jupyter notebooks then comment this out
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

Get the MNIST data (the download site is down from time to time, so you might need to fetch it from web.archive.org manually and put it into that directory). We're not using one-hot encoding like the tutorial does, because TF now has nicer functions for calculating the loss that don't need one-hot encoding anymore.

mnist = input_data.read_data_sets('/tmp/tensorflow/mnist/input_data')

In the next block we are doing something "special". The input image tensor is defined as a variable, because later we want to optimize with respect to the input image. Usually you would have a placeholder here. This limits us a bit, because we need a definite shape, so we only feed in one example at a time. Not something you want to do in production, but for teaching purposes it's fine (and you can get around it with a little more code). Labels are placeholders as normal.

input_images = tf.get_variable("input_image", shape=[1,784], dtype=tf.float32)
input_labels = tf.placeholder(shape=[1], name='input_label', dtype=tf.int32)

Our model is a standard logistic regression model like in the tutorial. We only use the softmax for visualizing results; the loss function takes plain logits.

W = tf.get_variable("weights", shape=[784, 10], dtype=tf.float32, initializer=tf.random_normal_initializer())
b = tf.get_variable("biases", shape=[1, 10], dtype=tf.float32, initializer=tf.zeros_initializer())

logits = tf.matmul(input_images, W) + b
softmax = tf.nn.softmax(logits)

The loss is standard cross entropy. What's notable in the training step is that an explicit list of variables is passed in: we have defined the input image as a trainable variable, but we don't want to optimize the image while training the logistic regression, just the weights and biases, so we state that explicitly.

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=input_labels,name='xentropy')

mean_loss = tf.reduce_mean(loss)

train_step = tf.train.AdamOptimizer(learning_rate=0.1).minimize(mean_loss, var_list=[W,b])

Start the session ...

sess = tf.Session()
sess.run(tf.global_variables_initializer())

Training is slower than it should be because of batch size 1. Like I said, not something you want to do in production, but this is just for teaching the basics ...

for step in range(10000):
    batch_xs, batch_ys = mnist.train.next_batch(1)
    loss_v, _ = sess.run([mean_loss, train_step], feed_dict={input_images: batch_xs, input_labels: batch_ys})
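
Before moving on, it can't hurt to sanity-check that training worked. This is not part of the original walkthrough, just a quick sketch reusing the graph above (feeding the input variable through feed_dict works the same way as in the training loop):

prediction = tf.argmax(softmax, 1)

correct = 0
n = 1000
for i in range(n):
    # feed test images one at a time, since our input variable holds a single example
    pred = sess.run(prediction, feed_dict={input_images: mnist.test.images[i:i+1]})
    correct += int(pred[0] == mnist.test.labels[i])
print("accuracy on %d test samples: %.3f" % (n, correct / float(n)))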

At this point we should have a model that is good enough to demonstrate how to generate an adversarial image. First, we get an image that has the label '2', because twos are easy, so even our suboptimal classifier should get them right. (If it doesn't, run this cell again; this step is random, so I can't guarantee it'll work.)

We're setting our input image variable to that example.

# draw random test samples until we get one labeled '2'
sample_label = -1
while sample_label != 2:
    sample_image, sample_label = mnist.test.next_batch(1)
print(sample_label)
plt.imshow(sample_image.reshape(28, 28), cmap='gray')

# assign image to var
sess.run(tf.assign(input_images, sample_image));
sess.run(softmax) # now using the variable as input, no feed dict

# should show something like
# array([[ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)
# With the third entry being the highest by far.

Now we are going to "break" the classification. We want to change the image so that it looks more like another number in the eyes of the network, without changing the network itself. To do that, the code looks basically identical to what we had before: we define a "fake" label, the same loss as before (cross entropy), and an optimizer to minimize the fake loss, but this time with a var_list consisting only of the input image, so we won't change the logistic regression weights:

fake_label = tf.placeholder(tf.int32, shape=[1])
fake_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=fake_label)
adversarial_step = tf.train.GradientDescentOptimizer(learning_rate=1e-3).minimize(fake_loss, var_list=[input_images])

The next block is intended to be run interactively multiple times, while you see the image and the scores changing (here moving towards a label of 8):

sess.run(adversarial_step, feed_dict={fake_label:np.array([8])})
plt.imshow(sess.run(input_images).reshape(28,28),cmap='gray')
sess.run(softmax)

The first time you run this block, the scores will probably still point heavily towards 2, but they will change over time, and after a couple of runs you should see something like the result image below. Note that the image still looks like a 2 with some noise in the background, but the score for "2" is around 3% while the score for "8" is over 96%.
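
If you'd rather not re-run the cell by hand, a small convenience loop (same ops as above, nothing new) gets there in one go:

for _ in range(100):
    sess.run(adversarial_step, feed_dict={fake_label: np.array([8])})

probs = sess.run(softmax)
print("P(2) = %.3f, P(8) = %.3f" % (probs[0][2], probs[0][8]))
plt.imshow(sess.run(input_images).reshape(28, 28), cmap='gray')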

Note that we never actually computed the gradient explicitly; we don't need to, because the TF optimizer takes care of computing the gradients and applying the updates to the variables. If you want the gradient itself, you can get it with tf.gradients(fake_loss, input_images).
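
For completeness, here is a rough sketch of what the Caffe compute_gradient function from the question boils down to in TF terms, using the names defined above (the manual update line is just the same step the GradientDescentOptimizer performs internally):

# gradient of the fake loss w.r.t. the input pixels, shape [1, 784]
image_gradient = tf.gradients(fake_loss, input_images)[0]
grad_value = sess.run(image_gradient, feed_dict={fake_label: np.array([8])})

# applying it by hand is equivalent to one optimizer step
manual_step = tf.assign_sub(input_images, 1e-3 * image_gradient)
sess.run(manual_step, feed_dict={fake_label: np.array([8])})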

(Result image: the perturbed digit, still recognizable as a 2 with some background noise.)

The same pattern works for more complicated models, but what you'll want to do is train your model as normal (using placeholders with bigger batches, or a pipeline with TF readers), and when you want the adversarial image, recreate the network with the input image variable as the input. As long as all the variable names remain the same (which they should if you use the same functions to build the network), you can restore from your network checkpoint, as sketched below, and then apply the steps from this post to get an adversarial image. You might need to play around with learning rates and such.
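
A minimal sketch of that restore step, assuming the checkpoint path from the question and that the weight variables carry the same names as in the checkpoint (the key point is to keep the new input-image variable and any optimizer slot variables out of the Saver, since they don't exist in the checkpoint):

# restore only the trained model variables; the input image variable
# (and optimizer slots) are new and not present in the checkpoint
saver = tf.train.Saver(var_list=[W, b])

sess.run(tf.variables_initializer([input_images]))  # initialize just the image var
saver.restore(sess, "/tmp/model.ckpt")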

answered Oct 14 '22 by etarion