Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sparse autoencoder cost function in tensorflow

I've been going through a variety of TensorFlow tutorials to try to familiarize myself with how it works; and I've become interested in utilizing autoencoders.

I started by using the model autoencoder in Tensorflow's models repository:

https://github.com/tensorflow/models/tree/master/autoencoder

I got it working, and while visualizing the weights, expected to see something like this:

enter image description here

however, my autoencoder gives me garbage-looking weights (despite accurately recreating the input image).

enter image description here

Further reading suggests that what I'm missing is that my autoencoder is not sparse, so I need to enforce a sparsity cost to the weights.

I've tried to add a sparsity cost to the original code (based off of this example3), but it doesn't seem to change the weights to looking like the model ones.

How can I properly change the cost to get features that look like the ones that a typically found in the autoencoded MNIST dataset? My modified model is here:

import numpy as np
import random
import math
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt

def xavier_init(fan_in, fan_out, constant = 1):
    low = -constant * np.sqrt(6.0 / (fan_in + fan_out))
    high = constant * np.sqrt(6.0 / (fan_in + fan_out))
    return tf.random_uniform((fan_in, fan_out), minval = low, maxval = high, dtype = tf.float32)

class AdditiveGaussianNoiseAutoencoder(object):
    def __init__(self, n_input, n_hidden, transfer_function = tf.nn.sigmoid, optimizer = tf.train.AdamOptimizer(),
                 scale = 0.1):
        self.n_input = n_input
        self.n_hidden = n_hidden
        self.transfer = transfer_function
        self.scale = tf.placeholder(tf.float32)
        self.training_scale = scale
        network_weights = self._initialize_weights()
        self.weights = network_weights
        self.sparsity_level= 0.1#np.repeat([0.05], self.n_hidden).astype(np.float32)
        self.sparse_reg = 10

        # model
        self.x = tf.placeholder(tf.float32, [None, self.n_input])
        self.hidden = self.transfer(tf.add(tf.matmul(self.x + scale * tf.random_normal((n_input,)),
                self.weights['w1']),
                self.weights['b1']))
        self.reconstruction = tf.add(tf.matmul(self.hidden, self.weights['w2']), self.weights['b2'])

        # cost
        self.cost = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(self.reconstruction, self.x), 2.0)) + self.sparse_reg \
                        * self.kl_divergence(self.sparsity_level, self.hidden)

        self.optimizer = optimizer.minimize(self.cost)

        init = tf.global_variables_initializer()
        self.sess = tf.Session()
        self.sess.run(init)

    def _initialize_weights(self):
        all_weights = dict()
        all_weights['w1'] = tf.Variable(xavier_init(self.n_input, self.n_hidden))
        all_weights['b1'] = tf.Variable(tf.zeros([self.n_hidden], dtype = tf.float32))
        all_weights['w2'] = tf.Variable(tf.zeros([self.n_hidden, self.n_input], dtype = tf.float32))
        all_weights['b2'] = tf.Variable(tf.zeros([self.n_input], dtype = tf.float32))
        return all_weights

    def partial_fit(self, X):
        cost, opt = self.sess.run((self.cost, self.optimizer), feed_dict = {self.x: X,
                                                                            self.scale: self.training_scale
                                                                            })
        return cost

    def kl_divergence(self, p, p_hat):
        return tf.reduce_mean(p * tf.log(p) - p * tf.log(p_hat) + (1 - p) * tf.log(1 - p) - (1 - p) * tf.log(1 - p_hat))

    def calc_total_cost(self, X):
        return self.sess.run(self.cost, feed_dict = {self.x: X,
                                                     self.scale: self.training_scale
                                                     })

    def transform(self, X):
        return self.sess.run(self.hidden, feed_dict = {self.x: X,
                                                       self.scale: self.training_scale
                                                       })

    def generate(self, hidden = None):
        if hidden is None:
            hidden = np.random.normal(size = self.weights["b1"])
        return self.sess.run(self.reconstruction, feed_dict = {self.hidden: hidden})

    def reconstruct(self, X):
        return self.sess.run(self.reconstruction, feed_dict = {self.x: X,
                                                               self.scale: self.training_scale
                                                               })

    def getWeights(self):
        return self.sess.run(self.weights['w1'])

    def getBiases(self):
        return self.sess.run(self.weights['b1'])


mnist = input_data.read_data_sets('MNIST_data', one_hot = True)

def get_random_block_from_data(data, batch_size):
    start_index = np.random.randint(0, len(data) - batch_size)
    return data[start_index:(start_index + batch_size)]

X_train = mnist.train.images
X_test = mnist.test.images

n_samples = int(mnist.train.num_examples)
training_epochs = 50
batch_size = 128
display_step = 1

autoencoder = AdditiveGaussianNoiseAutoencoder(n_input = 784,
                                               n_hidden = 200,
                                               transfer_function = tf.nn.sigmoid,
                                               optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.01),
                                               scale = 0.01)

for epoch in range(training_epochs):
    avg_cost = 0.
    total_batch = int(n_samples / batch_size)
    # Loop over all batches
    for i in range(total_batch):
        batch_xs = get_random_block_from_data(X_train, batch_size)

        # Fit training using batch data
        cost = autoencoder.partial_fit(batch_xs)
        # Compute average loss
        avg_cost += cost / n_samples * batch_size

    # Display logs per epoch step
    if epoch % display_step == 0:
        print("Epoch:", '%04d' % (epoch + 1), "cost=", avg_cost)

print("Total cost: " + str(autoencoder.calc_total_cost(X_test)))

imageToUse = random.choice(mnist.test.images)

plt.imshow(np.reshape(imageToUse,[28,28]), interpolation="nearest", cmap="gray", clim=(0, 1.0))
plt.show()

# input weights
wts = autoencoder.getWeights()
dim = math.ceil(math.sqrt(autoencoder.n_hidden))
plt.figure(1, figsize=(dim, dim))
for i in range(0,autoencoder.n_hidden):
    im = wts.flatten()[i::autoencoder.n_hidden].reshape((28,28))
    plt.subplot(dim, dim, i+1)
    #plt.title('Feature Weights ' + str(i))
    plt.imshow(im, cmap="gray", clim=(-1.0, 1.0))
    plt.colorbar()
plt.show()

predicted_imgs = autoencoder.reconstruct(X_test[:100])

# plot the reconstructed images
plt.figure(1, figsize=(10, 10))
plt.title('Autoencoded Images')
for i in range(0,100):
    im = predicted_imgs[i].reshape((28,28))
    plt.subplot(10, 10, i+1)
    plt.imshow(im, cmap="gray", clim=(0.0, 1.0))
plt.show()
like image 953
nathan lachenmyer Avatar asked Feb 22 '17 23:02

nathan lachenmyer


1 Answers

I don't know that this will work for you, but I have seen it promote some sparsity in my own networks. I would recommend modifying your loss to use a combination of softmax cross entropy (or KL divergence if you wish) and l2 regularization loss on the weights. I calculate the l2 loss with:

l2 = sum(tf.nn.l2_loss(var) for var in tf.trainable_variables() if not 'biases' in var.name)

This makes me regularize only on the weights, not biases, assuming that you have "biases" in the name of your bias tensors (lots of the tf.contrib.rnn library names bias tensors so that this works). The overall cost function I use is then:

cost = tf.nn.softmax_or_kl_divergence_or_whatever(labels=labels, logits=logits)
cost = tf.reduce_mean(cost)
cost = cost + beta * l2

where beta is a hyperparameter of the network that I then vary when exploring my hyperparameter space.

Another option, very similar to this, is to use l1 regularization instead. This is supposed to promote sparsity more than l2 regularization. In my own examples I was not explicitly trying to promote sparsity, but saw it as a consequence of l2 regularization, but maybe l1 will give you more luck. You can implement l1 regularization with something like:

l1 = sum(tf.reduce_sum(tf.abs(var)) for var in tf.trainable_variables() if not 'biases' in var.name)

followed by the cost definition above, substituting l1 for l2.

like image 125
Engineero Avatar answered Oct 24 '22 10:10

Engineero