
Stacking RBMs to create Deep belief network in sklearn

According to this website, a deep belief network is just multiple RBMs stacked together, using the output of the previous RBM as the input of the next.

[figure: stacked RBMs forming a deep belief network]
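
For intuition, the stacking corresponds to feeding each RBM's hidden activations into the next RBM as its visible input. A minimal sketch of the idea (the layer sizes here are arbitrary):

from sklearn.neural_network import BernoulliRBM
from sklearn.datasets import load_digits

X = load_digits().data / 16.0  # scale pixel values to the [0, 1] range

rbm1 = BernoulliRBM(n_components=100, random_state=0)
rbm2 = BernoulliRBM(n_components=50, random_state=0)

h1 = rbm1.fit_transform(X)   # hidden activations of the first RBM ...
h2 = rbm2.fit_transform(h1)  # ... become the visible input of the second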

In the scikit-learn documentation, there is an example of using an RBM to classify the MNIST dataset. They put an RBM and a LogisticRegression in a pipeline to achieve better accuracy.

Therefore I wonder if I can add multiple RBMs to that pipeline to create a deep belief network, as shown in the following code.

from sklearn.neural_network import BernoulliRBM
import numpy as np
from sklearn import linear_model, datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# load the 8x8 digits data set and scale it to the [0, 1] range
digits = datasets.load_digits()
X = np.asarray(digits.data, 'float32')
Y = digits.target
X = (X - np.min(X, 0)) / (np.max(X, 0) + 0.0001)  # 0-1 scaling

X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
                                                    test_size=0.2,
                                                    random_state=0)

# three RBMs of decreasing size, followed by a logistic regression "output layer"
logistic = linear_model.LogisticRegression(C=100)
rbm1 = BernoulliRBM(n_components=100, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
rbm2 = BernoulliRBM(n_components=80, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
rbm3 = BernoulliRBM(n_components=60, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
DBN3 = Pipeline(steps=[('rbm1', rbm1), ('rbm2', rbm2), ('rbm3', rbm3), ('logistic', logistic)])

DBN3.fit(X_train, Y_train)

print("Logistic regression using RBM features:\n%s\n" % (
    metrics.classification_report(
        Y_test,
        DBN3.predict(X_test))))

However, I have discovered that the more RBMs I add to the pipeline, the lower the accuracy becomes; a loop reproducing the comparison is sketched after the list.

1 RBM in pipeline --> 95%

2 RBMs in pipeline --> 93%

3 RBMs in pipeline --> 89%
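
Assuming the objects from the snippet above are still in scope, the comparison can be reproduced with a short loop (a sketch; it rebuilds each pipeline from unfitted clones):

from sklearn.base import clone
from sklearn.metrics import accuracy_score

rbms = [('rbm1', rbm1), ('rbm2', rbm2), ('rbm3', rbm3)]
for depth in (1, 2, 3):
    # keep the first `depth` RBMs, then classify with logistic regression
    steps = [(name, clone(rbm)) for name, rbm in rbms[:depth]]
    steps.append(('logistic', clone(logistic)))
    model = Pipeline(steps=steps)
    model.fit(X_train, Y_train)
    acc = accuracy_score(Y_test, model.predict(X_test))
    print("%d RBM(s) in pipeline --> %.1f%%" % (depth, 100 * acc))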

The training curve below shows that 100 iterations is just right for convergence. More iterations cause over-fitting, and the likelihood goes down again.
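
One way to produce such a curve in scikit-learn is to drive the epochs manually with partial_fit and record the mean pseudo-log-likelihood returned by score_samples. A sketch for a single RBM (minibatching is done by hand, since each partial_fit call treats its input as one batch):

rng = np.random.RandomState(101)
rbm = BernoulliRBM(n_components=100, learning_rate=0.06, random_state=101)
batch_size, n_samples = 10, X_train.shape[0]
for epoch in range(100):
    order = rng.permutation(n_samples)  # reshuffle the minibatches each epoch
    for start in range(0, n_samples, batch_size):
        rbm.partial_fit(X_train[order[start:start + batch_size]])
    pll = rbm.score_samples(X_train).mean()  # mean pseudo-log-likelihood
    print("epoch %3d: pseudo-log-likelihood %.2f" % (epoch + 1, pll))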

Batch size = 10

[figure: training curves for batch size 10]

Batch size = 256 or above

I have noticed one interesting thing: if I use a larger batch size, the performance of the network deteriorates a lot. When the batch size is above 256, the accuracy drops below 10%. The training curve somehow doesn't make sense to me: the first and second RBMs don't learn much, but the third RBM suddenly learns quickly.

[figure: training curves for batch size 256 or above]
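
The batch-size effect can be checked with a sweep over a single-RBM pipeline (again assuming the earlier objects are in scope; note that the learning rate is kept fixed here, even though larger batches often call for a larger learning rate):

from sklearn.base import clone

for bs in (10, 64, 256, 512):
    rbm = BernoulliRBM(n_components=100, learning_rate=0.06,
                       batch_size=bs, n_iter=100, random_state=101)
    model = Pipeline(steps=[('rbm', rbm), ('logistic', clone(logistic))])
    model.fit(X_train, Y_train)
    # Pipeline.score reports the accuracy of the final logistic regression
    print("batch_size = %3d --> %.1f%%" % (bs, 100 * model.score(X_test, Y_test)))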

It looks like 89% is somehow the bottleneck for a network with 3 RBMs.

I wonder if I am doing anything wrong here. Is my understanding of deep belief network correct?

asked Sep 04 '18 by Raven Cheuk



1 Answer

The following is not quite a definitive answer, as it lacks any statistical rigor; the necessary parameter optimization and evaluation would take several days of CPU time. Until then, I submit the following proof of principle as an answer.

Tl;dr

Larger layers + much longer training => logistic regression by itself < logistic regression + 1 RBM layer < logistic regression + RBM stack / DBN

Introduction

As I have stated in one of my comments to OP's post, the use of stacked RBMs / DBNs for unsupervised pre-training has been systematically explored in Erhan et al. (2010). To be precise, their setup differs from OP's setup insofar as, after training the DBN, they add a final layer of output neurons and fine-tune the complete network using backprop. OP evaluates the benefit of adding one or more RBM layers using the performance of logistic regression on the output of the final layer. Furthermore, Erhan et al. don't use the 64-pixel digits data set in scikit-learn but the 784-pixel MNIST images (and variants thereof).

That being said, the similarities are substantial enough to take their findings as the starting point for the evaluation of a scikit-learn implementation of a DBN, which is precisely what I have done: I also use the MNIST data set, and I use the optimal parameters (where reported) from Erhan et al. These parameters differ substantially from the ones given in the example by OP and are likely the source of the poor performance of OP's model: in particular, the layer sizes are much larger and the number of training samples is orders of magnitude larger. However, like OP, I use logistic regression in the final step of the pipeline to evaluate whether the image transformations by an RBM or by a stack of RBMs / a DBN improve classification.

Incidentally, having (roughly) as many units in the RBM layers (800 units) as in the original image (784 pixels) also makes pure logistic regression on the raw image pixels a suitable benchmark model.

I hence compare the following 3 models:

  1. logistic regression by itself (i.e. the baseline / benchmark model),

  2. logistic regression on outputs of an RBM, and

  3. logistic regression on outputs of a stack of RBMs / a DBN.

Results

Consistent with the previous literature, my preliminary results indeed indicate that using the output of an RBM for logistic regression improves the performance compared to using the raw pixel values by themselves, and the DBN transformation improves on the single RBM further, although the improvement is smaller.

Logistic regression by itself:

Model performance:
             precision    recall  f1-score   support

        0.0       0.95      0.97      0.96       995
        1.0       0.96      0.98      0.97      1121
        2.0       0.91      0.90      0.90      1015
        3.0       0.90      0.89      0.89      1033
        4.0       0.93      0.92      0.92       976
        5.0       0.90      0.88      0.89       884
        6.0       0.94      0.94      0.94       999
        7.0       0.92      0.93      0.93      1034
        8.0       0.89      0.87      0.88       923
        9.0       0.89      0.90      0.89      1020

avg / total       0.92      0.92      0.92     10000

Logistic regression on outputs of an RBM:

Model performance:
             precision    recall  f1-score   support

        0.0       0.98      0.98      0.98       995
        1.0       0.98      0.99      0.99      1121
        2.0       0.95      0.97      0.96      1015
        3.0       0.97      0.96      0.96      1033
        4.0       0.98      0.97      0.97       976
        5.0       0.97      0.96      0.96       884
        6.0       0.98      0.98      0.98       999
        7.0       0.96      0.97      0.97      1034
        8.0       0.96      0.94      0.95       923
        9.0       0.96      0.96      0.96      1020

avg / total       0.97      0.97      0.97     10000

Logistic regression on outputs of a stack of RBMs / a DBN:

Model performance:
             precision    recall  f1-score   support

        0.0       0.99      0.99      0.99       995
        1.0       0.99      0.99      0.99      1121
        2.0       0.97      0.98      0.98      1015
        3.0       0.98      0.97      0.97      1033
        4.0       0.98      0.97      0.98       976
        5.0       0.96      0.97      0.97       884
        6.0       0.99      0.98      0.98       999
        7.0       0.98      0.98      0.98      1034
        8.0       0.98      0.97      0.97       923
        9.0       0.96      0.97      0.96      1020

avg / total       0.98      0.98      0.98     10000

Code

#!/usr/bin/env python

"""
Using MNIST, compare classification performance of:
1) logistic regression by itself,
2) logistic regression on outputs of an RBM, and
3) logistic regression on outputs of a stack of RBMs / a DBN.
"""

import numpy as np

from sklearn.datasets import fetch_openml  # fetch_mldata was removed from scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.base import clone
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report


def norm(arr):
    arr = arr.astype(float)
    arr -= arr.min()
    arr /= arr.max()
    return arr


if __name__ == '__main__':

    # load the MNIST data set; fetch_mldata was removed from scikit-learn,
    # so use its replacement, fetch_openml
    mnist = fetch_openml('mnist_784', version=1, as_frame=False)
    X, Y = mnist.data, mnist.target.astype(float)  # targets come back as strings

    # normalize inputs to 0-1 range
    X = norm(X)

    # split into train, validation, and test data sets
    X_train, X_test, Y_train, Y_test = train_test_split(X,       Y,       test_size=10000, random_state=0)
    X_train, X_val,  Y_train, Y_val  = train_test_split(X_train, Y_train, test_size=10000, random_state=0)

    # --------------------------------------------------------------------------------
    # set hyperparameters

    learning_rate = 0.02 # from Erhan et al. (2010): median value in grid-search
    total_units   =  800 # from Erhan et al. (2010): optimal for MNIST / only slightly worse than 1200 units when using InfiniteMNIST
    total_epochs  =   50 # from Erhan et al. (2010): optimal for MNIST
    batch_size    =  128 # seems like a representative sample; backprop literature often uses 256 or 512 samples

    C = 100. # optimum for benchmark model according to sklearn docs: https://scikit-learn.org/stable/auto_examples/neural_networks/plot_rbm_logistic_classification.html

    # TODO optimize using grid search, etc

    # --------------------------------------------------------------------------------
    # construct models

    # RBM
    rbm = BernoulliRBM(n_components=total_units, learning_rate=learning_rate, batch_size=batch_size, n_iter=total_epochs, verbose=1)

    # "output layer"
    logistic = LogisticRegression(C=C, solver='lbfgs', multi_class='multinomial', max_iter=200, verbose=1)

    models = []
    models.append(Pipeline(steps=[('logistic', clone(logistic))]))                                              # base model / benchmark
    models.append(Pipeline(steps=[('rbm1', clone(rbm)), ('logistic', clone(logistic))]))                        # single RBM
    models.append(Pipeline(steps=[('rbm1', clone(rbm)), ('rbm2', clone(rbm)), ('logistic', clone(logistic))]))  # RBM stack / DBN

    # --------------------------------------------------------------------------------
    # train and evaluate models

    for model in models:
        # train
        model.fit(X_train, Y_train)

        # evaluate using validation set
        print("Model performance:\n%s\n" % (
            classification_report(Y_val, model.predict(X_val))))

    # TODO: after parameter optimization, evaluate on test set
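
As a starting point for the parameter optimization flagged in the TODOs, the pipelines above could be wrapped in a grid search. A sketch (the parameter grid is an arbitrary illustration, not the grid used by Erhan et al., and the search is CPU-hungry):

from sklearn.model_selection import GridSearchCV

param_grid = {
    'rbm1__learning_rate': [0.01, 0.02, 0.05],
    'rbm1__n_components':  [400, 800, 1200],
    'logistic__C':         [1., 100., 10000.],
}
# models[1] is the single-RBM pipeline with steps named 'rbm1' and 'logistic'
search = GridSearchCV(models[1], param_grid, cv=3, n_jobs=-1, verbose=1)
search.fit(X_train, Y_train)
print(search.best_params_)
print(search.best_score_)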

answered Nov 15 '22 by Paul Brodersen