
Stacking RBMs to create Deep belief network in sklearn

According to this website, a deep belief network is just multiple RBMs stacked together, using the output of the previous RBM as the input of the next.

[figure: stacked RBMs forming a deep belief network]
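
For intuition, the stacking corresponds to feeding each RBM's hidden activations into the next RBM as its visible input. A minimal sketch of the idea (the layer sizes here are arbitrary):

from sklearn.neural_network import BernoulliRBM
from sklearn.datasets import load_digits

X = load_digits().data / 16.0  # scale pixel values to the [0, 1] range

rbm1 = BernoulliRBM(n_components=100, random_state=0)
rbm2 = BernoulliRBM(n_components=50, random_state=0)

h1 = rbm1.fit_transform(X)   # hidden activations of the first RBM ...
h2 = rbm2.fit_transform(h1)  # ... become the visible input of the second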

In the scikit-learn documentation, there is an example of using an RBM to classify the MNIST dataset. They put an RBM and a LogisticRegression in a pipeline to achieve better accuracy.

Therefore I wonder if I can add multiple RBMs to that pipeline to create a deep belief network, as shown in the following code.

from sklearn.neural_network import BernoulliRBM
import numpy as np
from sklearn import linear_model, datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# load the 8x8 digits data set and scale it to the [0, 1] range
digits = datasets.load_digits()
X = np.asarray(digits.data, 'float32')
Y = digits.target
X = (X - np.min(X, 0)) / (np.max(X, 0) + 0.0001)  # 0-1 scaling

X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
                                                    test_size=0.2,
                                                    random_state=0)

# three RBMs of decreasing size, followed by a logistic regression "output layer"
logistic = linear_model.LogisticRegression(C=100)
rbm1 = BernoulliRBM(n_components=100, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
rbm2 = BernoulliRBM(n_components=80, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
rbm3 = BernoulliRBM(n_components=60, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
DBN3 = Pipeline(steps=[('rbm1', rbm1), ('rbm2', rbm2), ('rbm3', rbm3), ('logistic', logistic)])

DBN3.fit(X_train, Y_train)

print("Logistic regression using RBM features:\n%s\n" % (
    metrics.classification_report(
        Y_test,
        DBN3.predict(X_test))))

However, I have discovered that the more RBMs I add to the pipeline, the lower the accuracy becomes; a loop reproducing the comparison is sketched after the list.

1 RBM in pipeline --> 95%

2 RBMs in pipeline --> 93%

3 RBMs in pipeline --> 89%
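
Assuming the objects from the snippet above are still in scope, the comparison can be reproduced with a short loop (a sketch; it rebuilds each pipeline from unfitted clones):

from sklearn.base import clone
from sklearn.metrics import accuracy_score

rbms = [('rbm1', rbm1), ('rbm2', rbm2), ('rbm3', rbm3)]
for depth in (1, 2, 3):
    # keep the first `depth` RBMs, then classify with logistic regression
    steps = [(name, clone(rbm)) for name, rbm in rbms[:depth]]
    steps.append(('logistic', clone(logistic)))
    model = Pipeline(steps=steps)
    model.fit(X_train, Y_train)
    acc = accuracy_score(Y_test, model.predict(X_test))
    print("%d RBM(s) in pipeline --> %.1f%%" % (depth, 100 * acc))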

The training curve below shows that 100 iterations is just right for convergence. More iterations cause over-fitting, and the likelihood goes down again.
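
One way to produce such a curve in scikit-learn is to drive the epochs manually with partial_fit and record the mean pseudo-log-likelihood returned by score_samples. A sketch for a single RBM (minibatching is done by hand, since each partial_fit call treats its input as one batch):

rng = np.random.RandomState(101)
rbm = BernoulliRBM(n_components=100, learning_rate=0.06, random_state=101)
batch_size, n_samples = 10, X_train.shape[0]
for epoch in range(100):
    order = rng.permutation(n_samples)  # reshuffle the minibatches each epoch
    for start in range(0, n_samples, batch_size):
        rbm.partial_fit(X_train[order[start:start + batch_size]])
    pll = rbm.score_samples(X_train).mean()  # mean pseudo-log-likelihood
    print("epoch %3d: pseudo-log-likelihood %.2f" % (epoch + 1, pll))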

Batch size = 10

[figure: training curves for batch size 10]

Batch size = 256 or above

I have noticed one interesting thing: if I use a larger batch size, the performance of the network deteriorates a lot. When the batch size is above 256, the accuracy drops below 10%. The training curve somehow doesn't make sense to me: the first and second RBMs don't learn much, but the third RBM suddenly learns quickly.

[figure: training curves for batch size 256 or above]
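
The batch-size effect can be checked with a sweep over a single-RBM pipeline (again assuming the earlier objects are in scope; note that the learning rate is kept fixed here, even though larger batches often call for a larger learning rate):

from sklearn.base import clone

for bs in (10, 64, 256, 512):
    rbm = BernoulliRBM(n_components=100, learning_rate=0.06,
                       batch_size=bs, n_iter=100, random_state=101)
    model = Pipeline(steps=[('rbm', rbm), ('logistic', clone(logistic))])
    model.fit(X_train, Y_train)
    # Pipeline.score reports the accuracy of the final logistic regression
    print("batch_size = %3d --> %.1f%%" % (bs, 100 * model.score(X_test, Y_test)))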

It looks like 89% is somehow the bottleneck for a network with 3 RBMs.

I wonder if I am doing anything wrong here. Is my understanding of deep belief network correct?

asked Sep 04 '18 by Raven Cheuk



1 Answer

The following is not quite a definitive answer, as it lacks any statistical rigor; the necessary parameter optimization and evaluation would take several days of CPU time. Until then, I submit the following proof of principle as an answer.

Tl;dr

Larger layers + much longer training => logistic regression by itself < logistic regression + 1 RBM layer < logistic regression + RBM stack / DBN

Introduction

As I have stated in one of my comments to OP's post, the use of stacked RBMs / DBNs for unsupervised pre-training has been systematically explored in Erhan et al. (2010). To be precise, their setup differs from OP's setup insofar as, after training the DBN, they add a final layer of output neurons and fine-tune the complete network using backprop. OP evaluates the benefit of adding one or more RBM layers using the performance of logistic regression on the output of the final layer. Furthermore, Erhan et al. don't use the 64-pixel digits data set in scikit-learn but the 784-pixel MNIST images (and variants thereof).

That being said, the similarities are substantial enough to take their findings as the starting point for the evaluation of a scikit-learn implementation of a DBN, which is precisely what I have done: I also use the MNIST data set, and I use the optimal parameters (where reported) from Erhan et al. These parameters differ substantially from the ones given in the example by OP and are likely the source of the poor performance of OP's model: in particular, the layer sizes are much larger and the number of training samples is orders of magnitude larger. However, like OP, I use logistic regression in the final step of the pipeline to evaluate whether the image transformations by an RBM or by a stack of RBMs / a DBN improve classification.

Incidentally, having (roughly) as many units in the RBM layers (800 units) as in the original image (784 pixels) also makes pure logistic regression on the raw image pixels a suitable benchmark model.

I hence compare the following 3 models:

  1. logistic regression by itself (i.e. the baseline / benchmark model),

  2. logistic regression on outputs of an RBM, and

  3. logistic regression on outputs of a stack of RBMs / a DBN.

Results

Consistent with the previous literature, my preliminary results indeed indicate that using the output of an RBM for logistic regression improves the performance compared to using the raw pixel values by themselves, and the DBN transformation improves on the single RBM further, although the improvement is smaller.

Logistic regression by itself:

Model performance:
             precision    recall  f1-score   support

        0.0       0.95      0.97      0.96       995
        1.0       0.96      0.98      0.97      1121
        2.0       0.91      0.90      0.90      1015
        3.0       0.90      0.89      0.89      1033
        4.0       0.93      0.92      0.92       976
        5.0       0.90      0.88      0.89       884
        6.0       0.94      0.94      0.94       999
        7.0       0.92      0.93      0.93      1034
        8.0       0.89      0.87      0.88       923
        9.0       0.89      0.90      0.89      1020

avg / total       0.92      0.92      0.92     10000

Logistic regression on outputs of an RBM:

Model performance:
             precision    recall  f1-score   support

        0.0       0.98      0.98      0.98       995
        1.0       0.98      0.99      0.99      1121
        2.0       0.95      0.97      0.96      1015
        3.0       0.97      0.96      0.96      1033
        4.0       0.98      0.97      0.97       976
        5.0       0.97      0.96      0.96       884
        6.0       0.98      0.98      0.98       999
        7.0       0.96      0.97      0.97      1034
        8.0       0.96      0.94      0.95       923
        9.0       0.96      0.96      0.96      1020

avg / total       0.97      0.97      0.97     10000

Logistic regression on outputs of a stack of RBMs / a DBN:

Model performance:
             precision    recall  f1-score   support

        0.0       0.99      0.99      0.99       995
        1.0       0.99      0.99      0.99      1121
        2.0       0.97      0.98      0.98      1015
        3.0       0.98      0.97      0.97      1033
        4.0       0.98      0.97      0.98       976
        5.0       0.96      0.97      0.97       884
        6.0       0.99      0.98      0.98       999
        7.0       0.98      0.98      0.98      1034
        8.0       0.98      0.97      0.97       923
        9.0       0.96      0.97      0.96      1020

avg / total       0.98      0.98      0.98     10000

Code

#!/usr/bin/env python

"""
Using MNIST, compare classification performance of:
1) logistic regression by itself,
2) logistic regression on outputs of an RBM, and
3) logistic regression on outputs of a stack of RBMs / a DBN.
"""

import numpy as np

from sklearn.datasets import fetch_openml  # fetch_mldata was removed from scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.base import clone
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report


def norm(arr):
    arr = arr.astype(float)
    arr -= arr.min()
    arr /= arr.max()
    return arr


if __name__ == '__main__':

    # load the MNIST data set; fetch_mldata was removed from scikit-learn,
    # so use its replacement, fetch_openml
    mnist = fetch_openml('mnist_784', version=1, as_frame=False)
    X, Y = mnist.data, mnist.target.astype(float)  # targets come back as strings

    # normalize inputs to 0-1 range
    X = norm(X)

    # split into train, validation, and test data sets
    X_train, X_test, Y_train, Y_test = train_test_split(X,       Y,       test_size=10000, random_state=0)
    X_train, X_val,  Y_train, Y_val  = train_test_split(X_train, Y_train, test_size=10000, random_state=0)

    # --------------------------------------------------------------------------------
    # set hyperparameters

    learning_rate = 0.02 # from Erhan et al. (2010): median value in grid-search
    total_units   =  800 # from Erhan et al. (2010): optimal for MNIST / only slightly worse than 1200 units when using InfiniteMNIST
    total_epochs  =   50 # from Erhan et al. (2010): optimal for MNIST
    batch_size    =  128 # seems like a representative sample; backprop literature often uses 256 or 512 samples

    C = 100. # optimum for benchmark model according to sklearn docs: https://scikit-learn.org/stable/auto_examples/neural_networks/plot_rbm_logistic_classification.html

    # TODO optimize using grid search, etc

    # --------------------------------------------------------------------------------
    # construct models

    # RBM
    rbm = BernoulliRBM(n_components=total_units, learning_rate=learning_rate, batch_size=batch_size, n_iter=total_epochs, verbose=1)

    # "output layer"
    logistic = LogisticRegression(C=C, solver='lbfgs', multi_class='multinomial', max_iter=200, verbose=1)

    models = []
    models.append(Pipeline(steps=[('logistic', clone(logistic))]))                                              # base model / benchmark
    models.append(Pipeline(steps=[('rbm1', clone(rbm)), ('logistic', clone(logistic))]))                        # single RBM
    models.append(Pipeline(steps=[('rbm1', clone(rbm)), ('rbm2', clone(rbm)), ('logistic', clone(logistic))]))  # RBM stack / DBN

    # --------------------------------------------------------------------------------
    # train and evaluate models

    for model in models:
        # train
        model.fit(X_train, Y_train)

        # evaluate using validation set
        print("Model performance:\n%s\n" % (
            classification_report(Y_val, model.predict(X_val))))

    # TODO: after parameter optimization, evaluate on test set
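
As a starting point for the parameter optimization flagged in the TODOs, the pipelines above could be wrapped in a grid search. A sketch (the parameter grid is an arbitrary illustration, not the grid used by Erhan et al., and the search is CPU-hungry):

from sklearn.model_selection import GridSearchCV

param_grid = {
    'rbm1__learning_rate': [0.01, 0.02, 0.05],
    'rbm1__n_components':  [400, 800, 1200],
    'logistic__C':         [1., 100., 10000.],
}
# models[1] is the single-RBM pipeline with steps named 'rbm1' and 'logistic'
search = GridSearchCV(models[1], param_grid, cv=3, n_jobs=-1, verbose=1)
search.fit(X_train, Y_train)
print(search.best_params_)
print(search.best_score_)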

answered Nov 15 '22 by Paul Brodersen