I want to have a nitty-gritty understanding of Restricted Boltzmann Machines with continuous input variables. I am trying to devise the most trivial possible example, so that the behavior can be easily tracked. So, here it is.
The input data is two-dimensional. Each data point is drawn from one of two symmetric normal distributions (sigma = 0.03), whose centers are well spaced (15 times sigma apart). The RBM has a two-dimensional hidden layer.
I expected to obtain an RBM that would generate two clouds of points with the same means as in my training data. I was even thinking that after adding some sparsity constraints I would have the hidden layer equal to (0,1) for the data drawn from one distribution and (1,0) for the other.
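For anyone who wants to reproduce the setup, here is a minimal sketch of the data described above, using only the Python standard library. The function name `make_two_clouds`, the placement of the centers around (0.5, 0.5), and the sample count are my own assumptions; the question only fixes sigma = 0.03 and a spacing of 15 sigma.

```python
import random

def make_two_clouds(n_per_cloud=500, sigma=0.03, spread_in_sigmas=15.0, seed=0):
    """Draw 2-D points from two isotropic Gaussians whose centers are
    spread_in_sigmas * sigma apart along the first axis."""
    rng = random.Random(seed)
    half = 0.5 * spread_in_sigmas * sigma
    # Assumed placement: both centers on the line y = 0.5, symmetric about x = 0.5.
    centers = [(0.5 - half, 0.5), (0.5 + half, 0.5)]
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cloud):
            points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            labels.append(label)
    return points, labels, centers

points, labels, centers = make_two_clouds()

# The overall data mean lies between the clouds, which is the point the
# "trivial solution" described in the question collapses to.
overall_mean = tuple(sum(p[i] for p in points) / len(points) for i in range(2))
```

Comparing `overall_mean` with `centers` makes it easy to tell whether a trained model reproduces the two clouds or only their average.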
I wrote MATLAB code myself and tried some online solutions (such as DeepMat: https://github.com/kyunghyuncho/deepmat), but no matter how small my step size is, the RBM converges to a trivial solution, in which the predicted visible layer is equal to the mean value over the entire data set. I tried increasing the dimensionality of the hidden layer, but it does not change anything substantially. I also tried normalizing the data to zero mean and unit variance - no change. I also tried sigma = 1 instead of 0.03, while keeping the spread of 15*sigma, again no change.
Since this problem is present not only in my code, but also in others', I thought that I might be doing something fundamentally wrong and trying to use the RBM in a way it should not be used. I would appreciate comments / suggestions, or if someone could reproduce my problem.
Have a look here for an explanation of which probability density functions over visible variables can be expressed with a Gaussian-Bernoulli RBM. The following picture gives an illustration, where b is the visible bias and w1 and w2 are the weight vectors associated to the hidden units.
Click for the image, as I need more reputation to post it directly ...
You see that the RBM models a Gaussian Mixture Model with 2^H components, where H is the number of hidden units and the mean of each component is a superposition of the visible bias and the weight vectors associated with a subset of the hidden units. The weight of each component relates to the biases of the hidden units that are in this subset.
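To make the 2^H components concrete, here is a small sketch that enumerates the component means for given parameters. It assumes a parametrization in which the conditional mean of the visible units is b + W h (scaling conventions differ between implementations), and the function name `component_means` and the example values of `b` and `W` are made up for illustration.

```python
from itertools import product

def component_means(b, W):
    """Return {hidden state: component mean} for a Gaussian-Bernoulli RBM.

    b: visible bias, a sequence of length D.
    W: one weight vector of length D per hidden unit.
    Assumes p(v | h) is Gaussian with mean b + sum_j h_j * W[j].
    """
    means = {}
    for h in product((0, 1), repeat=len(W)):
        mean = list(b)
        for active, w in zip(h, W):
            if active:
                mean = [m + wi for m, wi in zip(mean, w)]
        means[h] = tuple(mean)
    return means

# Illustrative values only: two hidden units give 2^2 = 4 components.
b = (0.0, 0.0)
W = [(1.0, 0.0), (0.0, 2.0)]
means = component_means(b, W)
for h, m in sorted(means.items()):
    print(h, m)
```

With two hidden units the four means sit at b, b + w1, b + w2 and b + w1 + w2, so they form a parallelogram rather than two free points.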
That said, your problem of modeling a mixture of two Gaussians is best represented by an RBM with just a single hidden unit, where the visible bias equals the mean of one component and the sum of the visible bias and the weight vector of the hidden unit equals the mean of the second mixture component. When your RBM has two hidden units, things get more complicated, as this RBM models a Gaussian mixture with 4 components.
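As a sketch of that single-hidden-unit construction, the parameters can be written down directly from the two cluster means. The hidden bias below is my own derivation and holds only under the assumed energy E(v, h) = ||v - b||^2 / (2 sigma^2) - c h - (v . w) h / sigma^2. The function name and the example means are hypothetical.

```python
def single_unit_rbm(mu1, mu2, sigma):
    """Hand-set parameters of a one-hidden-unit Gaussian-Bernoulli RBM.

    h = 0 gives a component with mean b = mu1,
    h = 1 gives a component with mean b + w = mu2.
    """
    b = tuple(mu1)
    w = tuple(m2 - m1 for m1, m2 in zip(mu1, mu2))
    sq_norm = lambda v: sum(x * x for x in v)
    # Under the assumed energy, log p(h=1) - log p(h=0) equals
    # c + (||b + w||^2 - ||b||^2) / (2 sigma^2); choose c so that it is zero,
    # i.e. both components get equal weight.
    c = -(sq_norm(mu2) - sq_norm(mu1)) / (2.0 * sigma ** 2)
    return b, w, c

# Example means 15 * sigma apart, as in the question (placement is assumed).
mu1, mu2, sigma = (0.275, 0.5), (0.725, 0.5), 0.03
b, w, c = single_unit_rbm(mu1, mu2, sigma)
mean_off = b
mean_on = tuple(bi + wi for bi, wi in zip(b, w))
```

Note how large the magnitude of `c` becomes when the means are far apart relative to sigma. That gives a feel for how far a near-zero initialization is from this solution.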
And even if your RBM has only one hidden unit, learning a Gaussian mixture where the two components are far apart is likely to fail when using learning strategies such as contrastive divergence with poorly initialized weights and biases.
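For reference, this is roughly what a single contrastive divergence (CD-1) update looks like for the one-hidden-unit case. Treat it as a sketch under the same assumed energy, E(v, h) = ||v - b||^2 / (2 sigma^2) - c h - (v . w) h / sigma^2, with sigma held fixed; it is not the update used by DeepMat or any other particular library. The names, learning rate and initial values are made up.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def cd1_step(v0, b, w, c, sigma, lr, rng):
    """One CD-1 update on a single data point v0 (single hidden unit)."""
    s2 = sigma ** 2
    # Positive phase: hidden activation probability given the data point.
    p_h0 = sigmoid(c + sum(vi * wi for vi, wi in zip(v0, w)) / s2)
    h0 = 1 if rng.random() < p_h0 else 0
    # Negative phase: sample a reconstruction v1 ~ N(b + w * h0, sigma^2 I).
    v1 = [rng.gauss(bi + wi * h0, sigma) for bi, wi in zip(b, w)]
    p_h1 = sigmoid(c + sum(vi * wi for vi, wi in zip(v1, w)) / s2)
    # Data statistics minus reconstruction statistics.
    new_w = [wi + lr * (a * p_h0 - r * p_h1) / s2 for wi, a, r in zip(w, v0, v1)]
    new_b = [bi + lr * (a - r) / s2 for bi, a, r in zip(b, v0, v1)]
    new_c = c + lr * (p_h0 - p_h1)
    return new_b, new_w, new_c

# Toy run with sigma = 1 and centers 15 sigma apart (placement is assumed).
rng = random.Random(0)
data = [(rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0))
        for cx in (-7.5, 7.5) for _ in range(100)]
rng.shuffle(data)
b, w, c = [0.0, 0.0], [0.01, -0.01], 0.0  # small, arbitrary initialization
for v0 in data:
    b, w, c = cd1_step(v0, b, w, c, sigma=1.0, lr=0.001, rng=rng)
```

Tracking `b`, `b + w` and `c` over many such steps, from different initializations, is one way to observe whether the model separates the two clouds or settles near the overall data mean.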