Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

non-negative matrix factorization failing to converge

I'm trying to implement non-negative matrix factorization using the Kullback-Liebler divergence as a similarity measure. The algorithm is described in: http://hebb.mit.edu/people/seung/papers/nmfconverge.pdf. Below is my python / numpy implementation, with an example matrix to run it on.

In a nutshell, the algorithm is supposed to learn matrices W(n by r) and H(r by m) such that V(n by m) is approximately WH. You start with random values in W and H, and by following the update rules described in the Seung and Lee paper, you're supposed to get closer and closer to good approximations for W and H.

The algorithm is proven to monotonically reduce the divergence measure, but that's not what happens in my implementation. Instead, it settles into an alternation between two divergence values. If you look at W and H, you can see that the resulting factorization is not particularly good.

I've wondered whether to use the updated or old H when calculating the update for W. I tried it both ways, and it doesn't change the behavior of the implementation.

I've checked my implementation against the paper a bunch of times, and I don't see what I'm doing wrong. Can anyone shed some light on the issue?

import numpy as np

def update(V, W, H, r, n, m):
    n,m = V.shape 
    WH = W.dot(H)

    # equation (5)
    H_coeff = np.zeros(H.shape)
    for a in range(r):
        for mu in range(m):
            for i in range(n):
                H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
            H_coeff[a, mu] /= sum(W)[a]
    H = H * H_coeff

    W_coeff = np.zeros(W.shape)
    for i in range(n):
        for a in range(r):
            for mu in range(m):
                W_coeff[i, a] += H[a, mu] * V[i, mu] / WH[i, mu]
            W_coeff[i, a] /= sum(H.T)[a]
    W = W * W_coeff

    return W, H


def factor(V, r, iterations=100):
    n, m = V.shape
    avg_V = sum(sum(V))/n/m
    W = np.random.random(n*r).reshape(n,r)*avg_V
    H = np.random.random(r*m).reshape(r,m)*avg_V

    for i in range(iterations):
        WH = W.dot(H)
        divergence = sum(sum(V * np.log(V/WH) - V + WH)) # equation (3)
        print "At iteration " + str(i) + ", the Kullback-Liebler divergence is", divergence
        W,H = update(V, W, H, r, n, m)

    return W, H


V = np.arange(0.01,1.01,0.01).reshape(10,10)

W, H = factor(V, 6)
like image 546
John Seales Avatar asked Feb 16 '23 13:02

John Seales


1 Answers

How to eliminate the alternation effect:

The very last line of the Proof of Theorem 2 reads,

By reversing the roles of H and W, the update rule for W can similarly be shown to be nonincreasing.

Thus we can surmise that updating H can be done independently of updating W. That means after updating H:

H = H * H_coeff

we should also update the intermediate value WH before updating W:

WH = W.dot(H)
W = W * W_coeff

Both updates decrease the divergence.

Try it: Just stick WH = W.dot(H) before the computation for W_coeff, and the alternation effect goes away.


Simplifying the code:

When dealing with NumPy arrays, use their mean and sum methods, and avoid using the Python sum function:

avg_V = sum(sum(V))/n/m

can be written as

avg_V = V.mean()

and

divergence = sum(sum(V * np.log(V/WH) - V + WH)) # equation (3)

can be written as

divergence = ((V * np.log(V_over_WH)) - V + WH).sum() 

Avoid the Python builtin sum function because

  • it is slower than the NumPy sum method, and
  • it is not as versatile as the NumPy sum method. (It does not allow you to specify the axis on which to sum. We managed to eliminate two calls to Python's sum with one call to NumPy's sum or mean method.)

Eliminate the triple for-loop:

But a bigger improvement in both speed and readability can be had by replacing

H_coeff = np.zeros(H.shape)
for a in range(r):
    for mu in range(m):
        for i in range(n):
            H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
        H_coeff[a, mu] /= sum(W)[a]
H = H * H_coeff

with

V_over_WH = V/WH
H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T

Explanation:

If you look at the equation 5 update rule for H, first notice that indices for V and (W H) are identical. So you can replace V / (W H) with

V_over_WH = V/WH

Next, note that in the numerator we are summing over the index i, which is the first index in both W and V_over_WH. We can express that as matrix multiplication:

np.dot(V_over_WH.T, W).T

And the denominator is simply:

W.sum(axis=0).T

If we divide the numerator and denominator

(np.dot(V_over_WH.T, W) / W.sum(axis=0)).T

we get a matrix indexed by the two remaining indices, alpha and mu, in that order. That is the same as the indices for H. So we want to multiply H by this ratio element-wise. Perfect. NumPy multiplies arrays element-wise by default.

Thus, we can express the entire update rule for H as

H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T

So, putting it all together:

import numpy as np
np.random.seed(1)


def update(V, W, H, WH, V_over_WH):
    # equation (5)
    H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T

    WH = W.dot(H)
    V_over_WH = V / WH
    W *= np.dot(V_over_WH, H.T) / H.sum(axis=1)

    WH = W.dot(H)
    V_over_WH = V / WH
    return W, H, WH, V_over_WH


def factor(V, r, iterations=100):
    n, m = V.shape
    avg_V = V.mean()
    W = np.random.random(n * r).reshape(n, r) * avg_V
    H = np.random.random(r * m).reshape(r, m) * avg_V
    WH = W.dot(H)
    V_over_WH = V / WH

    for i in range(iterations):
        W, H, WH, V_over_WH = update(V, W, H, WH, V_over_WH)
        # equation (3)
        divergence = ((V * np.log(V_over_WH)) - V + WH).sum()
        print("At iteration {i}, the Kullback-Liebler divergence is {d}".format(
            i=i, d=divergence))
    return W, H

V = np.arange(0.01, 1.01, 0.01).reshape(10, 10)
# V = np.arange(1,101).reshape(10,10).astype('float')
W, H = factor(V, 6)
like image 175
unutbu Avatar answered Feb 27 '23 09:02

unutbu