I'm trying to implement non-negative matrix factorization using the Kullback-Leibler divergence as the cost function. The algorithm is described in: http://hebb.mit.edu/people/seung/papers/nmfconverge.pdf. Below is my Python/NumPy implementation, with an example matrix to run it on.
In a nutshell, the algorithm is supposed to learn matrices W (n by r) and H (r by m) such that V (n by m) is approximately WH. You start with random values in W and H, and by following the update rules described in the Lee and Seung paper, you're supposed to get closer and closer to a factorization WH that approximates V well.
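For reference, my reading of the divergence measure (the paper's equation (3)) and the multiplicative update rules (its equation (5)) is:

% transcribed from the paper as I read it; notation follows my code, V \approx WH
D(V \,\|\, WH) = \sum_{i\mu}\left( V_{i\mu}\,\log\frac{V_{i\mu}}{(WH)_{i\mu}} - V_{i\mu} + (WH)_{i\mu} \right)

H_{a\mu} \leftarrow H_{a\mu}\,\frac{\sum_i W_{ia}\, V_{i\mu}/(WH)_{i\mu}}{\sum_k W_{ka}},
\qquad
W_{ia} \leftarrow W_{ia}\,\frac{\sum_\mu H_{a\mu}\, V_{i\mu}/(WH)_{i\mu}}{\sum_\nu H_{a\nu}}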
The algorithm is proven to monotonically reduce the divergence measure, but that's not what happens in my implementation. Instead, it settles into an alternation between two divergence values. If you look at W and H, you can see that the resulting factorization is not particularly good.
I've wondered whether to use the updated or old H when calculating the update for W. I tried it both ways, and it doesn't change the behavior of the implementation.
I've checked my implementation against the paper a bunch of times, and I don't see what I'm doing wrong. Can anyone shed some light on the issue?
import numpy as np

def update(V, W, H, r, n, m):
    n, m = V.shape
    WH = W.dot(H)

    # equation (5)
    H_coeff = np.zeros(H.shape)
    for a in range(r):
        for mu in range(m):
            for i in range(n):
                H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
            H_coeff[a, mu] /= sum(W)[a]
    H = H * H_coeff

    W_coeff = np.zeros(W.shape)
    for i in range(n):
        for a in range(r):
            for mu in range(m):
                W_coeff[i, a] += H[a, mu] * V[i, mu] / WH[i, mu]
            W_coeff[i, a] /= sum(H.T)[a]
    W = W * W_coeff
    return W, H

def factor(V, r, iterations=100):
    n, m = V.shape
    avg_V = sum(sum(V)) / n / m
    W = np.random.random(n * r).reshape(n, r) * avg_V
    H = np.random.random(r * m).reshape(r, m) * avg_V

    for i in range(iterations):
        WH = W.dot(H)
        divergence = sum(sum(V * np.log(V / WH) - V + WH))  # equation (3)
        print("At iteration " + str(i) + ", the Kullback-Leibler divergence is", divergence)
        W, H = update(V, W, H, r, n, m)

    return W, H
V = np.arange(0.01,1.01,0.01).reshape(10,10)
W, H = factor(V, 6)
How to eliminate the alternation effect:
The very last line of the Proof of Theorem 2 reads,
By reversing the roles of H and W, the update rule for W can similarly be shown to be nonincreasing.
Thus we can surmise that updating H can be done independently of updating W. That means after updating H:
H = H * H_coeff
we should also update the intermediate value WH before updating W:
WH = W.dot(H)
W = W * W_coeff
Both updates decrease the divergence.
Try it: just stick WH = W.dot(H) before the computation of W_coeff, and the alternation effect goes away.
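Concretely, a minimal sketch of the question's update function with that single extra line added (everything else is unchanged from the code above):

import numpy as np

def update(V, W, H, r, n, m):
    WH = W.dot(H)

    # equation (5): update H using the current WH
    H_coeff = np.zeros(H.shape)
    for a in range(r):
        for mu in range(m):
            for i in range(n):
                H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
            H_coeff[a, mu] /= sum(W)[a]
    H = H * H_coeff

    WH = W.dot(H)  # the fix: refresh WH with the updated H before updating W

    W_coeff = np.zeros(W.shape)
    for i in range(n):
        for a in range(r):
            for mu in range(m):
                W_coeff[i, a] += H[a, mu] * V[i, mu] / WH[i, mu]
            W_coeff[i, a] /= sum(H.T)[a]
    W = W * W_coeff
    return W, H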
Simplifying the code:
When dealing with NumPy arrays, use their mean and sum methods, and avoid using the Python sum function:
avg_V = sum(sum(V))/n/m
can be written as
avg_V = V.mean()
and
divergence = sum(sum(V * np.log(V/WH) - V + WH)) # equation (3)
can be written as
divergence = ((V * np.log(V_over_WH)) - V + WH).sum()
Avoid the Python builtin sum function because it is slower than the NumPy sum method and not as flexible. (It does not allow you to specify the axis on which to sum. We managed to eliminate two calls to Python's sum with one call to NumPy's sum or mean method.)
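To see the difference on a toy array (purely illustrative; this array is not part of the original code):

import numpy as np

A = np.arange(6.0).reshape(2, 3)  # toy 2x3 array, only for demonstration
print(sum(A))         # Python's sum adds the rows -> [3. 5. 7.]
print(A.sum())        # NumPy's method sums every element -> 15.0
print(A.sum(axis=0))  # column sums -> [3. 5. 7.]
print(A.sum(axis=1))  # row sums -> [3. 12.]
print(A.mean())       # 2.5, the same as A.sum() / A.size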
Eliminate the triple for-loop:
But a bigger improvement in both speed and readability can be had by replacing
H_coeff = np.zeros(H.shape)
for a in range(r):
    for mu in range(m):
        for i in range(n):
            H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
        H_coeff[a, mu] /= sum(W)[a]
H = H * H_coeff
with
V_over_WH = V/WH
H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
Explanation:
If you look at the equation (5) update rule for H, first notice that the indices for V and (WH) are identical. So you can replace V/(WH) with
V_over_WH = V/WH
Next, note that in the numerator we are summing over the index i, which is the first index in both W and V_over_WH. We can express that as matrix multiplication:
np.dot(V_over_WH.T, W).T
And the denominator is simply:
W.sum(axis=0).T
If we divide the numerator by the denominator,
(np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
we get a matrix indexed by the two remaining indices, alpha and mu, in that order. That is the same as the indices for H. So we want to multiply H by this ratio element-wise. Perfect. NumPy multiplies arrays element-wise by default.
Thus, we can express the entire update rule for H as:
H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
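As a quick sanity check (small random matrices, purely hypothetical and not taken from the question), the vectorized expression can be compared against the triple loop:

import numpy as np

rng = np.random.RandomState(0)
n, m, r = 4, 5, 3
V = rng.rand(n, m) + 0.1   # strictly positive test data
W = rng.rand(n, r) + 0.1
H = rng.rand(r, m) + 0.1
WH = W.dot(H)

# Triple-loop coefficient from equation (5)
H_coeff = np.zeros(H.shape)
for a in range(r):
    for mu in range(m):
        for i in range(n):
            H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
        H_coeff[a, mu] /= W.sum(axis=0)[a]

# Vectorized coefficient
V_over_WH = V / WH
H_coeff_vec = (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T

print(np.allclose(H_coeff, H_coeff_vec))  # True

The W update, W *= np.dot(V_over_WH, H.T) / H.sum(axis=1), can be checked against its triple loop in the same way.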
So, putting it all together:
import numpy as np
np.random.seed(1)

def update(V, W, H, WH, V_over_WH):
    # equation (5)
    H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
    WH = W.dot(H)
    V_over_WH = V / WH

    W *= np.dot(V_over_WH, H.T) / H.sum(axis=1)
    WH = W.dot(H)
    V_over_WH = V / WH
    return W, H, WH, V_over_WH

def factor(V, r, iterations=100):
    n, m = V.shape
    avg_V = V.mean()
    W = np.random.random(n * r).reshape(n, r) * avg_V
    H = np.random.random(r * m).reshape(r, m) * avg_V
    WH = W.dot(H)
    V_over_WH = V / WH

    for i in range(iterations):
        W, H, WH, V_over_WH = update(V, W, H, WH, V_over_WH)
        # equation (3)
        divergence = ((V * np.log(V_over_WH)) - V + WH).sum()
        print("At iteration {i}, the Kullback-Leibler divergence is {d}".format(
            i=i, d=divergence))
    return W, H

V = np.arange(0.01, 1.01, 0.01).reshape(10, 10)
# V = np.arange(1,101).reshape(10,10).astype('float')
W, H = factor(V, 6)
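To get a feel for how good the factorization is after the run (a small, hypothetical follow-up that reuses V, W, and H from the script above):

WH = W.dot(H)
print(np.abs(V - WH).max())                    # largest absolute reconstruction error
print(((V * np.log(V / WH)) - V + WH).sum())   # final divergence, equation (3)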