 

Implementing Adagrad in Python

I'm trying to implement Adagrad in Python. For learning purposes, I'm using matrix factorisation as an example, and I'll be using Autograd to compute the gradients.

My main question is whether the implementation is correct.

Problem description

Given a matrix A (M x N) with some missing entries, decompose it into W and H of sizes (M x k) and (k x N) respectively. The goal is to learn W and H using Adagrad. I'll be following this guide for the Autograd implementation.

NB: I'm well aware that ALS-based implementations are well-suited to this problem. I'm using Adagrad only for learning purposes.

Customary imports

import autograd.numpy as np
import pandas as pd

Creating the matrix to be decomposed

A = np.array([[3, 4, 5, 2],
              [4, 4, 3, 3],
              [5, 5, 4, 3]], dtype=np.float32).T

Masking one entry

A[0, 0] = np.nan

Defining the cost function

def cost(W, H):
    pred = np.dot(W, H)
    mask = ~np.isnan(A)
    return np.sqrt(((pred - A)[mask].flatten() ** 2).mean(axis=None))
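
As a quick sanity check of the masked cost (a minimal sketch with hypothetical all-ones factors, sized for the 4 x 3 matrix above):

W_test = np.ones((4, 2))
H_test = np.ones((2, 3))
print(cost(W_test, H_test))  # RMSE over the 11 observed entries; the NaN at A[0, 0] is ignored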

Decomposition params

rank = 2
learning_rate=0.01
n_steps = 10000

Gradient of cost wrt params W and H

from autograd import grad, multigrad
grad_cost= multigrad(cost, argnums=[0,1])
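
In case multigrad is not available in your autograd version (later releases dropped it), an equivalent approach (my assumption, not from the guide) is to take the gradient with respect to each argument separately:

grad_cost_W = grad(cost, argnum=0)
grad_cost_H = grad(cost, argnum=1)
# inside the loop: del_W, del_H = grad_cost_W(W, H), grad_cost_H(W, H)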

Main Adagrad routine (this needs to be checked)

shape = A.shape

# Initialising W and H
H =  np.abs(np.random.randn(rank, shape[1]))
W =  np.abs(np.random.randn(shape[0], rank))

# gt_w and gt_h accumulate the sum of squared gradients
gt_w = np.zeros_like(W)
gt_h = np.zeros_like(H)

# stability factor
eps = 1e-8
print "Iteration, Cost"
for i in range(n_steps):

    if i%1000==0:
        print("*" * 20)
        print(i, ",", cost(W, H))

    # computing grad. wrt W and H
    del_W, del_H = grad_cost(W, H)

    # Adding square of gradient
    gt_w+= np.square(del_W)
    gt_h+= np.square(del_H)

    # modified learning rate
    mod_learning_rate_W = np.divide(learning_rate, np.sqrt(gt_w+eps))
    mod_learning_rate_H = np.divide(learning_rate, np.sqrt(gt_h+eps))
    W =  W-del_W*mod_learning_rate_W
    H =  H-del_H*mod_learning_rate_H

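For completeness, a quick way to inspect the learned factors after the loop (a minimal sketch using the variables above):

pred = np.dot(W, H)
print(np.round(pred, 2))  # pred[0, 0] is the value imputed for the masked entry
print(A)
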
While the problem converges and I get a reasonable solution, I was wondering whether the implementation is correct. Specifically, is my understanding of accumulating the sum of squared gradients and then computing the adaptive learning rate from it correct?

asked Jun 07 '17 by Nipun Batra


1 Answer

At a cursory glance, your code closely matches that at https://github.com/benbo/adagrad/blob/master/adagrad.py

Your

del_W, del_H = grad_cost(W, H)

matches

grad=f_grad(w,sd,*args)

Your

gt_w+= np.square(del_W)
gt_h+= np.square(del_H)

matches

gti+=grad**2

Your

mod_learning_rate_W = np.divide(learning_rate, np.sqrt(gt_w+eps))
mod_learning_rate_H = np.divide(learning_rate, np.sqrt(gt_h+eps))

matches

adjusted_grad = grad / (fudge_factor + np.sqrt(gti))

Your

W =  W-del_W*mod_learning_rate_W
H =  H-del_H*mod_learning_rate_H

matches

w = w - stepsize*adjusted_grad

So, assuming that adagrad.py is correct and the translation is correct, your code should be correct as well. (Consensus does not prove your code right, but it is a good hint.)
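
For intuition, the per-parameter rule both snippets implement is theta <- theta - stepsize * g / sqrt(sum of squared g), with a small eps/fudge_factor added for stability (inside the square root in your code, outside it in adagrad.py; that placement is the only minor difference and is negligible in practice). A minimal standalone sketch of the same rule on a toy objective (plain NumPy, hypothetical example):

import numpy as np

x, gti = 0.0, 0.0            # parameter and accumulated squared gradients
stepsize, eps = 0.5, 1e-8

for _ in range(500):
    g = 2 * (x - 3)                            # gradient of f(x) = (x - 3)**2
    gti += g ** 2                              # accumulate squared gradients
    x -= stepsize * g / (np.sqrt(gti) + eps)   # adaptive per-parameter step

print(x)  # approaches the minimiser x = 3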

answered Oct 05 '22 by serv-inc