How to build a model in MXNet using matrices and matrix operations explicitly?

Tags:

mxnet

I can create a model using pre-built high-level functions like FullyConnected. For example:

X = mx.sym.Variable('data')
P  = mx.sym.FullyConnected(data = X, name = 'fc1', num_hidden = 2)

In this way I get a symbolic variable P that depends on the symbolic variable X. In other words, I have a computational graph that can be used to define a model and execute operations such as fit and predict.

Now, I would like to express P through X in a different way. In more detail, instead of using the high-level functionality (like FullyConnected), I would like to specify the relation between P and X "explicitly", using low-level tensor operations (like matrix multiplication) and symbolic variables representing model parameters (like a weight matrix).

For example, to achieve the same as above, I have tried the following:

W = mx.sym.Variable('W')
B = mx.sym.Variable('B')
P = mx.sym.broadcast_plus(mx.sym.dot(X, W), B)

However, P obtained this way is not equivalent to P obtained earlier. I cannot use it the same way. In particular, as far as I understand, MXNet is complaining that W and B do not have values (which makes sense).

I have also tried to declare W and B in another way (so that they do have values):

w = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([7.0, 8.0])

W = mx.nd.array(w)
B = mx.nd.array(b)

This does not work either. I guess that MXNet complains because it expects a symbolic variable but gets nd-arrays instead.

So, my question is how to build a model using low-level tensor operations (like matrix multiplication) and explicit objects representing model parameters (like weight matrices).

Roman asked Dec 06 '17 10:12


1 Answer

You might want to take a look at the Gluon API. For example, here is a guide for building an MLP from scratch, including allocating the parameters:

#######################
#  Allocate parameters for the first hidden layer
#######################
W1 = nd.random_normal(shape=(num_inputs, num_hidden), scale=weight_scale, ctx=model_ctx)
b1 = nd.random_normal(shape=num_hidden, scale=weight_scale, ctx=model_ctx)

params = [W1, b1, ...]

Attach gradient buffers to the parameters so that autograd can compute their gradients:

for param in params:
    param.attach_grad()

Define the model:

def net(X):
    #######################
    #  Compute the first hidden layer
    #######################
    h1_linear = nd.dot(X, W1) + b1
    ...
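
To make the shapes in that line concrete, here is a small NumPy sketch of the same arithmetic, using the weight and bias values from the question. The batch X is invented for the example, and NumPy stands in for the NDArray calls:

```python
import numpy as np

# Values from the question: 3 inputs, 2 hidden units.
W = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # shape (num_inputs, num_hidden)
b = np.array([7.0, 8.0])                             # shape (num_hidden,)

# A hypothetical batch of two examples, one per row.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 0.0, 0.0]])

# Matrix product first, then the bias is broadcast over the batch dimension.
h1_linear = X @ W + b

print(h1_linear.shape)
print(h1_linear)
```

Note that this layout stores W as (num_inputs, num_hidden). FullyConnected keeps its weight as (num_hidden, input_dim) and multiplies by the transpose, so a weight taken from one form has to be transposed before it is used in the other.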

And run the training loop:

epochs = 10
learning_rate = .001
smoothing_constant = .01

for e in range(epochs):
    ...
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(model_ctx).reshape((-1, 784))
        label = label.as_in_context(model_ctx)
        ...
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label_one_hot)
        loss.backward()
        SGD(params, learning_rate)

You can see the full example in "Deep Learning - The Straight Dope":

http://gluon.mxnet.io/chapter03_deep-neural-networks/mlp-scratch.html

Guy answered Nov 18 '22 16:11