Confused about tensor dimensions and batch sizes in PyTorch

So I'm very new to PyTorch and Neural Networks in general, and I'm having some problems creating a Neural Network that classifies names by gender.
I based this off of the PyTorch tutorial for RNNs that classify names by nationality, but I decided not to go with a recurrent approach... Stop me right here if this was the wrong idea!
However, whenever I try to run an input through the network it tells me:

RuntimeError: matrices expected, got 3D, 2D tensors at /py/conda-bld/pytorch_1493681908901/work/torch/lib/TH/generic/THTensorMath.c:1232

I know this has something to do with how PyTorch always expects there to be a batch size or something, and I have my tensor set up that way, but you can probably tell by this point that I have no idea what I'm talking about. Here's my code:

from __future__ import unicode_literals, print_function, division
from io import open
import glob
import unicodedata
import string
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import random
from torch.autograd import Variable
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

"""------GLOBAL VARIABLES------"""

all_letters = string.ascii_letters + " .,;'"
num_letters = len(all_letters)
all_names = {}
genders = ["Female", "Male"]

"""-------DATA EXTRACTION------"""

def findFiles(path):
    return glob.glob(path)

def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

# Read a file and split into lines
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

for file in findFiles("/home/andrew/PyCharm/PycharmProjects/CantStop/data/names/*.txt"):
    gender = file.split("/")[-1].split(".")[0]
    names = readLines(file)
    all_names[gender] = names


def nameToTensor(name):
    tensor = torch.zeros(len(name), 1, num_letters)
    for index, letter in enumerate(name):
        tensor[index][0][all_letters.find(letter)] = 1
    return tensor

def outputToGender(output):
    gender, gender_index = output.data.topk(1)
    if gender_index[0][0] == 0:
        return "Female"
    return "Male"

"""------NETWORK SETUP------"""

class Net(nn.Module):
    def __init__(self, input_size, output_size):
        super(Net, self).__init__()
        #Layer 1
        self.Lin1 = nn.Linear(input_size, int(input_size/2))
        self.ReLu1 = nn.ReLU()
        self.Batch1 = nn.BatchNorm1d(int(input_size/2))
        #Layer 2
        self.Lin2 = nn.Linear(int(input_size/2), output_size)
        self.ReLu2 = nn.ReLU()
        self.Batch2 = nn.BatchNorm1d(output_size)
        self.softMax = nn.LogSoftmax()

    def forward(self, input):
        output1 = self.Batch1(self.ReLu1(self.Lin1(input)))
        output2 = self.softMax(self.Batch2(self.ReLu2(self.Lin2(output1))))
        return output2

NN = Net(num_letters, 2)


def getRandomTrainingEx():
    gender = genders[random.randint(0, 1)]
    name = all_names[gender][random.randint(0, len(all_names[gender])-1)]
    gender_tensor = Variable(torch.LongTensor([genders.index(gender)]))
    name_tensor = Variable(nameToTensor(name))
    return gender_tensor, name_tensor, gender

def train(input, target):
    loss_func = nn.NLLLoss()

    optimizer = optim.SGD(NN.parameters(), lr=0.0001, momentum=0.9)


    output = NN(input)

    loss = loss_func(output, target)

    return output, loss

all_losses = []
current_loss = 0

for i in range(100000):
    gender_tensor, name_tensor, gender = getRandomTrainingEx()
    output, loss = train(name_tensor, gender_tensor)
    current_loss += loss

    if i%1000 == 0:
        print("Guess: %s, Correct: %s, Loss: %s" % (outputToGender(output), gender, loss.data[0]))

    if i%100 == 0:
        current_loss = 0

# plt.figure()
# plt.plot(all_losses)
# plt.show()

Please help a newbie out!

1 Answer

  1. Debugging the error

PyCharm includes a helpful Python debugger that lets you set breakpoints and view the dimensions of your tensors.
For easier debugging, do not nest the forward calls into a single expression like this:

output1 = self.Batch1(self.ReLu1(self.Lin1(input)))

Instead, give each step its own line:

h1 = self.ReLu1(self.Lin1(input))
h2 = self.Batch1(h1)
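
Applied to the whole forward pass, this could look like the sketch below. It is based on the Net class from the question, with a few assumptions of mine: a recent PyTorch release (plain tensors instead of Variable), an explicit dim=1 for LogSoftmax, illustrative intermediate names h1 to h6, and a random 2D batch of 4 rows as test input.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        hidden_size = input_size // 2
        self.Lin1 = nn.Linear(input_size, hidden_size)
        self.ReLu1 = nn.ReLU()
        self.Batch1 = nn.BatchNorm1d(hidden_size)
        self.Lin2 = nn.Linear(hidden_size, output_size)
        self.ReLu2 = nn.ReLU()
        self.Batch2 = nn.BatchNorm1d(output_size)
        self.softMax = nn.LogSoftmax(dim=1)  # dim=1 is an assumption (class axis)

    def forward(self, input):
        # One layer per line: every intermediate can be inspected at a
        # breakpoint, and a traceback names the exact line that failed.
        h1 = self.Lin1(input)
        h2 = self.ReLu1(h1)
        h3 = self.Batch1(h2)
        h4 = self.Lin2(h3)
        h5 = self.ReLu2(h4)
        h6 = self.Batch2(h5)
        return self.softMax(h6)


net = Net(57, 2)            # 57 = len(string.ascii_letters + " .,;'")
batch = torch.rand(4, 57)   # 2D input: (batch_size, input_size)
out = net(batch)
print(out.size())           # torch.Size([4, 2])
```

With this layout, a breakpoint on any of the h lines lets you call .size() on the previous result and see which layer receives an unexpected shape.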

As for the stack trace, PyTorch also gives you a regular Python traceback. I believe that just before

RuntimeError: matrices expected, got 3D, 2D tensors at /py/conda-bld/pytorch_1493681908901/work/torch/lib/TH/generic/THTensorMath.c:1232

there are Python traceback lines that point right into your code. As I said, debugging is easier if you don't nest the forward calls.
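
The error text itself hints at the shapes involved: "got 3D, 2D tensors". A minimal sketch, reusing nameToTensor from the question, shows where a 3D tensor enters the network. The sample name "Andrew" is an arbitrary choice, and plain tensors are used instead of Variable, assuming a recent PyTorch release.

```python
import string
import torch

all_letters = string.ascii_letters + " .,;'"
num_letters = len(all_letters)


def nameToTensor(name):
    # One one-hot row per letter, with a middle axis of size 1.
    tensor = torch.zeros(len(name), 1, num_letters)
    for index, letter in enumerate(name):
        tensor[index][0][all_letters.find(letter)] = 1
    return tensor


name_tensor = nameToTensor("Andrew")
print(name_tensor.size())   # torch.Size([6, 1, 57])
print(name_tensor.dim())    # 3

# For comparison only: the same data viewed as a 2D matrix.
as_matrix = name_tensor.view(len("Andrew"), num_letters)
print(as_matrix.size())     # torch.Size([6, 57])
```

In the PyTorch build from the question, the 3D operand in the message is most likely this input and the 2D operand the weight matrix of the first Linear layer. The view call only illustrates what a 2D shape looks like; whether treating each letter as a separate row is the right model is a separate question.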

Use PyCharm to set a breakpoint just before the crash point. Then, in the debugger's watch window, use Variable(torch.rand(dim1, dim2)) to test the input and output dimensions of the forward pass and to see whether a dimension is incorrect. Compare that with the dimensions of your actual input by calling input.size() in the watch window.

For example, evaluate self.ReLu1(self.Lin1(Variable(torch.rand(10, 20)))).size(). If it shows red text (an error), the input dimension is incorrect. Otherwise, it shows the size of the output.
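
The same check can be done outside the debugger. The sketch below uses a hypothetical helper, probe (my own name, not part of PyTorch), that feeds a random tensor through a layer and returns either the output size or the first line of the error. It assumes a recent PyTorch release where tensors are passed directly, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn


def probe(layer, *dims):
    """Run a random tensor of shape dims through layer.

    Returns the output shape as a tuple, or a short error string
    if the layer rejects the input shape.
    """
    try:
        return tuple(layer(torch.rand(*dims)).size())
    except (RuntimeError, ValueError) as err:
        return "error: " + str(err).splitlines()[0]


lin = nn.Linear(20, 5)       # expects the last dimension to be 20
print(probe(lin, 10, 20))    # (10, 5)
print(probe(lin, 10, 21))    # an "error: ..." string, since 21 != 20
```

Trying a few candidate shapes this way quickly shows which ones a given layer accepts.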


  2. Read the docs

The PyTorch docs specify the input and output dimensions of each layer. They also include example code snippets, such as this one for nn.RNN:

>>> rnn = nn.RNN(10, 20, 2)
>>> input = Variable(torch.randn(5, 3, 10))
>>> h0 = Variable(torch.randn(2, 3, 20))
>>> output, hn = rnn(input, h0)

You can run such a snippet in the PyCharm debugger to explore the input and output dimensions of the specific layer you are interested in (RNN, Linear, BatchNorm1d).
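
A sketch of that kind of exploration for the three layers mentioned, assuming a recent PyTorch release where tensors are passed directly without Variable. The RNN sizes follow the docs snippet; the Linear and BatchNorm1d sizes are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

# RNN: input_size=10, hidden_size=20, num_layers=2
rnn = nn.RNN(10, 20, 2)
x = torch.randn(5, 3, 10)     # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)    # (num_layers, batch, hidden_size)
output, hn = rnn(x, h0)
print(output.size())          # torch.Size([5, 3, 20])
print(hn.size())              # torch.Size([2, 3, 20])

# Linear: maps the last dimension from in_features to out_features
linear = nn.Linear(10, 4)
y = linear(torch.randn(3, 10))   # (batch, in_features)
print(y.size())                  # torch.Size([3, 4])

# BatchNorm1d: num_features must match dimension 1 of the input
bn = nn.BatchNorm1d(4)
z = bn(y)                        # (batch, num_features), shape unchanged
print(z.size())                  # torch.Size([3, 4])
```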

