
Is this a practical way to resolve 'not enough memory' from LuaJIT with Torch?

StanfordNLP's TreeLSTM, when used with a dataset of more than 30K instances, causes LuaJIT to fail with a "not enough memory" error. I am working around this with LuaJIT Data Structures (LDS). To move the dataset outside of LuaJIT's heap, the trees need to be placed in an LDS.Vector.

Since an LDS.Vector holds cdata, the first step was to turn the Tree type into a cdata object:

local ffi = require('ffi')

ffi.cdef([[
typedef struct CTree {
   struct CTree* parent;
   int num_children;
   struct CTree* children[25];
   int idx;
   int gold_label;
   int leaf_idx;
} CTree;
]])
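For reference, a minimal sketch of pushing these CTree nodes into off-heap storage. This assumes the lds.Vector API from the lua-lds library (a constructor taking a ctype, plus `push_back` and 0-based `get`); the exact require path and method names may differ in your install:

```lua
local ffi = require('ffi')
local lds = require('lds.Vector')  -- adjust to your lua-lds install path

ffi.cdef([[
typedef struct CTree {
   struct CTree* parent;
   int num_children;
   struct CTree* children[25];
   int idx;
   int gold_label;
   int leaf_idx;
} CTree;
]])

local CTree_t = ffi.typeof('CTree')

-- Elements live in malloc'd memory outside the LuaJIT heap,
-- so they don't count against the 1-2 GB LuaJIT memory limit.
local trees = lds.Vector(CTree_t)

local root = CTree_t()
root.num_children = 0
root.idx = 1
trees:push_back(root)

local first = trees:get(0)  -- lds vectors are 0-indexed
```

Note that `push_back` copies the struct by value into the vector, so any `parent`/`children` pointers must point at storage that outlives the copy (e.g. other elements of an off-heap container), not at stack-local cdata.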

There are also small changes needed in read_data.lua to handle the new cdata CTree type. Using LDS seemed like a reasonable way around the memory limit so far; however, the CTree also requires a field named 'composer'.

Composer is of type nn.gModule. Continuing with this solution would mean creating a cdata typedef for nn.gModule, including typedefs for all of its members. Before going further: does this seem like the right direction to follow? Does anyone have experience with this problem?

asked Jul 30 '15 by user2827214

1 Answer

As you've discovered, representing structured data in a LuaJIT heap-friendly manner is a bit of a pain at the moment.

In the Tree-LSTM implementation, the tree tables each hold a pointer to a composer instance mainly for expediency in implementation.

One workaround to avoid typedef-ing nn.gModule would be to use the existing idx field to index into a table of composer instances. In this approach, the pair (sentence_idx, node_idx) can be used to uniquely identify a composer in a global two-level table of composer instances. To avoid memory issues, the current cleanup code can be replaced with a line that sets the corresponding entry in the table to nil.
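A minimal sketch of that workaround (the names `composers`, `get_composer`, and `new_composer` are illustrative, not from the Tree-LSTM source):

```lua
-- Global two-level table: composers[sentence_idx][node_idx] -> composer instance.
-- The cdata CTree never holds a Lua reference; its integer indices are the key.
local composers = {}

-- Look up (or lazily create) the composer for a given tree node.
-- new_composer is whatever factory builds the nn.gModule for this node.
local function get_composer(sentence_idx, node_idx, new_composer)
  local row = composers[sentence_idx]
  if not row then
    row = {}
    composers[sentence_idx] = row
  end
  if not row[node_idx] then
    row[node_idx] = new_composer()
  end
  return row[node_idx]
end

-- Cleanup: instead of destroying the module, drop the reference so the
-- garbage collector can reclaim it.
local function free_composer(sentence_idx, node_idx)
  local row = composers[sentence_idx]
  if row then row[node_idx] = nil end
end
```

This keeps the CTree struct plain-old-data: only integers and pointers cross the FFI boundary, while the nn.gModule objects stay in ordinary Lua tables.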

answered Sep 29 '22 by kst