 

How can I use a custom data model with Deeplearning4j?

The base problem is trying to use a custom data model to create a DataSetIterator to be used in a deeplearning4j network.

The data model I am trying to work with is a Java class that holds a bunch of doubles created from quotes on a specific stock, such as timestamp, open, close, high, low, volume, technical indicator 1, technical indicator 2, etc. I query an internet source (example, plus several other indicators from the same site) that provides JSON strings, which I convert into my data model for easier access and to store in an SQLite database.

Now I have a List of these data models that I would like to use to train an LSTM network, each double being a feature. Per the Deeplearning4j documentation and several examples, the way to use training data is to use the ETL processes described here to create a DataSetIterator which is then used by the network.

I don't see a clean way to convert my data model using any of the provided RecordReaders without first converting them to some other format, such as a CSV or other file. I would like to avoid this because it would use up a lot of resources. It seems like there would be a better way to do this simple case. Is there a better approach that I am just missing?

Ethan asked Feb 17 '18 19:02



2 Answers

Ethan!

First of all, Deeplearning4j uses ND4J as its backend, so your data will eventually have to be converted into INDArray objects to be used in your model. If your training data is two arrays of doubles, inputsArray and desiredOutputsArray, you can do the following:

INDArray inputs = Nd4j.create(inputsArray, new int[]{numSamples, inputDim});
INDArray desiredOutputs = Nd4j.create(desiredOutputsArray, new int[]{numSamples, outputDim});

And then you can train your model using those vectors directly:

for (int epoch = 0; epoch < nEpochs; epoch++)
    model.fit(inputs, desiredOutputs);

Alternatively, you can create a DataSet object and use it for training:

DataSet ds = new DataSet(inputs, desiredOutputs);
for (int epoch = 0; epoch < nEpochs; epoch++)
    model.fit(ds);
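The flat inputsArray passed to Nd4j.create above has to come from somewhere. A minimal sketch of packing a List of model objects into such an array without going through a file is shown here. The Quote class, its fields, and toFeatures() are hypothetical stand-ins for the asker's data model, and the layout assumes ND4J's default row-major ordering, which is worth confirming for your version.

```java
import java.util.List;

/** Hypothetical stand-in for the asker's data model: one row of doubles per quote. */
final class Quote {
    final double open, close, high, low, volume;

    Quote(double open, double close, double high, double low, double volume) {
        this.open = open;
        this.close = close;
        this.high = high;
        this.low = low;
        this.volume = volume;
    }

    /** Feature values in a fixed column order (an assumption of this sketch). */
    double[] toFeatures() {
        return new double[] {open, close, high, low, volume};
    }
}

final class QuoteFlattener {
    /**
     * Packs the quotes into one row-major array:
     * sample i occupies indices [i * inputDim, (i + 1) * inputDim).
     */
    static double[] flatten(List<Quote> quotes, int inputDim) {
        double[] flat = new double[quotes.size() * inputDim];
        for (int i = 0; i < quotes.size(); i++) {
            double[] row = quotes.get(i).toFeatures();
            if (row.length != inputDim) {
                throw new IllegalArgumentException(
                    "row " + i + " has " + row.length + " features, expected " + inputDim);
            }
            System.arraycopy(row, 0, flat, i * inputDim, inputDim);
        }
        return flat;
    }
}
```

The result of QuoteFlattener.flatten(quotes, 5) could then serve as inputsArray with numSamples = quotes.size() and inputDim = 5.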

But creating a custom iterator is the safest approach, especially for larger sets, since it gives you more control over your data and keeps things organized.

Your DataSetIterator implementation must be given your data (for example through its constructor), and its next() method should return a DataSet object comprising the next batch of your training data. It would look like this:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.indexing.NDArrayIndex;

public class MyCustomIterator implements DataSetIterator {
    private final INDArray inputs, desiredOutputs;
    private int itPosition = 0; // the iterator position in the set.

    // double[] matches the arrays of doubles used in the snippets above.
    public MyCustomIterator(double[] inputsArray,
                            double[] desiredOutputsArray,
                            int numSamples,
                            int inputDim,
                            int outputDim) {
        inputs = Nd4j.create(inputsArray, new int[]{numSamples, inputDim});
        desiredOutputs = Nd4j.create(desiredOutputsArray, new int[]{numSamples, outputDim});
    }

    @Override
    public DataSet next(int num) {
        // clamp the batch so the last one does not run past the data.
        int end = Math.min(itPosition + num, inputs.rows());

        // get a view containing the next samples and desired outputs.
        INDArray dsInput = inputs.get(
            NDArrayIndex.interval(itPosition, end),
            NDArrayIndex.all());
        INDArray dsDesired = desiredOutputs.get(
            NDArrayIndex.interval(itPosition, end),
            NDArrayIndex.all());

        itPosition = end;

        return new DataSet(dsInput, dsDesired);
    }

    // implement the remaining interface methods (hasNext(), reset(), ...)

}

The NDArrayIndex methods you see above are used to access parts of an INDArray. Now you can use the iterator for training:

// Pass the raw arrays; the constructor wraps them in INDArrays.
MyCustomIterator it = new MyCustomIterator(
    inputsArray,
    desiredOutputsArray,
    numSamples,
    inputDim,
    outputDim);

for (int epoch = 0; epoch < nEpochs; epoch++)
    model.fit(it);
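Most of the remaining iterator methods come down to position bookkeeping. The following is a library-free sketch of that bookkeeping only, under the assumption that the last batch may be shorter than the others. BatchCursor is a hypothetical helper name, not part of Deeplearning4j; in a real iterator the returned bounds would feed NDArrayIndex.interval.

```java
import java.util.NoSuchElementException;

/** Tracks batch boundaries over numSamples rows; no ND4J types involved. */
final class BatchCursor {
    private final int numSamples;
    private int position = 0;

    BatchCursor(int numSamples) {
        this.numSamples = numSamples;
    }

    /** True while at least one sample has not been handed out yet. */
    boolean hasNext() {
        return position < numSamples;
    }

    /** Returns {start, endExclusive} for the next batch, shrinking the last one if needed. */
    int[] next(int batchSize) {
        if (!hasNext()) {
            throw new NoSuchElementException("no samples left");
        }
        int start = position;
        int end = Math.min(start + batchSize, numSamples);
        position = end;
        return new int[] {start, end};
    }

    /** Rewinds to the first sample so another epoch can start. */
    void reset() {
        position = 0;
    }
}
```

With 10 samples and a batch size of 4, successive calls yield the ranges [0, 4), [4, 8) and [8, 10), after which hasNext() is false until reset() is called.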

This example will be particularly useful to you, since it implements an LSTM network and has a custom iterator implementation (which can be a guide for implementing the remaining methods). Also, for more information on NDArray, this is helpful. It gives detailed information on creating, modifying and accessing parts of an NDArray.

Diego Stéfano answered Sep 17 '22 15:09



deeplearning4j creator here.

You should not create a data set iterator in anything but very special settings. You should be using DataVec. We cover this in numerous places, ranging from our DataVec page to our examples: https://deeplearning4j.konduit.ai/datavec/overview https://github.com/eclipse/deeplearning4j-examples

DataVec is our dedicated library for doing data transformations. You create custom record readers for your use case. For legacy reasons, Deeplearning4j has a few "special" iterators for certain datasets. Many of those came before DataVec existed. We built DataVec as a way of preprocessing data.

Now you use RecordReaderDataSetIterator or SequenceRecordReaderDataSetIterator (see our Javadoc for more information), or their multi-dataset equivalents.

If you do this, you don't have to worry about masking, thread safety, or anything else that involves fast loading of data.
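For the in-memory case from the question, a sequence reader works on data shaped as sequences of time steps of values. The sketch below builds that nesting from rows of doubles using plain Java collections. Forming sequences by sliding a fixed-length window is an assumption of this sketch, not something the answer prescribes, and SequenceWindows is a hypothetical helper. In DataVec the innermost values would be Writable objects rather than Double, and I believe an in-memory reader such as CollectionSequenceRecordReader accepts a nested collection of this shape, but check the Javadoc before relying on that.

```java
import java.util.ArrayList;
import java.util.List;

final class SequenceWindows {
    /**
     * Splits rows (one double[] per time step, oldest first) into overlapping windows.
     * Result nesting: sequences -> time steps -> values.
     */
    static List<List<List<Double>>> windows(List<double[]> rows, int length, int stride) {
        if (length <= 0 || stride <= 0) {
            throw new IllegalArgumentException("length and stride must be positive");
        }
        List<List<List<Double>>> sequences = new ArrayList<>();
        // Only full windows are emitted; a trailing partial window is dropped.
        for (int start = 0; start + length <= rows.size(); start += stride) {
            List<List<Double>> sequence = new ArrayList<>(length);
            for (int t = start; t < start + length; t++) {
                List<Double> step = new ArrayList<>();
                for (double v : rows.get(t)) {
                    step.add(v);
                }
                sequence.add(step);
            }
            sequences.add(sequence);
        }
        return sequences;
    }
}
```

Five rows with a window length of 3 and a stride of 1 produce three sequences, starting at rows 0, 1 and 2.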

As an aside, I would love to know where you got the idea to create your own iterator; we now say right in our README not to do that. If you were looking somewhere else where this is not obvious, we would love to fix that.

Edit: I've updated the links to the new pages. This post is very old now. Please see the new links here:

https://deeplearning4j.konduit.ai/datavec/overview https://github.com/eclipse/deeplearning4j-examples

Adam Gibson answered Sep 21 '22 15:09
