 

Tensorflow Dataset.from_tensor_slices taking too long

I have the following code:

import numpy as np
import tensorflow as tf

data = np.load("data.npy")
print(data)  # Makes sure the array gets loaded in memory
dataset = tf.contrib.data.Dataset.from_tensor_slices(data)

The file "data.npy" is 3.3 GB. Reading the file with numpy takes a couple of seconds but then the next line that creates the tensorflow dataset object takes ages to execute. Why is that? What is it doing under the hood?

asked Oct 20 '17 by niko

1 Answer

Quoting this answer:

np.load of an npz just returns a file loader, not the actual data. It's a 'lazy loader', loading the particular array only when accessed.

That is why it is fast.
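Note that the quoted behaviour applies to .npz archives: np.load on an .npz returns an NpzFile object that reads each member array only when it is accessed, while a plain .npy file like the one in the question is read eagerly unless mmap_mode is passed. A minimal sketch of the difference (the file names here are just placeholders):

import numpy as np

# A plain .npy file is read into memory immediately...
arr = np.load("data.npy")

# ...unless memory mapping is requested, which defers the reads
arr_mapped = np.load("data.npy", mmap_mode="r")

# An .npz archive returns an NpzFile, a lazy loader; each member
# array is only read from disk when accessed by key
archive = np.load("archive.npz")
member = archive["arr_0"]  # the actual read happens here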

Edit 1: to expand on this answer a bit more, here is another quote from TensorFlow's documentation:

If all of your input data fit in memory, the simplest way to create a Dataset from them is to convert them to tf.Tensor objects and use Dataset.from_tensor_slices().

This works well for a small dataset, but wastes memory (because the contents of the array will be copied multiple times) and can run into the 2GB limit for the tf.GraphDef protocol buffer.

The link also shows how to do this efficiently.
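For reference, the more efficient pattern from that guide is to define the dataset in terms of a tf.placeholder and feed the array once when the iterator is initialized, so the 3.3 GB array is never serialized into the graph as a constant. A minimal TF 1.x sketch, matching the tf.contrib.data era of the question:

import numpy as np
import tensorflow as tf

data = np.load("data.npy")

# Build the pipeline around a placeholder so the array is not
# embedded in the graph as a tf.constant
data_placeholder = tf.placeholder(data.dtype, data.shape)
dataset = tf.data.Dataset.from_tensor_slices(data_placeholder)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    # The array is fed exactly once, at iterator initialization
    sess.run(iterator.initializer, feed_dict={data_placeholder: data})
    first_slice = sess.run(next_element)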

answered Oct 10 '22 by Julio Daniel Reyes