TensorFlow seems to lack a reader for ".npy" files. How can I read my data files into the new tensorflow.data.Dataset pipeline? My data doesn't fit in memory.
Each object is saved in a separate ".npy" file. Each file contains two different ndarrays as features and a scalar as its label.
TensorFlow implements a subset of the NumPy API, available as tf.experimental.numpy. This allows running NumPy code, accelerated by TensorFlow, while also allowing access to all of TensorFlow's APIs.
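For context, a minimal sketch of that API (assuming TensorFlow 2.5+, where tf.experimental.numpy is stable):

import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# Optional: make tf.Tensor follow NumPy-style type promotion and indexing.
tnp.experimental_enable_numpy_behavior()

x = tnp.ones((2, 3), dtype=tnp.float32)  # NumPy-style creation, backed by tf.Tensor
y = tnp.sum(x * 2, axis=1)               # executes as TensorFlow ops (GPU/TPU capable)
print(y)                                 # -> [6. 6.]

Note that this API does not by itself stream ".npy" files from disk; for an out-of-memory dataset you still need the tf.data approach below.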
You can do it with tf.py_func; see the example in the update below. The parse function would simply decode the filename from bytes to a string and call np.load.
Update: something like this:
import numpy as np
import tensorflow as tf

def read_npy_file(item):
    # The filename arrives as a bytes tensor; decode it to a Python string.
    data = np.load(item.decode())
    return data.astype(np.float32)

file_list = ['/foo/bar.npy', '/foo/baz.npy']

dataset = tf.data.Dataset.from_tensor_slices(file_list)
dataset = dataset.map(
    lambda item: tuple(tf.py_func(read_npy_file, [item], [tf.float32])))
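tf.py_func is the TF1-era API; in TF 2.x the equivalent is tf.numpy_function. Since each file in the question holds two feature arrays plus a scalar label, a sketch like the following could return all three tensors. Assumptions here: the three arrays were saved together with np.savez (producing an ".npz" archive rather than a plain ".npy"), and the key names 'feat1', 'feat2', and 'label' are hypothetical placeholders:

import numpy as np
import tensorflow as tf

def read_npz_file(path):
    # path arrives as a bytes tensor; decode it to a filename string.
    with np.load(path.decode()) as data:
        return (data['feat1'].astype(np.float32),   # hypothetical key names
                data['feat2'].astype(np.float32),
                np.float32(data['label']))

file_list = ['/foo/bar.npz', '/foo/baz.npz']
dataset = tf.data.Dataset.from_tensor_slices(file_list)
dataset = dataset.map(
    lambda path: tf.numpy_function(
        read_npz_file, [path], [tf.float32, tf.float32, tf.float32]))

One caveat with either function: the resulting tensors have unknown static shapes, so downstream layers may need an explicit hint, e.g. a follow-up map that calls tf.ensure_shape on each element.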