I have an HDF5 training dataset of shape (21760, 1, 33, 33), where 21760 is the total number of training samples. I want to train the network with mini-batches of size 128.
I want to ask: how do I feed a mini-batch of 128 training samples from the whole dataset with TensorFlow each time?
Open the HDF5/H5 file on your computer in HDFView. If you click on the name of the HDF5 file in the left-hand window of HDFView, you can view metadata for the file.
HDF5 files are organized in a hierarchical structure, with two primary structures: groups and datasets. HDF5 group: a grouping structure containing instances of zero or more groups or datasets, together with supporting metadata. HDF5 dataset: a multidimensional array of data elements, together with supporting metadata.
Groups are the container mechanism by which HDF5 files are organized. From a Python perspective, they operate somewhat like dictionaries. In this case the “keys” are the names of group members, and the “values” are the members themselves (Group and Dataset objects).
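For example, the dictionary-like access with h5py looks roughly like this (the file name and dataset name here are placeholders, not from the question):
import h5py
f = h5py.File('myfile.h5', 'r')
print(list(f.keys()))          # names of the top-level groups and datasets
dset = f['data_set']           # look a member up like a dictionary value
print(dset.shape, dset.dtype)  # e.g. (21760, 1, 33, 33) and the element type
f.close()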
If your dataset is so large that it can't be imported into memory as keveman suggested, you can use the h5py object directly:
import h5py
import tensorflow as tf

data = h5py.File('myfile.h5py', 'r')
data_size = data['data_set'].shape[0]
batch_size = 128

sess = tf.Session()
train_op = ...  # tf.something_useful()
input = ...     # tf.placeholder or something

for i in range(0, data_size, batch_size):
    current_data = data['data_set'][i:i + batch_size]
    sess.run(train_op, feed_dict={input: current_data})
You can also run through a huge number of iterations and randomly select a batch if you want to:
import random

for i in range(iterations):
    pos = random.randint(0, int(data_size / batch_size) - 1) * batch_size
    current_data = data['data_set'][pos:pos + batch_size]
    sess.run(train_op, feed_dict={input: current_data})
Or sequentially:
for i in range(iterations):
    pos = (i % int(data_size / batch_size)) * batch_size
    current_data = data['data_set'][pos:pos + batch_size]
    sess.run(train_op, feed_dict={input: current_data})
You probably want to write some more sophisticated code that goes through all the data randomly but keeps track of which batches have been used, so that you don't use any batch more often than the others. Once you've done a full pass through the training set, you enable all batches again and repeat, as sketched below.
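A minimal sketch of such an epoch-wise shuffle, reusing data, data_size, batch_size, sess, train_op and input from the code above (num_epochs is a placeholder you would set yourself):
import numpy as np

num_batches = data_size // batch_size
for epoch in range(num_epochs):
    # Shuffle the batch order once per epoch so every batch is used exactly once
    for b in np.random.permutation(num_batches):
        pos = b * batch_size
        current_data = data['data_set'][pos:pos + batch_size]
        sess.run(train_op, feed_dict={input: current_data})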
You can read the HDF5 dataset into a NumPy array and feed slices of that array to the TensorFlow model. Pseudocode like the following would work:
import numpy, h5py

f = h5py.File('somefile.h5', 'r')
data = f.get('path/to/my/dataset')   # returns the HDF5 dataset object
data_as_array = numpy.array(data)    # read the whole dataset into memory
for i in range(0, 21760, 128):
    sess.run(train_op, feed_dict={input: data_as_array[i:i+128, :, :, :]})