 

NumPy array to TFRecord

I'm trying to train a custom dataset with the TensorFlow Object Detection API. The dataset contains 40k training images and labels, both stored as NumPy ndarrays (uint8). The training data is rank 2 with shape [40000, 23456], and the labels are rank 1 with shape [0..., 3]. I want to generate a TFRecord for this dataset. How do I do that?

asked May 18 '18 by Govinda Malavipathirana




1 Answer

This tutorial will walk you through the process of creating TFRecords from your data:

https://medium.com/mostly-ai/tensorflow-records-what-they-are-and-how-to-use-them-c46bc4bbb564
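The core idea from that tutorial can be sketched directly for the question's data: serialize each image row and its label into a `tf.train.Example` and write the examples with `tf.io.TFRecordWriter`. This is a minimal sketch, assuming each image is one uint8 row and each label is a single integer; the file name, array sizes, and feature keys (`"image"`, `"label"`) are illustrative, not from the original question.

```python
import numpy as np
import tensorflow as tf

def serialize_example(image, label):
    """Pack one (image, label) pair into a serialized tf.train.Example."""
    feature = {
        # Store the raw uint8 bytes of the flattened image row.
        "image": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image.tobytes()])),
        # Store the label as a single int64 value.
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[int(label)])),
    }
    return tf.train.Example(
        features=tf.train.Features(feature=feature)).SerializeToString()

# Hypothetical data matching the question's layout (smaller for the sketch).
images = np.random.randint(0, 256, size=(100, 784), dtype=np.uint8)
labels = np.random.randint(0, 4, size=(100,), dtype=np.int64)

# Write one record per (image, label) pair.
with tf.io.TFRecordWriter("train.tfrecord") as writer:
    for image, label in zip(images, labels):
        writer.write(serialize_example(image, label))
```

Because the image is stored as raw bytes, the reader must know the original dtype and shape to reconstruct it; a common alternative is to also store the shape as an int64 feature.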

However, there are easier ways of dealing with preprocessing now using the `tf.data` Dataset input pipeline. I prefer to keep my data in its most original format and build a preprocessing pipeline to deal with it. Here's the primary guide you want to read to learn about the Dataset preprocessing pipeline:

https://www.tensorflow.org/programmers_guide/datasets
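Such a pipeline might look like the sketch below: stream the TFRecord file, parse each record, and do the preprocessing (decode, cast, normalize) inside the pipeline rather than up front. The feature keys and file name here are assumptions that must match however the records were actually written (image stored as raw uint8 bytes, label as one int64).

```python
import tensorflow as tf

# Assumed feature spec; must mirror the writer side.
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse_fn(record):
    """Parse one serialized Example and preprocess it on the fly."""
    parsed = tf.io.parse_single_example(record, feature_spec)
    image = tf.io.decode_raw(parsed["image"], tf.uint8)
    image = tf.cast(image, tf.float32) / 255.0  # preprocessing in the pipeline
    return image, parsed["label"]

dataset = (tf.data.TFRecordDataset("train.tfrecord")
           .map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(1024)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))
```

The resulting `dataset` yields `(image, label)` batches and can be passed straight to `model.fit`, which is the "preprocessing pipeline" approach the answer recommends.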

answered Oct 03 '22 by David Parks