Tensorflow TFRecord: Can't parse serialized example

I am trying to follow this guide in order to serialize my input data into the TFRecord format but I keep hitting this error when trying to read it:

InvalidArgumentError: Key: my_key. Can't parse serialized Example.

I am not sure where I'm going wrong. Here is a minimal reproduction of the issue I cannot get past.

Serialise some sample data:

import tensorflow as tf  # TensorFlow 1.x API

# The context manager closes the writer on exit, so no explicit close() is needed.
with tf.python_io.TFRecordWriter('train.tfrecords') as writer:
    for idx in range(10):
        # Each record stores three values per feature.
        example = tf.train.Example(
            features=tf.train.Features(
                feature={
                    'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 2, 3])),
                    'test': tf.train.Feature(float_list=tf.train.FloatList(value=[0.1, 0.2, 0.3])),
                }
            )
        )

        writer.write(example.SerializeToString())

Parsing function & deserialise:

def parse(tfrecord):
  features = {
      'label': tf.FixedLenFeature([], tf.int64, default_value=0),
      'test': tf.FixedLenFeature([], tf.float32, default_value=0.0),
  }
  return tf.parse_single_example(tfrecord, features)

dataset = tf.data.TFRecordDataset('train.tfrecords').map(parse)
getnext = dataset.make_one_shot_iterator().get_next()

When trying to run this:

with tf.Session() as sess:
  v = sess.run(getnext)
  print(v)

I trigger the above error message.

Is it possible to get past this error and deserialize my data?

Stewart_R asked Nov 27 '18 12:11


2 Answers

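tf.FixedLenFeature() reads fixed-size arrays, so the shape of each feature must be declared up front. In the question, every record stores three values per feature, but the parse function declares the shape [] (a scalar), so parsing fails. Update the parse function to declare the actual shape: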

def parse(tfrecord):
   return tf.parse_single_example(tfrecord, features={
       'label': tf.FixedLenFeature([3], tf.int64, default_value=[0,0,0]),
       'test': tf.FixedLenFeature([3], tf.float32, default_value=[0.0, 0.0, 0.0]),
   })

That should do the job.
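To see the shape mismatch in isolation, here is a minimal sketch that skips the file and parses a serialized Example directly. It assumes TensorFlow 2.x with eager execution, so it uses the tf.io names rather than the TF 1.x names in the question; the helper names make_example and parse_with_shape are made up for illustration.

```python
import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution enabled


def make_example():
    """Serialize one Example holding three values per feature, as in the question."""
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 2, 3])),
                'test': tf.train.Feature(float_list=tf.train.FloatList(value=[0.1, 0.2, 0.3])),
            }
        )
    )
    return example.SerializeToString()


def parse_with_shape(serialized, shape):
    """Parse both features as fixed-length tensors of the given shape."""
    features = {
        'label': tf.io.FixedLenFeature(shape, tf.int64),
        'test': tf.io.FixedLenFeature(shape, tf.float32),
    }
    return tf.io.parse_single_example(serialized, features)


serialized = make_example()

# Declaring a scalar shape for a three-value feature fails to parse.
try:
    parse_with_shape(serialized, [])
    scalar_shape_failed = False
except tf.errors.InvalidArgumentError:
    scalar_shape_failed = True

# Declaring the real shape succeeds.
parsed = parse_with_shape(serialized, [3])
label_values = parsed['label'].numpy().tolist()
test_shape = tuple(parsed['test'].shape)
print(scalar_shape_failed, label_values, test_shape)
```

The same features dictionary can be passed to a function mapped over a tf.data.TFRecordDataset, as in the question.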

Vlad-HC answered Nov 15 '22 06:11


Alternatively, if your input features have arbitrary rather than fixed lengths, you can use tf.io.FixedLenSequenceFeature() with allow_missing=True and default_value=0 (0 for int types, 0.0 for float). Unlike tf.io.FixedLenFeature(), it does not require the input feature to have a fixed size. You can find more information here.
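A rough sketch of that variable-length approach, again assuming TensorFlow 2.x with eager execution. The helpers serialize and parse_variable are hypothetical names used only for this illustration.

```python
import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution enabled


def serialize(labels, values):
    """Serialize one Example whose features may hold any number of values."""
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                'label': tf.train.Feature(int64_list=tf.train.Int64List(value=labels)),
                'test': tf.train.Feature(float_list=tf.train.FloatList(value=values)),
            }
        )
    )
    return example.SerializeToString()


def parse_variable(serialized):
    """Parse features as 1-D tensors whose length is taken from the record."""
    features = {
        'label': tf.io.FixedLenSequenceFeature(
            [], tf.int64, allow_missing=True, default_value=0),
        'test': tf.io.FixedLenSequenceFeature(
            [], tf.float32, allow_missing=True, default_value=0.0),
    }
    return tf.io.parse_single_example(serialized, features)


# Two records with different feature lengths parse with the same spec.
short = parse_variable(serialize([1, 2], [0.5]))
longer = parse_variable(serialize([1, 2, 3, 4], [0.1, 0.2, 0.3]))

short_labels = short['label'].numpy().tolist()
longer_labels = longer['label'].numpy().tolist()
longer_test_len = int(longer['test'].shape[0])
print(short_labels, longer_labels, longer_test_len)
```

Because the parsed tensors differ in length from record to record, batching them in a tf.data pipeline would typically need padded_batch rather than batch.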

Rishabh Sahrawat answered Nov 15 '22 06:11