I am trying to design an input pipeline with the Dataset API. I am working with Parquet files. What is a good way to add them to my pipeline?
The tf.data API enables you to build complex input pipelines from simple, reusable pieces. For example, the pipeline for an image model might aggregate data from files in a distributed file system, apply random perturbations to each image, and merge randomly selected images into a batch for training.
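As a rough sketch of what such a pipeline can look like (the file pattern and the decode/augment functions are made up for illustration, and tf.data.AUTOTUNE assumes TF 2.x):

import tensorflow as tf

# Hypothetical file pattern; replace with your own data location.
files = tf.data.Dataset.list_files("/data/images/*.jpg")

def decode(path):
    # Read and decode one image file, then resize to a fixed shape.
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return tf.image.resize(image, [224, 224])

def augment(image):
    # Apply a random perturbation to each image.
    return tf.image.random_flip_left_right(image)

dataset = (files
           .map(decode, num_parallel_calls=tf.data.AUTOTUNE)
           .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(buffer_size=1000)
           .batch(32))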
Prefetching overlaps the preprocessing and model execution of a training step. While the model is executing training step s, the input pipeline is reading the data for step s+1. Doing so reduces the step time to the maximum (as opposed to the sum) of the training time and the time it takes to extract the data.
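Prefetching is a single transformation added at the end of the pipeline; a minimal sketch, assuming TF 2.x where tf.data.AUTOTUNE is available:

import tensorflow as tf

# A toy dataset standing in for any preprocessing pipeline.
dataset = tf.data.Dataset.range(10).map(lambda x: x * 2)

# Overlap preparation of step s+1 with training on step s;
# AUTOTUNE lets tf.data choose how many elements to buffer.
dataset = dataset.prefetch(tf.data.AUTOTUNE)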
from_tensor_slices creates a dataset with a separate element for each row of the input tensor:

>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensor_slices(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
 <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]
We have released Petastorm, an open source library that allows you to use Apache Parquet files directly via the TensorFlow Dataset API.
Here is a small example:
import tensorflow as tf
from petastorm.reader import Reader
from petastorm.tf_utils import make_petastorm_dataset

with Reader('hdfs://.../some/hdfs/path') as reader:
    # Wrap the Petastorm reader in a tf.data.Dataset
    dataset = make_petastorm_dataset(reader)
    iterator = dataset.make_one_shot_iterator()
    tensor = iterator.get_next()
    with tf.Session() as sess:
        sample = sess.run(tensor)
        print(sample.id)
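If you are on TensorFlow 2.x with eager execution, the same idea works without a Session; a rough sketch, assuming Petastorm's make_reader factory and a dataset that has an id field as in the example above:

from petastorm import make_reader
from petastorm.tf_utils import make_petastorm_dataset

with make_reader('hdfs://.../some/hdfs/path') as reader:
    dataset = make_petastorm_dataset(reader)
    for sample in dataset.take(1):  # iterate eagerly instead of sess.run
        print(sample.id)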