
What is tensorflow.python.data.ops.dataset_ops._OptionsDataset?

I am using the Transformer code from the TensorFlow tutorial: https://www.tensorflow.org/beta/tutorials/text/transformer

In this code, the dataset used is loaded like this -

examples, metadata = tfds.load('ted_hrlr_translate/pt_to_en', with_info=True,
                               as_supervised=True)
train_examples, val_examples = examples['train'], examples['validation']

When I check the type of train_examples using :

type(train_examples)

I get the following as output -

tensorflow.python.data.ops.dataset_ops._OptionsDataset

Now I just want to change some entries of the dataset (that is, the sentences), but I am not able to because I don't understand the type.

I am able to iterate over it using :

for data in train_examples:
    print(data,type(data))

And type of data is -

<class 'tuple'>

Finally, what I want is to replace some of these tuples with my own data. Can someone tell me how to do this, or give me some details about the type tensorflow.python.data.ops.dataset_ops._OptionsDataset?

Madhuparna Bhowmik asked Jun 29 '19 20:06


1 Answer

tensorflow.python.data.ops.dataset_ops._OptionsDataset is just another subclass of the base class tf.compat.v2.data.Dataset (DatasetV2). It wraps the original tf.compat.v2.data.Dataset (the Portuguese-English tuples in your case) together with a tf.data.Options object.

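(tf.data.Options holds settings, such as optimization and threading options, that take effect when the dataset is processed through transformations like tf.data.Dataset.map or tf.data.Dataset.interleave. It does not change the elements themselves.)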
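To see the wrapper for yourself, here is a minimal sketch, assuming TensorFlow 2.x with eager execution. The toy sentence pairs and the choice of the noop_elimination option are my own placeholders, not anything taken from the tutorial. It attaches an Options object with with_options and checks that the result is still an ordinary tf.data.Dataset with the same elements:

```python
import tensorflow as tf

# Toy stand-in for the translation pairs (hypothetical data, not the TED dataset).
base = tf.data.Dataset.from_tensor_slices(
    (["ola", "adeus"], ["hello", "goodbye"]))

# Attach an Options object; noop_elimination is only an example setting.
options = tf.data.Options()
options.experimental_optimization.noop_elimination = True
wrapped = base.with_options(options)

# The wrapper is still a tf.data.Dataset; only its concrete class differs.
print(type(wrapped).__name__)
print(isinstance(wrapped, tf.data.Dataset))

# The elements are untouched by the wrapper.
base_pairs = [(pt.numpy(), en.numpy()) for pt, en in base]
wrapped_pairs = [(pt.numpy(), en.numpy()) for pt, en in wrapped]
print(base_pairs == wrapped_pairs)
```

So for the purpose of reading or transforming the data, you can treat an _OptionsDataset exactly like any other tf.data.Dataset.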

How to view the individual elements?

I'm sure there are many ways, but one straightforward way is to use the iterator provided by the base class.

Since examples['train'] is an _OptionsDataset, you can iterate over it with the methods it inherits from tf.compat.v2.data.Dataset:

# Requires eager execution (the default in TF 2.x).
iterator = iter(examples['train'])
# Each element is a (Portuguese, English) tuple of string tensors.
pt, en = next(iterator)
print(pt.numpy())
print(en.numpy())

Here is the output:

b'o problema \xc3\xa9 que nunca vivi l\xc3\xa1 um \xc3\xbanico dia .'
b"except , i 've never lived one day of my life there ."

Substituting with your own data:

Since you haven't mentioned what you want to substitute into the original dataset, I'll assume you have a CSV/TSV file of your own translations. In that case it should be useful to create a separate tf.compat.v2.data.Dataset object by calling the CSV API to read your file into a dataset:

tf.data.experimental.make_csv_dataset

https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/load_data/csv.ipynb
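A tf.data.Dataset cannot be edited in place, so "replacing" tuples usually means filtering out the ones you don't want and concatenating your own. Below is a rough sketch of that idea, assuming TensorFlow 2.x with eager execution. The file name pairs.csv, its pt/en column names, the sentences, and the small stand-in for train_examples are all hypothetical:

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical CSV of your own translations, with header columns "pt" and "en".
csv_path = os.path.join(tempfile.mkdtemp(), "pairs.csv")
with open(csv_path, "w", encoding="utf-8") as f:
    f.write("pt,en\nmuito obrigado,thank you very much\nboa noite,good night\n")

# make_csv_dataset yields batches of {column name: tensor}; unbatch and
# convert each row to a (pt, en) tuple so it matches the original structure.
batched = tf.data.experimental.make_csv_dataset(
    csv_path, batch_size=2, num_epochs=1, shuffle=False)
mine = batched.unbatch().map(lambda row: (row["pt"], row["en"]))

# Small stand-in for train_examples (the real one comes from tfds.load).
original = tf.data.Dataset.from_tensor_slices(
    (["bom dia", "obrigado"], ["good morning", "thank you"]))

# Drop the pair to be replaced, then append your own pairs.
unwanted_pt = tf.constant("obrigado")
kept = original.filter(lambda pt, en: tf.not_equal(pt, unwanted_pt))
combined = kept.concatenate(mine)

pairs = [(pt.numpy().decode("utf-8"), en.numpy().decode("utf-8"))
         for pt, en in combined]
for pair in pairs:
    print(pair)
```

The combined dataset has the same (pt, en) tuple structure as train_examples, so it should be usable wherever the tutorial uses train_examples. For a TSV file, make_csv_dataset accepts field_delim='\t'.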

Caxton answered Oct 07 '22 02:10