I am using the Transformer code from tensorflow - https://www.tensorflow.org/beta/tutorials/text/transformer
In this code, the dataset used is loaded like this -
examples, metadata = tfds.load('ted_hrlr_translate/pt_to_en', with_info=True,
as_supervised=True)
train_examples, val_examples = examples['train'], examples['validation']
When I check the type of train_examples using :
type(train_examples)
I get the following as output -
tensorflow.python.data.ops.dataset_ops._OptionsDataset
I want to change some entries of the dataset (that is, the sentences), but I can't because I don't understand the type.
I am able to iterate over it using :
for data in train_examples:
print(data,type(data))
And type of data is -
<class 'tuple'>
Finally, what I want is to replace some of these tuples with my own data.
Can someone tell me how to do this, or give me some details about the type tensorflow.python.data.ops.dataset_ops._OptionsDataset?
tensorflow.python.data.ops.dataset_ops._OptionsDataset is just another class extending the base class tf.compat.v2.data.Dataset (DatasetV2). It holds a tf.data.Options object along with the original tf.compat.v2.data.Dataset (the Portuguese-English tuples in your case).
(tf.data.Options configures how the input pipeline runs, for example when you apply transformations such as tf.data.Dataset.map or tf.data.Dataset.interleave.)
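As a rough illustration of the point above, the sketch below wraps a small stand-in dataset with default options and checks that the result is still an ordinary tf.data.Dataset. The stand-in data and variable names are assumptions for the example, and the concrete class name printed at the end is an internal detail that may differ between TensorFlow versions.

```python
import tensorflow as tf

# Stand-in dataset (an assumption; the real one comes from tfds.load)
base = tf.data.Dataset.from_tensor_slices([1, 2, 3])

# Attaching options returns a wrapper dataset around the original one
wrapped = base.with_options(tf.data.Options())

# The wrapper is still a tf.data.Dataset and yields the same elements
is_dataset = isinstance(wrapped, tf.data.Dataset)
values = [int(x.numpy()) for x in wrapped]

# Internal class name; version-dependent, so no specific output is claimed
print(type(wrapped).__name__, is_dataset, values)
```

Because the wrapper is a regular dataset, the usual methods (take, skip, map, concatenate and so on) should be available on train_examples as well.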
How to view the individual elements?
There are several ways, but a straightforward one is to use the iterator from the base class. Since examples['train'] is an _OptionsDataset, it inherits iteration from tf.compat.v2.data.Dataset:
# Requires eager execution (the default in TF 2.x)
iterator = iter(examples['train'])
next_element = next(iterator)
pt = next_element[0]
en = next_element[1]
print(pt.numpy())
print(en.numpy())
Here is the output:
b'o problema \xc3\xa9 que nunca vivi l\xc3\xa1 um \xc3\xbanico dia .'
b"except , i 've never lived one day of my life there ."
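To look at more than one pair, or to get readable text instead of raw bytes, something like the following may help. It is a minimal sketch that uses a small stand-in dataset of (pt, en) string pairs instead of downloading the TED data; the sample sentences and variable names are assumptions.

```python
import tensorflow as tf

# Stand-in for examples['train']: (pt, en) pairs of scalar string tensors
pairs_ds = tf.data.Dataset.from_tensor_slices((
    ["olá", "obrigado", "bom dia"],
    ["hello", "thank you", "good morning"],
))

# take(n) limits iteration to the first n elements;
# .numpy() returns bytes, which decode to str as UTF-8
decoded = [
    (pt.numpy().decode("utf-8"), en.numpy().decode("utf-8"))
    for pt, en in pairs_ds.take(2)
]

for pt_text, en_text in decoded:
    print(pt_text, "->", en_text)
```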
Substituting with your own data:
Since you haven't mentioned what you want to substitute into the original dataset, I'll assume you have a CSV/TSV file of your own translations. In that case, you can create a separate tf.compat.v2.data.Dataset object by using the CSV API to read your file into a dataset:
tf.data.experimental.make_csv_dataset
https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/load_data/csv.ipynb
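One possible way to do the substitution is sketched below. Datasets are not modified in place, so the sketch builds a new dataset by dropping the first few original pairs and prepending the custom ones. Note that it uses tf.data.experimental.CsvDataset rather than make_csv_dataset, because CsvDataset yields one tuple per row, which matches the (pt, en) tuple structure, whereas make_csv_dataset yields batches of columns. The file name, column layout, and sample sentences are all assumptions for illustration.

```python
import csv
import os
import tempfile

import tensorflow as tf

# Write a small hypothetical CSV of custom translations (pt, en columns)
csv_path = os.path.join(tempfile.mkdtemp(), "my_translations.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["pt", "en"])
    writer.writerow(["bom dia", "good morning"])
    writer.writerow(["obrigado", "thank you"])

# Each CSV row becomes a tuple of two scalar string tensors
custom = tf.data.experimental.CsvDataset(
    csv_path, record_defaults=[tf.string, tf.string], header=True
)

# Stand-in for train_examples (the real one comes from tfds.load)
original = tf.data.Dataset.from_tensor_slices((
    ["um", "dois", "tres"],
    ["one", "two", "three"],
))

# "Replace" the first two pairs: skip them and prepend the custom pairs
num_replaced = 2
combined = custom.concatenate(original.skip(num_replaced))

result = [
    (pt.numpy().decode("utf-8"), en.numpy().decode("utf-8"))
    for pt, en in combined
]
print(result)
```

For a TSV file, passing field_delim="\t" to CsvDataset should work the same way. If the pairs to replace are not at the start, a filter on the original dataset followed by concatenate is one alternative to skip.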