Keras / Tensorflow: Predict Using tf.data.Dataset API

I'm using Keras with a TensorFlow backend to build a model for this problem: https://www.kaggle.com/cfpb/us-consumer-finance-complaints (just practicing).

I train my Keras model using the tf.data.Dataset API. Now, I have a Pandas DataFrame, df_testing, whose columns are complaint (strings) and label (also strings). I want to predict on these new samples. I create a tf.data.Dataset object, perform preprocessing, make an Iterator, and call predict on my model:

data = df_testing["complaint"].values
labels = df_testing["label"].values

dataset = tf.data.Dataset.from_tensor_slices((data))
dataset = dataset.map(lambda x: ({'reviews': x}))
dataset = dataset.batch(self.batch_size).repeat()
dataset = dataset.map(lambda x: self.preprocess_text(x, self.data_table))
dataset = dataset.map(lambda x: x['reviews'])
dataset = dataset.make_initializable_iterator()

My training used a tf.data.Dataset where each element had the form ({'reviews': "movie was great"}, "positive"), so I'm mimicking that here for prediction. My preprocessing just turns each string into a Tensor of integers.
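The structural difference described above can be traced stage by stage. Below is a plain-Python sketch, not TensorFlow: tuples model dataset structure, lists model tensors, and `fake_tokenize` and `batch_dicts` are hypothetical stand-ins for `self.preprocess_text` and `Dataset.batch`. Training elements are `(input, target)` pairs, but the last `map` in the prediction pipeline leaves each element as a bare batch of integer rows.

```python
# Plain-Python model of the two pipelines (no TensorFlow involved).
# Tuples model dataset structure; lists model tensors.
# fake_tokenize is a hypothetical stand-in for self.preprocess_text.

MAX_LEN = 5  # stands in for the padded length 100 seen in the error message


def fake_tokenize(text):
    """Toy preprocessing: word lengths as ids, padded/truncated to MAX_LEN."""
    ids = [len(word) for word in text.split()][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))


def batch_dicts(elements, batch_size):
    """Batching dict elements yields one dict per batch with batched values."""
    batches = []
    for start in range(0, len(elements), batch_size):
        chunk = elements[start:start + batch_size]
        batches.append({key: [e[key] for e in chunk] for key in chunk[0]})
    return batches


# Training element: ({'reviews': text}, label), i.e. an (input, target) pair.
train_element = ({'reviews': "movie was great"}, "positive")

# Prediction pipeline from the question (without .repeat()):
complaints = ["late fee charged twice", "card was never sent", "loan was sold"]
pred = [{'reviews': c} for c in complaints]   # map(lambda x: {'reviews': x})
pred = batch_dicts(pred, batch_size=2)         # batch(2)
pred = [{'reviews': [fake_tokenize(s) for s in b['reviews']]} for b in pred]  # preprocess
pred = [b['reviews'] for b in pred]            # map(lambda x: x['reviews'])

print(pred[0])  # a bare batch of int rows, not an (input, target) tuple
```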

When I call:

preds = model.predict(dataset)

it fails with:

ValueError: When using iterators as input to a model, you should specify the `steps` argument.

So I modify this call to be:

preds = model.predict(dataset, steps=3)

But now I get back:

ValueError: Please provide data as a list or tuple of 2 elements  - input and target pair. Received Tensor("IteratorGetNext_2:0", shape=(?, 100), dtype=int32)

What am I doing wrong here? I shouldn't have to provide a tuple of 2 elements when predicting, since I don't need the label.

Thanks for any help you can offer!

asked Nov 19 '18 at 23:11 by anon_swe




2 Answers

What version of Keras are you on? I cannot find that specific error message in the code base, but I think I found where it used to be.

Here's the error in a version of the code that I think is close to the version you're running: commit

And here's the updated version of that error: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training_eager.py#L464

The conditions of the input validation have changed (in the newest version your input would be accepted), but what's relevant is that the error message is much clearer:

raise ValueError(
    'Please provide data as a list or tuple of 1, 2, or 3 elements '
    ' - `(input)`, or `(input, target)`, or `(input, target,'
    'sample_weights)`. Received %s. We do not use the `target` or'
    '`sample_weights` value here.' % inputs.output_shapes)

The target value is never used in the predict function, so it can be anything. Looking at the rest of the function, next_element[1] is never used.
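The newer rule can be summarized in a few lines. This is a simplified plain-Python model, not the actual Keras source: `inputs_for_predict` is a hypothetical name, tuples model dataset structure and lists model tensors. Whatever form the element takes, only its input part is returned.

```python
def inputs_for_predict(element):
    """Hypothetical simplified model of the newer check used by predict.

    Accepts (input,), (input, target), (input, target, sample_weights),
    or a bare input, and returns only the input part.
    """
    if isinstance(element, tuple):  # tuples model dataset structure
        if not 1 <= len(element) <= 3:
            raise ValueError(
                "Please provide data as a list or tuple of 1, 2, or 3 elements")
        return element[0]  # target and sample_weights are never read
    return element  # a bare tensor (modelled as a list) is the input itself


batch = [[4, 3, 7, 5, 0], [4, 3, 5, 4, 0]]
print(inputs_for_predict(batch) == inputs_for_predict((batch, "ignored")))  # True
```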

[TLDR] With your current version, either add a dummy target value to the data or update Keras.
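The dummy-target workaround amounts to turning every element into an `(input, target)` pair before it reaches `predict`; in the question's pipeline that would be one more `map` at the end producing something like `(x, 0)` (an assumption, untested against that Keras version). The plain-Python sketch below models only the idea: `old_style_check` is a hypothetical simplification of the older validation, tuples model dataset structure, and lists model tensors.

```python
def old_style_check(element):
    """Hypothetical model of the older check: only (input, target) pairs pass."""
    if not (isinstance(element, tuple) and len(element) == 2):
        raise ValueError("Please provide data as a list or tuple of 2 elements "
                         "- input and target pair.")
    return element[0]  # element[1], the target, is never used by predict


def add_dummy_target(batches, dummy=0):
    """Pair each input batch with a placeholder target that predict ignores."""
    return [(batch, dummy) for batch in batches]


batches = [[[4, 3, 7, 5, 0], [4, 3, 5, 4, 0]], [[4, 3, 4, 0, 0]]]

try:
    old_style_check(batches[0])  # bare batch: rejected, as in the question
except ValueError as err:
    print("rejected:", err)

inputs = [old_style_check(e) for e in add_dummy_target(batches)]
print(inputs == batches)  # True: the dummy target changes nothing
```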

answered Sep 24 '22 at 23:09 by lmartens


The following code worked for me (tested on TensorFlow 1.10.0):

[TLDR] Just pass an empty dictionary as a dummy input and specify the number of steps:

model.predict(x={}, steps=4)

Full code:

import numpy as np
import tensorflow as tf
from tensorflow.data import Dataset
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model


# dummy data:
x = np.arange(4).reshape(-1, 1).astype('float32')
y = np.arange(5, 9).reshape(-1, 1).astype('float32')

# build the Datasets
ds_x = Dataset.from_tensor_slices(x).repeat().batch(4)
it_x = ds_x.make_one_shot_iterator()

ds_y = Dataset.from_tensor_slices(y).repeat().batch(4)
it_y = ds_y.make_one_shot_iterator()


# build, compile and train the model; inputs and targets come straight from the iterators
input_vals = Input(tensor=it_x.get_next())
output = Dense(1, activation='relu')(input_vals)
model = Model(inputs=input_vals, outputs=output)
model.compile('rmsprop', 'mse', target_tensors=[it_y.get_next()])
model.fit(steps_per_epoch=1, epochs=5, verbose=2)

# infer using the dataset
preds = model.predict(x={}, steps=4)  # x={} is a placeholder; inputs come from it_x
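Since both datasets call `.repeat()`, they never run out, so `steps` alone decides how many batches `predict` pulls. Here 4 samples, batch size 4 and `steps=4` should give 16 prediction rows, i.e. the same 4 predictions four times. `math.ceil(n_samples / batch_size)` steps cover every sample once, and with repeat-then-batch the last batch wraps around, so the result needs trimming to `n_samples`. A plain-Python model of that consumption (itertools stands in for tf.data; no TensorFlow involved, and the helper names are hypothetical):

```python
import math
from itertools import cycle, islice


def repeat_then_batch(samples, batch_size):
    """Endless batches, modelling Dataset.from_tensor_slices(samples).repeat().batch(n)."""
    stream = cycle(samples)
    while True:
        yield list(islice(stream, batch_size))


def pull(batches, steps):
    """Like predict(..., steps=steps): take exactly `steps` batches and concatenate."""
    return [row for batch in islice(batches, steps) for row in batch]


x = [0.0, 1.0, 2.0, 3.0]
rows = pull(repeat_then_batch(x, batch_size=4), steps=4)
print(len(rows))  # 16: each of the 4 samples appears four times

# Covering each sample exactly once when n_samples is not a multiple of batch_size:
x5 = [0.0, 1.0, 2.0, 3.0, 4.0]
steps = math.ceil(len(x5) / 4)                          # 2 batches
once = pull(repeat_then_batch(x5, 4), steps)[:len(x5)]  # trim wrapped-around rows
print(once)  # [0.0, 1.0, 2.0, 3.0, 4.0]
```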

answered Sep 22 '22 at 23:09 by ot226