What this error means: `y` argument is not supported when using python generator as input

Question

I am trying to build a network that uses a Python generator as its data provider. Everything looks fine until the model starts to fit, and then I get this error:

ValueError: `y` argument is not supported when using dataset as input.

I proofread every line, and I think the problem is in the format of the x_test and y_test data fed to the network. After hours of googling and changing the format several times, the error is still there.

Can you help me fix it? The whole code is below:

import os
import numpy as np
import pandas as pd
import re  # To match regular expression for extracting labels

import tensorflow as tf

print(tf.__version__)


def xfiles(filename):
    # Raw string so that \w and \. reach the regex engine unchanged
    return re.match(r'^\w{12}_x\.csv$', filename) is not None


def data_generator():
    folder = "i:/Stockpred/csvdbase/datasets/DS0002"
    file_list = os.listdir(folder)
    x_files = list(filter(xfiles, file_list))
    x_files.sort()
    np.random.seed(1729)
    np.random.shuffle(x_files)

    for file in x_files:
        filespec = folder + '/' + file
        xs = pd.read_csv(filespec, header=None)

        yfile = file.replace('_x', '_y')
        yfilespec = folder + '/' + yfile
        ys = pd.read_csv(open(yfilespec, 'r'), header=None, usecols=[1])

        xs = np.asarray(xs, dtype=np.float32)
        ys = np.asarray(ys, dtype=np.float32)

        for i in range(xs.shape[0]):
            yield xs[i][1:169], ys[i][0]


dataset = tf.data.Dataset.from_generator(
    data_generator,
    (tf.float32, tf.float32),
    (tf.TensorShape([168, ]), tf.TensorShape([])))
dataset = dataset.shuffle(buffer_size=16000, seed=1729)
# dataset = dataset.batch(4000, drop_remainder=True)
dataset = dataset.cache('R:/Temp/model')


def is_test(i, d):
    return i % 4 == 0


def is_train(i, d):
    return not is_test(i, d)


recover = lambda i, d: d

test_dataset = dataset.enumerate().filter(is_test).map(recover)
train_dataset = dataset.enumerate().filter(is_train).map(recover)

x_test = test_dataset.map(lambda x, y: x)
y_test = test_dataset.map(lambda x, y: y)

x_train = train_dataset.map(lambda x, y: x)
y_train = train_dataset.map(lambda x, y: y)

print(x_train.element_spec)
print(y_train.element_spec)
print(x_test.element_spec)
print(y_test.element_spec)

# define an object (initializing RNN)
model = tf.keras.models.Sequential()

# first LSTM layer
model.add(tf.keras.layers.LSTM(units=168, activation='relu', return_sequences=True, input_shape=(168, 1)))
# dropout layer
model.add(tf.keras.layers.Dropout(0.2))

# second LSTM layer
model.add(tf.keras.layers.LSTM(units=168, activation='relu', return_sequences=True))
# dropout layer
model.add(tf.keras.layers.Dropout(0.2))

# third LSTM layer
model.add(tf.keras.layers.LSTM(units=80, activation='relu', return_sequences=True))
# dropout layer
model.add(tf.keras.layers.Dropout(0.2))

# fourth LSTM layer
model.add(tf.keras.layers.LSTM(units=120, activation='relu'))
# dropout layer
model.add(tf.keras.layers.Dropout(0.2))

# output layer
model.add(tf.keras.layers.Dense(units=1))

model.summary()

# compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

model.fit(x_train.as_numpy_iterator(), y_train.as_numpy_iterator(), batch_size=32, epochs=100)

predicted_stock_price = model.predict(x_test)

Everything looks OK until the model starts to fit, and then I receive this error:

ValueError: `y` argument is not supported when using dataset as input.

Can you help me fix it?

tyrrr · Accepted Answer

As the docs say:

y - Target data. Like the input data x, it could be either Numpy array(s) or TensorFlow tensor(s). It should be consistent with x (you cannot have Numpy inputs and tensor targets, or inversely). If x is a dataset, generator, or keras.utils.Sequence instance, y should not be specified (since targets will be obtained from x).

So I suppose you should have a single generator that yields (sample, label) tuples, and pass only that generator to fit, without a separate y.
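As a rough illustration of "one generator serving tuples", here is a minimal NumPy-only sketch. The helper name batched_pairs, the synthetic arrays, and the batch size are assumptions made for the example and are not part of the question's code. It also adds the trailing axis that input_shape=(168, 1) expects.

```python
import itertools

import numpy as np


def batched_pairs(xs, ys, batch_size):
    """Yield (x_batch, y_batch) tuples forever, cycling over the data.

    Hypothetical helper: inputs and labels travel together in one tuple,
    so no separate `y` argument is needed downstream.
    """
    n = len(xs)
    while True:
        for start in range(0, n, batch_size):
            x_batch = xs[start:start + batch_size]
            y_batch = ys[start:start + batch_size]
            # Add a trailing axis so each sample has shape (168, 1).
            yield x_batch[..., np.newaxis], y_batch


# Synthetic stand-in for the CSV rows (assumption): 10 samples, 168 features.
xs = np.zeros((10, 168), dtype=np.float32)
ys = np.arange(10, dtype=np.float32)

# Take the first three batches from the endless generator and inspect them.
first_batches = list(itertools.islice(batched_pairs(xs, ys, batch_size=4), 3))
shapes = [(x.shape, y.shape) for x, y in first_batches]
print(shapes)
```

With Keras, a generator like this would be the only data argument, along the lines of model.fit(batched_pairs(xs, ys, 32), steps_per_epoch=..., epochs=...). Because the generator never ends, steps_per_epoch tells fit where one epoch stops.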

Koray Kinik · Answer

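If you are providing a Dataset as input, then type(train_dataset) should be a batched dataset such as tensorflow.python.data.ops.dataset_ops.BatchDataset.

If so, simply feed this Dataset (which bundles your X and y) into the model. Do not pass batch_size either: the docs say not to specify it for dataset input, since the dataset already generates batches. Batch the dataset itself instead, for example with train_dataset.batch(32).

model.fit(train_dataset, epochs=100)

(Yes, this is a slightly different convention from sklearn, where X and y are passed separately.)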

Meanwhile, if you want TensorFlow to use a separate dataset for validation, pass it with the validation_data keyword argument:

model.fit(train_dataset, validation_data=val_dataset, epochs=100)

where val_dataset is a separate, already batched dataset that you set aside for validation during training (not the test set).
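Putting this together with the question's code, a minimal sketch might look like the following. The random data, the tiny 4-unit LSTM, the batch size of 16, and the single epoch are assumptions chosen so the example runs quickly; they stand in for the CSV generator and the four-layer model. The key points are that x and y stay paired in one dataset, the dataset is batched before fit, and neither y nor batch_size is passed.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the CSV rows (assumption): 64 samples, 168 features.
rng = np.random.default_rng(1729)
features = rng.normal(size=(64, 168)).astype(np.float32)
targets = rng.normal(size=(64,)).astype(np.float32)

# Each element is an (x, y) pair, like the tuples yielded by data_generator().
dataset = tf.data.Dataset.from_tensor_slices((features, targets))

# Add the trailing feature axis that an LSTM input of shape (168, 1) expects.
dataset = dataset.map(lambda x, y: (tf.expand_dims(x, -1), y))

# Same every-fourth-element split as in the question, keeping x and y together.
val_dataset = dataset.enumerate().filter(lambda i, d: i % 4 == 0).map(lambda i, d: d)
train_dataset = dataset.enumerate().filter(lambda i, d: i % 4 != 0).map(lambda i, d: d)

# Batch the datasets themselves instead of passing batch_size to fit().
train_dataset = train_dataset.batch(16)
val_dataset = val_dataset.batch(16)

# Deliberately tiny model so the sketch trains in seconds.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(168, 1)),
    tf.keras.layers.LSTM(4),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")

# No y and no batch_size: both come from the dataset.
history = model.fit(train_dataset, validation_data=val_dataset, epochs=1, verbose=0)

# predict() accepts the same (x, y) dataset and uses only the inputs.
predictions = model.predict(val_dataset, verbose=0)
print(predictions.shape)
```

The question's x_train/y_train and x_test/y_test splits are not needed in this arrangement, since the paired datasets can be handed to fit and predict directly.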

tillmo · Answer

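Use model.fit_generator with tuples (x, y) of input data and labels. Note that fit_generator is deprecated in newer TensorFlow 2 releases, where model.fit accepts generators directly. So altogether: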

model.fit_generator(train_dataset.as_numpy_iterator(),epochs=100)

Tags: python, tensorflow, deep-learning, keras, lstm

Dariush Eivazi
