
What does this error mean: `y` argument is not supported when using python generator as input

I am trying to build a network that uses a Python generator as its data provider. Everything looks fine until the model starts to fit, at which point I get this error:

ValueError: `y` argument is not supported when using dataset as input.

I have checked every line, and I think the problem is in the format of the x_test and y_test data fed to the network. After hours of googling and changing the format several times, the error is still there.

Can you help me fix it? The full code is below:

import os
import numpy as np
import pandas as pd
import re  # To match regular expression for extracting labels

import tensorflow as tf

print(tf.__version__)


def xfiles(filename):
    # Raw string so the \w and \. escapes reach the regex engine unchanged
    return re.match(r'^\w{12}_x\.csv$', filename) is not None


def data_generator():
    folder = "i:/Stockpred/csvdbase/datasets/DS0002"
    file_list = os.listdir(folder)
    x_files = list(filter(xfiles, file_list))
    x_files.sort()
    np.random.seed(1729)
    np.random.shuffle(x_files)

    for file in x_files:
        filespec = folder + '/' + file
        xs = pd.read_csv(filespec, header=None)

        yfile = file.replace('_x', '_y')
        yfilespec = folder + '/' + yfile
        # Pass the path directly so pandas opens and closes the file itself
        ys = pd.read_csv(yfilespec, header=None, usecols=[1])

        xs = np.asarray(xs, dtype=np.float32)
        ys = np.asarray(ys, dtype=np.float32)

        for i in range(xs.shape[0]):
            yield xs[i][1:169], ys[i][0]


dataset = tf.data.Dataset.from_generator(
    data_generator,
    (tf.float32, tf.float32),
    (tf.TensorShape([168, ]), tf.TensorShape([])))
dataset = dataset.shuffle(buffer_size=16000, seed=1729)
# dataset = dataset.batch(4000, drop_remainder=True)
dataset = dataset.cache('R:/Temp/model')


def is_test(i, d):
    return i % 4 == 0


def is_train(i, d):
    return not is_test(i, d)


recover = lambda i, d: d

test_dataset = dataset.enumerate().filter(is_test).map(recover)
train_dataset = dataset.enumerate().filter(is_train).map(recover)

x_test = test_dataset.map(lambda x, y: x)
y_test = test_dataset.map(lambda x, y: y)

x_train = train_dataset.map(lambda x, y: x)
y_train = train_dataset.map(lambda x, y: y)

print(x_train.element_spec)
print(y_train.element_spec)
print(x_test.element_spec)
print(y_test.element_spec)

# define an object (initializing RNN)
model = tf.keras.models.Sequential()

# first LSTM layer
model.add(tf.keras.layers.LSTM(units=168, activation='relu', return_sequences=True, input_shape=(168, 1)))
# dropout layer
model.add(tf.keras.layers.Dropout(0.2))

# second LSTM layer
model.add(tf.keras.layers.LSTM(units=168, activation='relu', return_sequences=True))
# dropout layer
model.add(tf.keras.layers.Dropout(0.2))

# third LSTM layer
model.add(tf.keras.layers.LSTM(units=80, activation='relu', return_sequences=True))
# dropout layer
model.add(tf.keras.layers.Dropout(0.2))

# fourth LSTM layer
model.add(tf.keras.layers.LSTM(units=120, activation='relu'))
# dropout layer
model.add(tf.keras.layers.Dropout(0.2))

# output layer
model.add(tf.keras.layers.Dense(units=1))

model.summary()

# compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

model.fit(x_train.as_numpy_iterator(), y_train.as_numpy_iterator(), batch_size=32, epochs=100)

predicted_stock_price = model.predict(x_test)


asked Jun 27 '20 12:06 by Dariush Eivazi


3 Answers

As the docs say:

y - Target data. Like the input data x, it could be either Numpy array(s) or TensorFlow tensor(s). It should be consistent with x (you cannot have Numpy inputs and tensor targets, or inversely). If x is a dataset, generator, or keras.utils.Sequence instance, y should not be specified (since targets will be obtained from x).

So, I suppose you should have one generator serving tuples of sample and label.
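
Note that the question's data_generator already yields such tuples; it is the later split into x_train and y_train that separates them. A minimal pure-Python sketch of the pairing idea, where `paired_samples`, `features`, and `labels` are hypothetical names and toy values, not taken from the question:

```python
def paired_samples(features, labels):
    """Yield one (sample, label) tuple per row so x and y travel together."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    for x, y in zip(features, labels):
        yield x, y


# Toy data standing in for the question's CSV rows (hypothetical values).
features = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
labels = [1.0, 0.0]

# Each element is a 2-tuple: (sample, label).
pairs = list(paired_samples(features, labels))
```

A generator shaped like this is what gets handed over as the single x argument, with y left unspecified.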

answered Nov 17 '22 20:11 by tyrrr


If you are providing a Dataset as input, it should already be batched, so that

type(train_dataset) is tensorflow.python.data.ops.dataset_ops.BatchDataset

If so, simply feed this Dataset (which bundles your X and y) into the model. The documentation says not to pass batch_size for dataset input, since the dataset produces the batches itself:

model.fit(train_dataset, epochs=100)

(Yes, this is a slightly different convention from sklearn, where X and y are passed separately.)

If you want TensorFlow to use a separate dataset for validation, pass it with the validation_data keyword argument:

model.fit(train_dataset, validation_data=val_dataset, epochs=100)

where val_dataset is a separate dataset set aside for validation during training (not the test set).
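
The whole flow might look like the sketch below. It is not the asker's pipeline: `toy_generator` and its random data are hypothetical stand-ins for the CSV reader, the model is shrunk to one small LSTM layer so it runs quickly, and `output_signature` assumes a reasonably recent TensorFlow 2 release (the question uses the older positional types and shapes arguments).

```python
import numpy as np
import tensorflow as tf


def toy_generator():
    # Hypothetical stand-in for the question's CSV reader: 64 random samples.
    rng = np.random.default_rng(1729)
    for _ in range(64):
        x = rng.random(168, dtype=np.float32)
        yield x, np.float32(x.mean())


dataset = tf.data.Dataset.from_generator(
    toy_generator,
    output_signature=(
        tf.TensorSpec(shape=(168,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.float32),
    ),
)

# Add the trailing feature axis that an LSTM with input shape (168, 1) expects.
dataset = dataset.map(lambda x, y: (tf.expand_dims(x, -1), y))

# Same every-fourth-element split as in the question, but (x, y) stay together.
val_dataset = dataset.enumerate().filter(lambda i, d: i % 4 == 0).map(lambda i, d: d)
train_dataset = dataset.enumerate().filter(lambda i, d: i % 4 != 0).map(lambda i, d: d)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(168, 1)),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")

# No y and no batch_size: targets and batching both come from the datasets.
history = model.fit(
    train_dataset.batch(16),
    validation_data=val_dataset.batch(16),
    epochs=1,
    verbose=0,
)
```

For prediction, a batched dataset of inputs only (for example the test split mapped to its x component) can be passed to model.predict in the same way.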

answered Nov 17 '22 21:11 by Koray Kinik


Use model.fit_generator with tuples (x, y) of input data and labels. (fit_generator is deprecated as of TensorFlow 2.1; model.fit accepts the same generator.) So altogether:

model.fit_generator(train_dataset.as_numpy_iterator(),epochs=100)
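
When a plain Python generator is passed, Keras treats each yielded item as a whole batch, so per-sample tuples need to be grouped first, and the generator has to keep producing data across epochs. A sketch of that grouping, assuming NumPy is available; `batched`, `toy_samples`, and their parameters are illustrative names, not part of TensorFlow:

```python
import numpy as np


def batched(sample_iter_factory, batch_size):
    """Loop forever over per-sample (x, y) tuples and yield (x_batch, y_batch)."""
    while True:  # restart the sample stream whenever it runs out
        xs, ys = [], []
        for x, y in sample_iter_factory():
            xs.append(x)
            ys.append(y)
            if len(xs) == batch_size:
                yield np.stack(xs), np.asarray(ys)
                xs, ys = [], []
        # A trailing partial batch is dropped to keep batch shapes uniform.


def toy_samples():
    # Hypothetical stand-in for train_dataset.as_numpy_iterator().
    for i in range(10):
        yield np.full(168, i, dtype=np.float32), np.float32(i)


batches = batched(toy_samples, batch_size=4)
x_batch, y_batch = next(batches)  # shapes (4, 168) and (4,)
```

Such a generator could then be passed as model.fit(batches, steps_per_epoch=2, epochs=100), where steps_per_epoch tells Keras how many batches make up one epoch.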
answered Nov 17 '22 20:11 by tillmo