I am implementing a multitask regression model using code from the Keras API, under the shared layers section.
There are two datasets; let's call them data_1 and data_2. They are as follows.
data_1: shape (1434, 185, 37)
data_2: shape (283, 185, 37)
data_1 consists of 1434 samples; each sample is 185 characters long, and 37 is the total number of unique characters, in other words the vocab_size. Similarly, data_2 consists of 283 samples.
I convert data_1 and data_2 into two-dimensional numpy arrays as follows before passing them to the Embedding layer.
data_1 = np.argmax(data_1, axis=2)
data_2 = np.argmax(data_2, axis=2)
That makes the shapes of the data as follows.
print(np.shape(data_1))
(1434, 185)
print(np.shape(data_2))
(283, 185)
Each number in the matrix represents an index integer.
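For illustration only, here is a toy one-hot sample (not from the question's data) and its argmax conversion to index integers:

import numpy as np

# One toy sample: 3 characters, vocab_size 4, one-hot encoded -> shape (1, 3, 4)
one_hot = np.array([[[0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [1, 0, 0, 0]]])
indices = np.argmax(one_hot, axis=2)  # shape (1, 3)
print(indices)  # [[1 3 0]]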
The multitask model is as follows.
user_input = keras.layers.Input(shape=(185,), name='Input_1')
products_input = keras.layers.Input(shape=(185,), name='Input_2')
shared_embed = keras.layers.Embedding(vocab_size, 50, input_length=185)
user_vec_1 = shared_embed(user_input)
user_vec_2 = shared_embed(products_input)
input_vecs = keras.layers.concatenate([user_vec_1, user_vec_2], name='concat')
input_vecs_1 = keras.layers.Flatten()(input_vecs)
input_vecs_2 = keras.layers.Flatten()(input_vecs)

# Task 1 FC layers
nn = keras.layers.Dense(90, activation='relu', name='layer_1')(input_vecs_1)
result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)

# Task 2 FC layers
nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(input_vecs_2)
result_b = keras.layers.Dense(1, activation='linear', name='output_2')(nn1)

model = Model(inputs=[user_input, products_input], outputs=[result_a, result_b])
model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])
The model is visualized as follows:
[model plot omitted]
Then I fit the model as follows.
model.fit([data_1, data_2], [Y_1,Y_2], epochs=10)
Error:
ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(1434, 185), (283, 185)]
Is there any way in Keras to use two inputs with different sample sizes, or some trick to avoid this error and achieve my goal of multitask regression?
Here is the minimum working code for testing.
import numpy as np
import tensorflow.keras as keras
from tensorflow.keras.models import Model

data_1 = np.array([[25,  5, 11, 24,  6],
                   [25,  5, 11, 24,  6],
                   [25,  0, 11, 24,  6],
                   [25, 11, 28, 11, 24],
                   [25, 11,  6, 11, 11]])
data_2 = np.array([[25, 11, 31,  6, 11],
                   [25, 11, 28, 11, 31],
                   [25, 11, 11, 11, 31]])
Y_1 = np.array([[2.33],
                [2.59],
                [2.59],
                [2.54],
                [4.06]])
Y_2 = np.array([[2.9],
                [2.54],
                [4.06]])

user_input = keras.layers.Input(shape=(5,), name='Input_1')
products_input = keras.layers.Input(shape=(5,), name='Input_2')
shared_embed = keras.layers.Embedding(37, 3, input_length=5)
user_vec_1 = shared_embed(user_input)
user_vec_2 = shared_embed(products_input)
input_vecs = keras.layers.concatenate([user_vec_1, user_vec_2], name='concat')
input_vecs_1 = keras.layers.Flatten()(input_vecs)
input_vecs_2 = keras.layers.Flatten()(input_vecs)

# Task 1 FC layers
nn = keras.layers.Dense(90, activation='relu', name='layer_1')(input_vecs_1)
result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)

# Task 2 FC layers
nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(input_vecs_2)
result_b = keras.layers.Dense(1, activation='linear', name='output_2')(nn1)

model = Model(inputs=[user_input, products_input], outputs=[result_a, result_b])
model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])
model.fit([data_1, data_2], [Y_1, Y_2], epochs=10)
NEW ANSWER:
Here I am writing a solution with TensorFlow 2. What you need is:
to define a dynamic input that takes its shape from the data
to use average pooling so that the dense layer dimension is independent of the input dimensions (a short shape sketch follows this list)
to calculate the losses separately
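To illustrate the pooling point, here is a minimal shape check using the question's vocab size of 37 and an assumed embedding size of 3; the pooled output has a fixed width regardless of sequence length:

import tensorflow.keras as keras

inp = keras.layers.Input(shape=(None,))              # variable-length integer sequences
emb = keras.layers.Embedding(37, 3)(inp)             # shape: (batch, None, 3)
pooled = keras.layers.GlobalAveragePooling1D()(emb)  # shape: (batch, 3), length-independent
print(pooled.shape)  # (None, 3)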
Here is your example modified to work:
## Do this
#pip install tensorflow==2.0.0
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
from tensorflow.keras.models import Model
data_1 = np.array([[25,  5, 11, 24,  6],
                   [25,  5, 11, 24,  6],
                   [25,  0, 11, 24,  6],
                   [25, 11, 28, 11, 24],
                   [25, 11,  6, 11, 11]])
data_2 = np.array([[25, 11, 31,  6, 11],
                   [25, 11, 28, 11, 31],
                   [25, 11, 11, 11, 31]])
Y_1 = np.array([[2.33],
                [2.59],
                [2.59],
                [2.54],
                [4.06]])
Y_2 = np.array([[2.9],
                [2.54],
                [4.06]])
user_input = keras.layers.Input(shape=(None,), name='Input_1')
products_input = keras.layers.Input(shape=(None,), name='Input_2')
shared_embed = keras.layers.Embedding(37, 3)  # no input_length, so the sequence length stays dynamic
user_vec_1 = shared_embed(user_input)
user_vec_2 = shared_embed(products_input)
# Task 1 FC layers
x = keras.layers.GlobalAveragePooling1D()(user_vec_1)
nn = keras.layers.Dense(90, activation='relu', name='layer_1')(x)
result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)

# Task 2 FC layers
x = keras.layers.GlobalAveragePooling1D()(user_vec_2)
nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(x)
result_b = keras.layers.Dense(1, activation='linear', name='output_2')(nn1)

model = Model(inputs=[user_input, products_input], outputs=[result_a, result_b])
loss = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()

loss_values = []
num_iter = 300
for i in range(num_iter):
    with tf.GradientTape() as tape:
        # Forward pass on both inputs.
        logits = model([data_1, data_2])
        # Total loss is the sum of the two per-task losses.
        loss_value = loss(Y_1, logits[0]) + loss(Y_2, logits[1])
    loss_values.append(float(loss_value))
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
import matplotlib.pyplot as plt

plt.plot(range(num_iter), loss_values)
plt.xlabel('iterations')
plt.ylabel('loss value')
plt.show()
OLD ANSWER:
It seems your problem is not a coding problem, it's a machine learning problem! You have to pair your datasets: that means you have to feed your Keras model on both of its input layers at each training step.
The solution is up-sampling your smaller dataset so that both datasets have the same size. How you do that depends on the semantics of your datasets. The other option is down-sampling your bigger dataset, which is not recommended.
In a very basic situation, if we assume samples are i.i.d. across datasets, you can use the following code:
random_indices = np.random.choice(data_2.shape[0], data_1.shape[0], replace=True)
upsampled_data_2 = data_2[random_indices]
So you get a new version of your smaller dataset, upsampled_data_2, that contains some repeated samples but has the same size as your bigger dataset.
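Note that the targets have to be upsampled with the same indices, otherwise samples and labels lose their pairing; a minimal sketch using the names from the question:

# Reuse the same random indices for the targets so data and labels stay paired
upsampled_Y_2 = Y_2[random_indices]

model.fit([data_1, upsampled_data_2], [Y_1, upsampled_Y_2], epochs=10)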
It's not clear in your question if you're trying to:

1. Build a single model that takes a user and a product, and predicts two things about that (user, product) pair. If the user and product aren't paired, then it's not clear that this means anything (as @matias-valdenegro pointed out). If you pair up a random element of the other type (as in the first answer), hopefully each output will just learn to ignore the other input. This would be equivalent to:

2. Build two models that share an embedding layer (in which case the concat doesn't make any sense). If Y_1 has the same length as data_1 and Y_2 has the same shape as data_2, then this is probably what you want. This way, if you have a user you can run the user model, and if you have a product you can run the product model.
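The snippets below refer to user_model and product_model, which are never defined in the answer; here is a minimal sketch of what option #2 could look like with the toy dimensions from the question (the layer sizes are assumptions):

import tensorflow.keras as keras
from tensorflow.keras.models import Model

shared_embed = keras.layers.Embedding(37, 3)

# User model
user_input = keras.layers.Input(shape=(5,), name='Input_1')
u = keras.layers.Flatten()(shared_embed(user_input))
u = keras.layers.Dense(90, activation='relu')(u)
user_result = keras.layers.Dense(1, activation='linear')(u)
user_model = Model(user_input, user_result)

# Product model, reusing the same embedding layer (shared weights)
products_input = keras.layers.Input(shape=(5,), name='Input_2')
p = keras.layers.Flatten()(shared_embed(products_input))
p = keras.layers.Dense(90, activation='relu')(p)
product_result = keras.layers.Dense(1, activation='linear')(p)
product_model = Model(products_input, product_result)

user_model.compile(optimizer='rmsprop', loss='mse')
product_model.compile(optimizer='rmsprop', loss='mse')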
I think you really want #2. To train it you can do something like:
# Schematic: user_data and product_data are assumed to be tf.data.Datasets of
# (inputs, targets) batches (note that shuffle() needs a buffer_size argument).
step = 0
for user_batch, product_batch in zip(user_data.shuffle().repeat(),
                                     product_data.shuffle().repeat()):
    user_model.train_on_batch(*user_batch)
    product_model.train_on_batch(*product_batch)

    step += 1
    if step > STEPS:
        break
Or, wrap them both in a combined model:
user_result = user_model(user_input)
product_result = product_model(products_input)

model = Model(inputs=[user_input, products_input],
              outputs=[user_result, product_result])
model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])
model.fit([data_1, data_2], [Y_1, Y_2], epochs=10)
Regardless of which training procedure you use, you should normalize the output ranges so that the two models' losses are comparable. The first procedure alternates epochs or steps. The second does a single gradient step on the weighted sum of the two losses. You may want to check which loss weighting works best for you.
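In Keras, per-output loss weights can be set at compile time via the loss_weights argument; the weights below are illustrative, not tuned:

model.compile(optimizer='rmsprop',
              loss='mse',
              loss_weights=[1.0, 0.5])  # weights for output_1 and output_2; tune for your data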
For this answer, I will suppose you "don't have any rule for pairing users and products", and that you simply want to sample randomly.
Although there is already an answer for this, that answer is biased because it uses fixed pairs, and you will want to vary the pairs. For that, you will want a generator:
def generator(data_1, data_2, y1, y2, batchSize):
    l1 = len(data_1)
    l2 = len(data_2)

    # Extend data_2 (and its labels) to the same size as data_1
    sampleRelation = l1 // l2  # l1 must be bigger
    if l1 % l2 > 0:
        sampleRelation += 1

    data_2 = np.concatenate([data_2] * sampleRelation, axis=0)[:l1]
    y2 = np.concatenate([y2] * sampleRelation, axis=0)[:l1]

    # Batches per epoch = steps_per_epoch
    batches = l1 // batchSize
    if l1 % batchSize > 0:
        batches += 1

    while True:  # Keras generators must be infinite
        # Shuffle each dataset together with its labels in every epoch,
        # so samples and targets stay aligned
        perm1 = np.random.permutation(l1)
        data_1, y1 = data_1[perm1], y1[perm1]
        perm2 = np.random.permutation(l1)
        data_2, y2 = data_2[perm2], y2[perm2]

        # Batch loop
        for b in range(batches):
            start = b * batchSize
            end = start + batchSize
            yield [data_1[start:end], data_2[start:end]], [y1[start:end], y2[start:end]]
Train your model with model.fit_generator(generator(data_1, data_2, Y_1, Y_2, batchSize), steps_per_epoch = same_number_of_batches_as_inside_generator, ...)
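For completeness, a sketch of that call, computing steps_per_epoch the same way the generator does (the batch size here is an assumption):

batchSize = 2
steps_per_epoch = len(data_1) // batchSize + (1 if len(data_1) % batchSize > 0 else 0)

model.fit_generator(generator(data_1, data_2, Y_1, Y_2, batchSize),
                    steps_per_epoch=steps_per_epoch,
                    epochs=10)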
If my guess is correct, you don't want any relation between user and product, and you simply want to train two models simultaneously. In this case, I suggest this architecture:
[architecture diagram omitted]