I am implementing a multitask regression model using code from the Keras API, under the shared layers section.
There are two datasets; let's call them data_1 and data_2. They are as follows.
data_1: shape (1434, 185, 37)
data_2: shape (283, 185, 37)
data_1 consists of 1434 samples; each sample is 185 characters long, and 37 is the total number of unique characters, in other words the vocab_size. Similarly, data_2 consists of 283 samples.
I convert data_1 and data_2 into two-dimensional numpy arrays as follows before passing them to the Embedding layer.
data_1 = np.argmax(data_1, axis=2)
data_2 = np.argmax(data_2, axis=2)
That makes the shapes of the data as follows.
print(np.shape(data_1))
(1434, 185)
print(np.shape(data_2))
(283, 185)
Each number in the matrix represents an index integer.
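For illustration only, here is a toy one-hot sample (not from the question's data) and its argmax conversion to index integers:

import numpy as np

# One toy sample: 3 characters, vocab_size 4, one-hot encoded -> shape (1, 3, 4)
one_hot = np.array([[[0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [1, 0, 0, 0]]])
indices = np.argmax(one_hot, axis=2)  # shape (1, 3)
print(indices)  # [[1 3 0]]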
The multitask model is as follows.
user_input = keras.layers.Input(shape=(185,), name='Input_1')
products_input = keras.layers.Input(shape=(185,), name='Input_2')
shared_embed = keras.layers.Embedding(vocab_size, 50, input_length=185)
user_vec_1 = shared_embed(user_input)
user_vec_2 = shared_embed(products_input)
input_vecs = keras.layers.concatenate([user_vec_1, user_vec_2], name='concat')
input_vecs_1 = keras.layers.Flatten()(input_vecs)
input_vecs_2 = keras.layers.Flatten()(input_vecs)

# Task 1 FC layers
nn = keras.layers.Dense(90, activation='relu', name='layer_1')(input_vecs_1)
result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)

# Task 2 FC layers
nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(input_vecs_2)
result_b = keras.layers.Dense(1, activation='linear', name='output_2')(nn1)

model = Model(inputs=[user_input, products_input], outputs=[result_a, result_b])
model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])
The model is visualized as follows:
[model plot omitted]
Then I fit the model as follows.
model.fit([data_1, data_2], [Y_1,Y_2], epochs=10)
Error:
ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(1434, 185), (283, 185)]
Is there any way in Keras to use two inputs with different sample sizes, or some trick to avoid this error and achieve my goal of multitask regression?
Here is the minimum working code for testing.
import numpy as np
import tensorflow.keras as keras
from tensorflow.keras.models import Model

data_1 = np.array([[25,  5, 11, 24,  6],
                   [25,  5, 11, 24,  6],
                   [25,  0, 11, 24,  6],
                   [25, 11, 28, 11, 24],
                   [25, 11,  6, 11, 11]])
data_2 = np.array([[25, 11, 31,  6, 11],
                   [25, 11, 28, 11, 31],
                   [25, 11, 11, 11, 31]])
Y_1 = np.array([[2.33],
                [2.59],
                [2.59],
                [2.54],
                [4.06]])
Y_2 = np.array([[2.9],
                [2.54],
                [4.06]])

user_input = keras.layers.Input(shape=(5,), name='Input_1')
products_input = keras.layers.Input(shape=(5,), name='Input_2')
shared_embed = keras.layers.Embedding(37, 3, input_length=5)
user_vec_1 = shared_embed(user_input)
user_vec_2 = shared_embed(products_input)
input_vecs = keras.layers.concatenate([user_vec_1, user_vec_2], name='concat')
input_vecs_1 = keras.layers.Flatten()(input_vecs)
input_vecs_2 = keras.layers.Flatten()(input_vecs)

# Task 1 FC layers
nn = keras.layers.Dense(90, activation='relu', name='layer_1')(input_vecs_1)
result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)

# Task 2 FC layers
nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(input_vecs_2)
result_b = keras.layers.Dense(1, activation='linear', name='output_2')(nn1)

model = Model(inputs=[user_input, products_input], outputs=[result_a, result_b])
model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])
model.fit([data_1, data_2], [Y_1, Y_2], epochs=10)
NEW ANSWER:
Here I am writing a solution with TensorFlow 2. What you need is:
to define a dynamic input that takes its shape from the data
to use average pooling so that the dense layer dimension is independent of the input dimensions (a short shape sketch follows this list)
to calculate the losses separately
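To illustrate the pooling point, here is a minimal shape check using the question's vocab size of 37 and an assumed embedding size of 3; the pooled output has a fixed width regardless of sequence length:

import tensorflow.keras as keras

inp = keras.layers.Input(shape=(None,))              # variable-length integer sequences
emb = keras.layers.Embedding(37, 3)(inp)             # shape: (batch, None, 3)
pooled = keras.layers.GlobalAveragePooling1D()(emb)  # shape: (batch, 3), length-independent
print(pooled.shape)  # (None, 3)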
Here is your example modified to work:
## Do this
#pip install tensorflow==2.0.0
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
from tensorflow.keras.models import Model
data_1 = np.array([[25,  5, 11, 24,  6],
                   [25,  5, 11, 24,  6],
                   [25,  0, 11, 24,  6],
                   [25, 11, 28, 11, 24],
                   [25, 11,  6, 11, 11]])
data_2 = np.array([[25, 11, 31,  6, 11],
                   [25, 11, 28, 11, 31],
                   [25, 11, 11, 11, 31]])
Y_1 = np.array([[2.33],
                [2.59],
                [2.59],
                [2.54],
                [4.06]])
Y_2 = np.array([[2.9],
                [2.54],
                [4.06]])
user_input = keras.layers.Input(shape=(None,), name='Input_1')
products_input = keras.layers.Input(shape=(None,), name='Input_2')
shared_embed = keras.layers.Embedding(37, 3)  # no input_length, so the sequence length stays dynamic
user_vec_1 = shared_embed(user_input)
user_vec_2 = shared_embed(products_input)
# Task 1 FC layers
x = keras.layers.GlobalAveragePooling1D()(user_vec_1)
nn = keras.layers.Dense(90, activation='relu', name='layer_1')(x)
result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)

# Task 2 FC layers
x = keras.layers.GlobalAveragePooling1D()(user_vec_2)
nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(x)
result_b = keras.layers.Dense(1, activation='linear', name='output_2')(nn1)

model = Model(inputs=[user_input, products_input], outputs=[result_a, result_b])
loss = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()

loss_values = []
num_iter = 300
for i in range(num_iter):
    with tf.GradientTape() as tape:
        # Forward pass on both inputs.
        logits = model([data_1, data_2])
        # Total loss is the sum of the two per-task losses.
        loss_value = loss(Y_1, logits[0]) + loss(Y_2, logits[1])
    loss_values.append(float(loss_value))
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
import matplotlib.pyplot as plt

plt.plot(range(num_iter), loss_values)
plt.xlabel('iterations')
plt.ylabel('loss value')
plt.show()
OLD ANSWER:
It seems your problem is not a coding problem, it's a machine learning problem! You have to pair your datasets: that means you have to feed your Keras model on both of its input layers at each training step.
The solution is up-sampling your smaller dataset so that both datasets have the same size. How you do that depends on the semantics of your datasets. The other option is down-sampling your bigger dataset, which is not recommended.
In a very basic situation, if we assume samples are i.i.d. across datasets, you can use the following code:
random_indices = np.random.choice(data_2.shape[0], data_1.shape[0], replace=True)
upsampled_data_2 = data_2[random_indices]
So you get a new version of your smaller dataset, upsampled_data_2, that contains some repeated samples but has the same size as your bigger dataset.
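Note that the targets have to be upsampled with the same indices, otherwise samples and labels lose their pairing; a minimal sketch using the names from the question:

# Reuse the same random indices for the targets so data and labels stay paired
upsampled_Y_2 = Y_2[random_indices]

model.fit([data_1, upsampled_data_2], [Y_1, upsampled_Y_2], epochs=10)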
It's not clear in your question if you're trying to:

1. Build a single model that takes a user and a product, and predicts two things about that (user, product) pair. If the user and product aren't paired, then it's not clear that this means anything (as @matias-valdenegro pointed out). If you pair up a random element of the other type (as in the first answer), hopefully each output will just learn to ignore the other input. This would be equivalent to:

2. Build two models that share an embedding layer (in which case the concat doesn't make any sense). If Y_1 has the same length as data_1 and Y_2 has the same shape as data_2, then this is probably what you want. This way, if you have a user you can run the user model, and if you have a product you can run the product model.
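The snippets below refer to user_model and product_model, which are never defined in the answer; here is a minimal sketch of what option #2 could look like with the toy dimensions from the question (the layer sizes are assumptions):

import tensorflow.keras as keras
from tensorflow.keras.models import Model

shared_embed = keras.layers.Embedding(37, 3)

# User model
user_input = keras.layers.Input(shape=(5,), name='Input_1')
u = keras.layers.Flatten()(shared_embed(user_input))
u = keras.layers.Dense(90, activation='relu')(u)
user_result = keras.layers.Dense(1, activation='linear')(u)
user_model = Model(user_input, user_result)

# Product model, reusing the same embedding layer (shared weights)
products_input = keras.layers.Input(shape=(5,), name='Input_2')
p = keras.layers.Flatten()(shared_embed(products_input))
p = keras.layers.Dense(90, activation='relu')(p)
product_result = keras.layers.Dense(1, activation='linear')(p)
product_model = Model(products_input, product_result)

user_model.compile(optimizer='rmsprop', loss='mse')
product_model.compile(optimizer='rmsprop', loss='mse')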
I think you really want #2. To train it you can do something like:
# Schematic: user_data and product_data are assumed to be tf.data.Datasets of
# (inputs, targets) batches (note that shuffle() needs a buffer_size argument).
step = 0
for user_batch, product_batch in zip(user_data.shuffle().repeat(),
                                     product_data.shuffle().repeat()):
    user_model.train_on_batch(*user_batch)
    product_model.train_on_batch(*product_batch)

    step += 1
    if step > STEPS:
        break
Or, wrap them both in a combined model:
user_result = user_model(user_input)
product_result = product_model(products_input)

model = Model(inputs=[user_input, products_input],
              outputs=[user_result, product_result])
model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])
model.fit([data_1, data_2], [Y_1, Y_2], epochs=10)
Regardless of which training procedure you use, you should normalize the output ranges so that the two models' losses are comparable. The first procedure alternates epochs or steps. The second does a single gradient step on the weighted sum of the two losses. You may want to check which loss weighting works best for you.
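In Keras, per-output loss weights can be set at compile time via the loss_weights argument; the weights below are illustrative, not tuned:

model.compile(optimizer='rmsprop',
              loss='mse',
              loss_weights=[1.0, 0.5])  # weights for output_1 and output_2; tune for your data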
For this answer, I will suppose you "don't have any rule for pairing users and products", and that you simply want to sample randomly.
Although there is already an answer for this, that answer is biased because it uses fixed pairs, and you will want to vary the pairs. For that, you will want a generator:
def generator(data_1, data_2, y1, y2, batchSize):
    l1 = len(data_1)
    l2 = len(data_2)

    # Extend data_2 (and its labels) to the same size as data_1
    sampleRelation = l1 // l2  # l1 must be bigger
    if l1 % l2 > 0:
        sampleRelation += 1

    data_2 = np.concatenate([data_2] * sampleRelation, axis=0)[:l1]
    y2 = np.concatenate([y2] * sampleRelation, axis=0)[:l1]

    # Batches per epoch = steps_per_epoch
    batches = l1 // batchSize
    if l1 % batchSize > 0:
        batches += 1

    while True:  # Keras generators must be infinite
        # Shuffle each dataset together with its labels in every epoch,
        # so samples and targets stay aligned
        perm1 = np.random.permutation(l1)
        data_1, y1 = data_1[perm1], y1[perm1]
        perm2 = np.random.permutation(l1)
        data_2, y2 = data_2[perm2], y2[perm2]

        # Batch loop
        for b in range(batches):
            start = b * batchSize
            end = start + batchSize
            yield [data_1[start:end], data_2[start:end]], [y1[start:end], y2[start:end]]
Train your model with model.fit_generator(generator(data_1, data_2, Y_1, Y_2, batchSize), steps_per_epoch = same_number_of_batches_as_inside_generator, ...)
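For completeness, a sketch of that call, computing steps_per_epoch the same way the generator does (the batch size here is an assumption):

batchSize = 2
steps_per_epoch = len(data_1) // batchSize + (1 if len(data_1) % batchSize > 0 else 0)

model.fit_generator(generator(data_1, data_2, Y_1, Y_2, batchSize),
                    steps_per_epoch=steps_per_epoch,
                    epochs=10)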
If my guess is correct, you don't want any relation between user and product, and you simply want to train two models simultaneously. In this case, I suggest this architecture:
[architecture diagram omitted]