Deep learning: how to split a 5-dimensional time series and pass some dimensions through an embedding layer

I have an input that is a time series of 5 dimensions:

a = [[8,3],[2], [4,5],[1], [9,1],[2], ...]  # 100 timestamps in total

For each element, dims 0 and 1 are numerical data and dim 2 is a numerical encoding of a category. This is per sample; there are 3200 samples.

The category has 3 possible values (0,1,2)

I want to build a NN such that the last dimension (the category) will go through an embedding layer with output size 8, and then will be concatenated back to the first two dims (the numerical data).

So, this will be something like:

input1 = keras.layers.Input(shape=(2,)) # the numerical features
input2 = keras.layers.Input(shape=(1,)) # the encoding of the categories; this part will be embedded to 8 dims
x2 = Embedding(input_dim=1, output_dim=8)(input2) # apply it to every timestamp, taking only dim 2, i.e. [2],[1],[2]
x = concatenate([input1, x2]) # will give 10 dims at each timepoint, still 100 timepoints
x = LSTM(units=24)(x) # the input has 10 dims/features at each timepoint, 100 timepoints per sample
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[input1, input2], outputs=[x]) # input1 is a 1D vec of width 2; input2 is a 1D vec of width 1 that goes through the embedding
model.compile(
        loss='binary_crossentropy',
        optimizer='adam',
        metrics=['acc']
    )

How can I do it (preferably in Keras)? My problem is how to apply the embedding to every time point. Meaning, if I have 1000 timepoints with 3 dims each, I need to convert them to 1000 timepoints with 10 dims each (the Embedding layer should transform input2 from (1000x1) to (1000x8)).

asked Jan 25 '23 by okuoub
1 Answer

There are a couple of issues here. First let me give you a working example, and I will explain along the way how to solve them.

Imports and Data Generation

import tensorflow as tf
import numpy as np

from tensorflow.keras import layers
from tensorflow.keras.models import Model

num_timesteps = 100
max_features_values = [100, 100, 3]
num_observations = 2

input_list = [[[np.random.randint(0, v) for _ in range(num_timesteps)]
               for v in max_features_values]
              for _ in range(num_observations)]

input_arr = np.array(input_list)  # shape (2, 3, 100)
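
As a quick sanity check on the layout (a NumPy-only sketch re-running the same generator), features sit along axis 1 and timesteps along axis 2:

```python
import numpy as np

num_timesteps = 100
max_features_values = [100, 100, 3]
num_observations = 2

# Same nested comprehension as above: observations x features x timesteps
input_list = [[[np.random.randint(0, v) for _ in range(num_timesteps)]
               for v in max_features_values]
              for _ in range(num_observations)]
input_arr = np.array(input_list)

print(input_arr.shape)           # (2, 3, 100)
print(input_arr[:, 2, :].max())  # the category channel never exceeds 2
```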

In order to use an embedding we need to pass the vocabulary size as input_dim, as stated in the Embedding documentation.

Embedding and Concatenation

voc_size = len(np.unique(input_arr[:, 2, :])) + 1  # 4

Now we need to create the inputs. Inputs should be of size [None, 2, num_timesteps] and [None, 1, num_timesteps], where the first dimension is flexible and will be filled with the number of observations we pass in. Let's apply the embedding right after that, using the previously calculated voc_size.

inp1 = layers.Input(shape=(2, num_timesteps))  # TensorShape([None, 2, 100])
inp2 = layers.Input(shape=(1, num_timesteps))  # TensorShape([None, 1, 100])
x2 = layers.Embedding(input_dim=voc_size, output_dim=8)(inp2)  # TensorShape([None, 1, 100, 8])
x2_reshaped = tf.transpose(tf.squeeze(x2, axis=1), [0, 2, 1])  # TensorShape([None, 8, 100])

These cannot be concatenated directly, since all dimensions must match except the one along the concatenation axis, and the shapes of inp1 ([None, 2, 100]) and x2 ([None, 1, 100, 8]) unfortunately do not. Therefore we reshape x2: we squeeze out the singleton axis 1 and then transpose so that the embedding dimension comes before the time dimension.
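
The squeeze-and-transpose step can be illustrated with plain NumPy on a dummy array of the embedding's output shape (the shapes here are assumptions matching the comments above):

```python
import numpy as np

# Mimic the embedding output: (batch, 1, timesteps, embedding_dim)
x2_dummy = np.zeros((2, 1, 100, 8))

squeezed = np.squeeze(x2_dummy, axis=1)         # drop the singleton axis -> (2, 100, 8)
transposed = np.transpose(squeezed, (0, 2, 1))  # swap time and embedding -> (2, 8, 100)

print(transposed.shape)  # (2, 8, 100)
```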

Now we can concatenate without any issue, and everything works in a straightforward fashion:

x = layers.concatenate([inp1, x2_reshaped], axis=1)
x = layers.LSTM(32)(x)
x = layers.Dense(1, activation='sigmoid')(x)
model = Model(inputs=[inp1, inp2], outputs=[x])

Check on Dummy Example

inp1_np = input_arr[:, :2, :]
inp2_np = input_arr[:, 2:, :]
model.predict([inp1_np, inp2_np])

# Output
# array([[0.544262 ],
#        [0.6157502]], dtype=float32)

This outputs values between 0 and 1, just as expected.
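
To go beyond predict and actually train, one could compile with the binary cross-entropy setup from the question and fit on dummy labels. A minimal sketch, where the labels y_dummy and the random inputs are invented purely for illustration:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Model

num_timesteps = 100
voc_size = 4

# Rebuild the same model as above
inp1 = layers.Input(shape=(2, num_timesteps))
inp2 = layers.Input(shape=(1, num_timesteps))
x2 = layers.Embedding(input_dim=voc_size, output_dim=8)(inp2)
x2_reshaped = tf.transpose(tf.squeeze(x2, axis=1), [0, 2, 1])
x = layers.concatenate([inp1, x2_reshaped], axis=1)
x = layers.LSTM(32)(x)
x = layers.Dense(1, activation='sigmoid')(x)
model = Model(inputs=[inp1, inp2], outputs=[x])

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])

# Dummy inputs and labels, just to show the training call
inp1_np = np.random.rand(2, 2, num_timesteps)
inp2_np = np.random.randint(0, 3, size=(2, 1, num_timesteps))
y_dummy = np.array([0., 1.])
history = model.fit([inp1_np, inp2_np], y_dummy, epochs=1, verbose=0)
```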

answered Jan 26 '23 by pythonic833