I have a vector of integers representing the characters of each domain name and another vector of integers representing timeline information. I need to feed both vectors into a CNN model that classifies domain names as good or spam.
For instance,
Vector representing the domain name -> 1 x 75 vector. Each element represents one character of the domain name. With 1000 domain names, this becomes a matrix of shape 1000 x 75.
Vector representing timeline information -> 1 x 1440 vector. Each element is the number of mails sent from that domain in one particular minute. With 1000 domain names, this becomes a matrix of shape 1000 x 1440.
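For illustration, here is a minimal pure-Python sketch of how such vectors might be built. The character vocabulary (with index 0 reserved for padding), padding at the end, and the (hour, minute) send-time format are all hypothetical assumptions, not taken from the question.

```python
import string

# Hypothetical character vocabulary; index 0 is reserved for padding/unknown.
CHARS = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {c: i + 1 for i, c in enumerate(CHARS)}
MAXLEN = 75
MINUTES_PER_DAY = 24 * 60  # 1440

def encode_domain(domain, maxlen=MAXLEN):
    """Map each character to an integer, then pad/truncate to maxlen."""
    indices = [CHAR_TO_INDEX.get(c, 0) for c in domain.lower()[:maxlen]]
    return indices + [0] * (maxlen - len(indices))

def timeline_vector(send_times):
    """Count mails per minute of the day.

    send_times: list of (hour, minute) tuples (an assumed input format).
    """
    counts = [0] * MINUTES_PER_DAY
    for hour, minute in send_times:
        counts[hour * 60 + minute] += 1
    return counts

x_name = [encode_domain(d) for d in ["example.com", "spam-offers.net"]]  # 2 x 75
x_mails = [timeline_vector([(0, 0), (0, 0), (13, 37)]),
           timeline_vector([(23, 59)])]                                   # 2 x 1440
```

With this vocabulary the largest index is 38, so max_features for the Embedding layer would be len(CHARS) + 1 = 39.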
How do I input these two vectors to a single CNN model?
My current model takes only the domain name as input:
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, Flatten, Dense, Dropout
from keras import optimizers

def build_model(max_features, maxlen):
    """Build CNN model"""
    model = Sequential()
    model.add(Embedding(max_features, 8, input_length=maxlen))
    model.add(Conv1D(6, 4, padding='same'))
    model.add(Conv1D(4, 4, padding='same'))
    model.add(Conv1D(2, 4, padding='same'))
    model.add(Flatten())
    #model.add(Dropout(0.2))
    #model.add(Dense(2,activation='sigmoid'))
    #model.add(Dense(180,activation='sigmoid'))
    #model.add(Dropout(0.2))
    model.add(Dense(2, activation='softmax'))
    sgd = optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
    # 'f1score', 'precision' and 'recall' were also in the metrics list, but they are
    # not built-in string metrics in Keras 2; add custom metric functions if needed.
    model.compile(loss='categorical_crossentropy', optimizer=sgd,
                  metrics=['categorical_accuracy'])
    return model
Thanks!
Convolutional layers need a "length" dimension and a "channels" dimension: Conv1D expects inputs of shape (batch, length, channels).
(In 2D, they would be "width", "height" and "channels".)
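To make the (length, channels) layout concrete, here is a minimal pure-Python sketch of what a 1D convolution with 'valid' padding computes for one sample (stride 1, no bias, no activation). It illustrates the operation only; it is not Keras's implementation.

```python
def conv1d(x, kernels):
    """Naive 1D convolution ('valid' padding, stride 1, no bias).

    x:       L time steps, each a list of C channel values   -> shape (L, C)
    kernels: F filters, each K rows of C weights             -> shape (F, K, C)
    returns: L - K + 1 steps, each a list of F values        -> shape (L-K+1, F)
    """
    k = len(kernels[0])
    out = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        row = []
        for kernel in kernels:
            # Each filter sums over the kernel positions AND all input channels.
            row.append(sum(w * v
                           for w_row, x_row in zip(kernel, window)
                           for w, v in zip(w_row, x_row)))
        out.append(row)
    return out

# A length-4 sequence with 2 channels, and one filter of width 2.
x = [[1, 0], [2, 1], [3, 0], [4, 1]]
kernel = [[1, 1], [1, 1]]   # K = 2 rows, C = 2 weights each
y = conv1d(x, [kernel])     # [[4], [6], [8]]
```

Each filter spans all input channels, so the output has one channel per filter. With 'valid' padding the length shrinks by K - 1; padding='same' keeps it unchanged.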
Now, I can't think of any way to relate the 75 characters to the 1440 minutes. (Maybe you can; if you can state how, we may be able to come up with something better.)
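So I'm assuming the two inputs are independent and should be processed by separate branches that are merged only near the output.
That means we'd have two inputs: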
from keras.layers import Input, Embedding, Reshape, Conv1D, Flatten, Concatenate, Dense
input1 = Input((75,))
input2 = Input((1440,))
Only the domain name should pass through an embedding layer:
name = Embedding(max_features, 8, input_length=maxlen)(input1)
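An Embedding layer is essentially a trainable lookup table of shape (max_features, 8): each integer index selects one row. The pure-Python sketch below, using a small made-up table with random initial values (the real layer learns them), shows why a (75,) index vector becomes a (75, 8) matrix that is already in the (length, channels) layout Conv1D expects. The sizes are illustrative assumptions.

```python
import random

def embed(indices, table):
    """Replace each integer index with its row from the lookup table."""
    return [table[i] for i in indices]

max_features, dim, maxlen = 39, 8, 75   # illustrative sizes
random.seed(0)
table = [[random.uniform(-0.05, 0.05) for _ in range(dim)]
         for _ in range(max_features)]

indices = [5, 24, 1] + [0] * (maxlen - 3)   # a padded, encoded domain name
embedded = embed(indices, table)             # shape: 75 x 8
```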
Next, reshape to fit the convolutional input shape (None, length, channels):
# the embedding output is already (Batch, 75, 8) -- See: https://keras.io/layers/embeddings/
mails = Reshape((1440,1))(input2) #adding 1 channel at the end
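The Reshape((1440, 1)) step does not change any values; it only wraps each minute's count in a one-element channel list so the timeline has the (length, channels) shape Conv1D needs. A pure-Python equivalent for a single sample, with made-up counts:

```python
counts = [0] * 1440
counts[817] = 3   # e.g. 3 mails sent at 13:37 (minute 13 * 60 + 37)

# (1440,) -> (1440, 1): one channel per time step
reshaped = [[c] for c in counts]
```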
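Parallel convolutions (the filter counts and kernel sizes below are only an example, borrowed from your current model; feel free to customize):
name = Conv1D(6, 4, padding='same')(name)
name = Conv1D(4, 4, padding='same')(name)
mails = Conv1D(6, 4, padding='same')(mails)
mails = Conv1D(4, 4, padding='same')(mails)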
Concatenate: since the two branches have completely different shapes, the simplest option is to flatten both (or you could think of fancier operations to match them):
name = Flatten()(name)
mails = Flatten()(mails)
out = Concatenate()([name,mails])
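Because both branches are flattened before concatenation, the size of the combined vector can be worked out by hand. The sketch below tracks shapes assuming two Conv1D layers per branch with 6 and then 4 filters and kernel size 4 (the settings from the question's current model); the output-length rules are the standard ones for stride 1.

```python
def conv1d_length(length, kernel_size, padding):
    """Output length of a stride-1 Conv1D."""
    if padding == "same":
        return length
    if padding == "valid":
        return length - kernel_size + 1
    raise ValueError(f"unknown padding: {padding}")

def branch_flat_size(length, filters, kernel_size=4, padding="same"):
    """Flattened size after a stack of Conv1D layers (one per entry in filters)."""
    for _ in filters:
        length = conv1d_length(length, kernel_size, padding)
    return length * filters[-1]

name_size = branch_flat_size(75, [6, 4])      # (75, 8) -> (75, 6) -> (75, 4) -> 300
mails_size = branch_flat_size(1440, [6, 4])   # (1440, 1) -> (1440, 6) -> (1440, 4) -> 5760
concat_size = name_size + mails_size          # 6060
```

With these settings the timeline branch contributes about 19 times as many features as the name branch. If that imbalance matters, adding pooling (for example MaxPooling1D) to the mails branch before flattening is one possible option.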
# optionally add extra layers here, e.g. out = Dense(64, activation='relu')(out)
out = Dense(2,activation='softmax')(out)
And finally we create the model:
from keras.models import Model
model = Model([input1,input2], out)
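Compile it and train it like this:
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['categorical_accuracy'])  # or reuse the SGD optimizer from your current model
model.fit([xName,xMails], Y, ....)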
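model.fit expects the list of inputs in the same order as in Model([input1,input2], out), with row i of each array describing the same domain. Because the output is a 2-unit softmax trained with categorical_crossentropy, Y must be one-hot encoded. A minimal pure-Python sketch of that preparation, assuming the hypothetical label convention 0 = good and 1 = spam and using placeholder inputs:

```python
def one_hot(labels, num_classes=2):
    """Turn integer class labels into one-hot rows."""
    return [[1 if c == label else 0 for c in range(num_classes)]
            for label in labels]

def check_aligned(*arrays):
    """All inputs and targets must have the same number of samples."""
    sizes = {len(a) for a in arrays}
    if len(sizes) != 1:
        raise ValueError(f"sample counts differ: {sorted(sizes)}")
    return sizes.pop()

labels = [0, 1, 1]                      # 0 = good, 1 = spam (assumed)
Y = one_hot(labels)                     # [[1, 0], [0, 1], [0, 1]]
xName = [[0] * 75 for _ in labels]      # placeholder encoded names
xMails = [[0] * 1440 for _ in labels]   # placeholder timelines
n = check_aligned(xName, xMails, Y)     # 3
```

In practice you would convert these to NumPy arrays before calling fit; keras.utils.to_categorical performs the same one-hot step.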