 

Text classification using Keras: How to add custom features?

I'm writing a program to classify texts into a few classes. Right now, the program loads the train and test samples of word indices, applies an embedding layer and a convolutional layer, and classifies them into the classes. I'm trying to add handcrafted features for experimentation, as in the following code. Here, `features` is a list of two elements: the first element holds the features for the training data, and the second holds the features for the test data. Each training/test sample has one corresponding feature vector (i.e. the features are per-sample features, not per-word features).
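As an aside, a minimal sketch of the data layout this assumes (the sample counts and feature width below are made-up numbers, not from my actual data):

```python
import numpy as np

# Hypothetical layout: 1000 train samples and 200 test samples,
# each with a 5-element handcrafted feature vector.
train_feats = np.zeros((1000, 5))
test_feats = np.zeros((200, 5))
features = [train_feats, test_feats]

nb_features = len(features[0][0])  # length of one sample's feature vector (5 here)
```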

# Keras 1.x imports matching the API used below
from keras.models import Sequential
from keras.layers import (Embedding, Convolution1D, Dropout,
                          GlobalMaxPooling1D, Dense, Merge)

model = Sequential()
model.add(Embedding(params.nb_words,
                    params.embedding_dims,
                    weights=[embedding_matrix],
                    input_length=params.maxlen,
                    trainable=params.trainable))
model.add(Convolution1D(nb_filter=params.nb_filter,
                        filter_length=params.filter_length,
                        border_mode='valid',
                        activation='relu'))
model.add(Dropout(params.dropout_rate))
model.add(GlobalMaxPooling1D())

# Adding hand-picked features
model_features = Sequential()
nb_features = len(features[0][0])

model_features.add(Dense(1,
                         input_shape=(nb_features,),
                         init='uniform',
                         activation='relu'))

model_final = Sequential()
model_final.add(Merge([model, model_features], mode='concat'))

model_final.add(Dense(len(citfunc.funcs), activation='softmax'))
model_final.compile(loss='categorical_crossentropy',
                    optimizer='adam',
                    metrics=['accuracy'])

print(model_final.summary())
model_final.fit([x_train, features[0]], y_train,
                nb_epoch=params.nb_epoch,
                batch_size=params.batch_size,
                class_weight=data.get_class_weights(x_train, y_train))

y_pred = model_final.predict([x_test, features[1]])
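Since the final layer is a softmax, `model_final.predict` returns one probability per class for each test sample; the predicted class is the row-wise argmax. A small illustration with made-up probabilities:

```python
import numpy as np

# Each row is a softmax output over (here) 3 classes.
y_pred = np.array([[0.1, 0.7, 0.2],
                   [0.6, 0.3, 0.1]])
# Index of the most probable class for each sample.
predicted_classes = y_pred.argmax(axis=1)  # → [1, 0]
```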

My question is: is this code correct? Is there a conventional way of adding features to each of the text sequences?

asked Mar 27 '17 by hsiaomijiou



1 Answer

Try:

# Keras 1.x functional-API imports
from keras.models import Model
from keras.layers import (Input, Embedding, Convolution1D, Dropout,
                          GlobalMaxPooling1D, Dense, merge)

input = Input(shape=(params.maxlen,))
embedding = Embedding(params.nb_words,
                    params.embedding_dims,
                    weights=[embedding_matrix],
                    input_length=params.maxlen,
                    trainable=params.trainable)(input)
conv = Convolution1D(nb_filter=params.nb_filter,
                        filter_length=params.filter_length,
                        border_mode='valid',
                        activation='relu')(embedding)
drop = Dropout(params.dropout_rate)(conv)
seq_features = GlobalMaxPooling1D()(drop)

# Adding hand-picked features
nb_features = len(features[0][0])
other_features = Input(shape=(nb_features,))

model_final = merge([seq_features, other_features], mode='concat')

model_final = Dense(len(citfunc.funcs), activation='softmax')(model_final)

model_final = Model([input, other_features], model_final)

model_final.compile(loss='categorical_crossentropy',
                    optimizer='adam',
                    metrics=['accuracy'])

In this case you are merging the features from the sequence analysis with your custom features directly, without first squashing all the custom features down to a single value with a `Dense(1)` layer.
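To see what the `concat` merge does to the shapes, here is a plain NumPy illustration (the batch size, filter count, and feature width are made-up numbers):

```python
import numpy as np

# Output of GlobalMaxPooling1D: one value per conv filter (say 250 filters),
# for a batch of 4 samples.
seq_features = np.random.rand(4, 250)
# Handcrafted per-sample features (say 5 of them).
other_features = np.random.rand(4, 5)

# mode='concat' joins the two inputs along the feature axis,
# giving 250 + 5 = 255 features per sample.
merged = np.concatenate([seq_features, other_features], axis=1)
```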

answered Sep 28 '22 by Marcin Możejko