Use topic modeling information from LDA as features to perform text classification through SVM

Question

I want to perform text classification using topic modeling information as features fed to an SVM classifier. How can I generate these features by running LDA on both the training and test partitions of the dataset, given that the corpus differs between the two partitions?

Am I making a wrong assumption?

Could you provide an example of how to do this with scikit-learn?

Ash · Accepted Answer

Your assumption is right. You fit LDA on the training data only, then transform both the training and the test data with that fitted model.

So you'll have something like this:

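from sklearn.decomposition import LatentDirichletAllocation as LDA

# training_data and testing_data are document-term count matrices.
# n_topics was renamed n_components in scikit-learn 0.19; other arguments are elided here.
lda = LDA(n_components=10)
lda.fit(training_data)  # learn the topics from the training data only
training_features = lda.transform(training_data)
testing_features = lda.transform(testing_data)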
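
Since the question asks for a scikit-learn example, here is a minimal end-to-end sketch of the same idea. The toy documents, the labels, the choice of 2 topics, and the use of LinearSVC as the SVM are all assumptions made for illustration; none of them come from the question.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy data, invented purely for illustration.
train_docs = [
    "the striker scored a late goal in the match",
    "the team won the league after a tense match",
    "the keeper saved a penalty in the final",
    "the new phone has a faster processor and camera",
    "the laptop ships with a faster processor and more memory",
    "the software update improves the phone camera",
]
train_labels = ["sport", "sport", "sport", "tech", "tech", "tech"]
test_docs = ["the team scored in the final", "the phone update adds memory"]

# LDA works on term counts. The vocabulary is learned from the training
# documents only and then reused for the test documents.
vectorizer = CountVectorizer(stop_words="english")
X_train_counts = vectorizer.fit_transform(train_docs)
X_test_counts = vectorizer.transform(test_docs)

# Fit the topic model on the training counts only, then project both sets.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
X_train_topics = lda.fit_transform(X_train_counts)
X_test_topics = lda.transform(X_test_counts)

# Train the SVM on the per-document topic proportions.
clf = LinearSVC()
clf.fit(X_train_topics, train_labels)
predictions = clf.predict(X_test_topics)
```

Each row of X_train_topics and X_test_topics is a per-document topic distribution, so the test documents are described in terms of topics learned from the training set alone.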

If I were you, I would concatenate the LDA features with bag-of-words features using numpy.hstack, or scipy.sparse.hstack if your bag-of-words features are sparse.
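
A rough sketch of that concatenation, assuming sparse bag-of-words counts from CountVectorizer. The toy documents, the labels, and LinearSVC are again illustrative assumptions, and the dense LDA output is wrapped in csr_matrix so that both blocks passed to scipy.sparse.hstack are sparse.

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy data, invented purely for illustration.
train_docs = [
    "the team won the match",
    "the keeper saved a penalty",
    "the phone has a new camera",
    "the laptop has more memory",
]
train_labels = ["sport", "sport", "tech", "tech"]
test_docs = ["the team saved the match", "the phone has more memory"]

# Sparse bag-of-words counts; vocabulary comes from the training set only.
vectorizer = CountVectorizer()
bow_train = vectorizer.fit_transform(train_docs)
bow_test = vectorizer.transform(test_docs)

# Dense topic proportions from an LDA model fitted on the training counts.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics_train = lda.fit_transform(bow_train)
topics_test = lda.transform(bow_test)

# Stack the two feature blocks column-wise, keeping the result sparse.
X_train = hstack([bow_train, csr_matrix(topics_train)]).tocsr()
X_test = hstack([bow_test, csr_matrix(topics_test)]).tocsr()

clf = LinearSVC()
clf.fit(X_train, train_labels)
predictions = clf.predict(X_test)
```

The combined matrix has one column per vocabulary term plus one per topic. Raw counts and topic proportions live on different scales, so rescaling one of the blocks may be worth trying.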

Tags: python, classification, svm, lda

asterix
