One way to reduce dimensionality is scikit-learn's TruncatedSVD, which performs linear dimensionality reduction by means of truncated singular value decomposition (SVD). However, you have to choose the number of components before decomposing.
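```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

n_comp = 25  # number of components, fixed before fitting
tfidf_vec = TfidfVectorizer(analyzer="word", max_features=5000, ngram_range=(1, 2))
svd = TruncatedSVD(n_components=n_comp, algorithm="arpack")
tfidf_df = tfidf_vec.fit_transform(values)  # `values`: your iterable of text documents; returns a sparse matrix
df = svd.fit_transform(tfidf_df)  # dense array of shape (n_documents, n_comp)
```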
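To see what `fit_transform` returns, here is a minimal sketch that computes the same projection with NumPy's full SVD and keeps only the top k singular triplets, giving U_k * s_k (equivalently X @ V_k). The random matrix and the helper `truncated_svd_numpy` are hypothetical, introduced only for illustration. Note that, unlike PCA, TruncatedSVD does not center the data, which is why it can work directly on sparse TF-IDF matrices; the NumPy version below does not center either.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD


def truncated_svd_numpy(X, k):
    """Hypothetical helper: keep the top-k singular triplets of X and return
    U_k * s_k, which equals X @ V_k (the projection onto the top-k directions)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # s is sorted in descending order
    return U[:, :k] * s[:k]


rng = np.random.default_rng(0)
X = rng.random((30, 12))  # small random matrix standing in for a TF-IDF matrix
k = 4

manual = truncated_svd_numpy(X, k)
reduced = TruncatedSVD(n_components=k, algorithm="arpack", random_state=0).fit_transform(X)

# Singular vectors are only unique up to sign, so compare absolute values entry by entry
print(manual.shape, np.allclose(np.abs(manual), np.abs(reduced), atol=1e-6))
```

Comparing absolute values is enough here because a sign flip affects a whole column at once, and the singular values of random data are distinct, so each column is otherwise uniquely determined.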
How do you choose the number of components?
```python
var_explained = svd.explained_variance_ratio_.sum()
```
The line above adds up the fraction of the data's total variance explained by each of the 25 components, which helps you decide whether 25 components capture the variability in your data well enough.
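As a sanity check on what `explained_variance_ratio_` measures, the sketch below recomputes it by hand on a small random dense matrix (hypothetical data): the variance of each reduced column divided by the summed variance of the original columns.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(1)
X = rng.random((40, 15))  # hypothetical dense feature matrix

svd = TruncatedSVD(n_components=5, algorithm="arpack", random_state=0)
X_reduced = svd.fit_transform(X)

# Variance of each reduced column over the total variance of the original columns
manual_ratio = np.var(X_reduced, axis=0) / np.var(X, axis=0).sum()
var_explained = svd.explained_variance_ratio_.sum()
print(np.allclose(manual_ratio, svd.explained_variance_ratio_), round(var_explained, 3))
```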
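A common heuristic is to pick enough components that `var_explained >= 0.9` (or, more strictly, `var_explained >= 0.95`). This keeps most of the variability while reducing how many variables you need going forward in your analysis.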
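Instead of guessing a value like 25 up front, one option is to fit once with a generous number of components and take the smallest k whose cumulative explained-variance ratio reaches the threshold. This is a minimal sketch, not the author's method: the helper `smallest_k_for_variance`, its `max_components` cap, and the low-rank test matrix are all hypothetical. It assumes the `arpack` solver, which needs `n_components < min(n_samples, n_features)`, and relies on the first k components of a larger fit matching a k-component fit up to sign and numerical error.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD


def smallest_k_for_variance(X, threshold=0.9, max_components=100, random_state=0):
    """Hypothetical helper: smallest k whose first k SVD components explain at least
    `threshold` of the total variance of X. Returns (k, cumulative_ratios); k is None
    if even max_components components fall short of the threshold."""
    # arpack requires n_components < min(n_samples, n_features)
    max_components = min(max_components, min(X.shape) - 1)
    svd = TruncatedSVD(n_components=max_components, algorithm="arpack",
                       random_state=random_state)
    svd.fit(X)
    # Running total of the per-component variance ratios
    cumulative = np.cumsum(svd.explained_variance_ratio_)
    hits = np.flatnonzero(cumulative >= threshold)
    k = int(hits[0]) + 1 if hits.size else None
    return k, cumulative


# Hypothetical low-rank data: a rank-3 signal plus a little noise
rng = np.random.default_rng(0)
X = rng.random((50, 3)) @ rng.random((3, 20)) + 0.01 * rng.random((50, 20))
k90, cumulative = smallest_k_for_variance(X, threshold=0.9, max_components=10)
k95, _ = smallest_k_for_variance(X, threshold=0.95, max_components=10)
print(k90, k95, cumulative[:5].round(3))
```

With a real corpus you would pass the sparse TF-IDF matrix (`tfidf_df` above) as `X` and then refit `TruncatedSVD` with `n_components=k`.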