 

Number of components in truncated SVD

One can reduce dimensionality with truncated SVD, which performs linear dimensionality reduction by means of truncated singular value decomposition (SVD). However, one has to choose the number of components before decomposing.
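To make concrete what the number of components controls, here is a small dense NumPy sketch of the idea behind truncated SVD: keep only the k largest singular values and project the data onto the matching right singular vectors. It assumes a small dense matrix and is only an illustration, not scikit-learn's `TruncatedSVD` (which uses ARPACK or a randomized solver and accepts sparse input); `truncated_svd` is a hypothetical helper name.

```python
import numpy as np


def truncated_svd(X, k):
    """Hypothetical helper: reduce X to k dimensions via its top-k singular vectors."""
    # Thin SVD: X = U @ diag(s) @ Vt, with singular values sorted in descending order.
    # No mean-centering is done, matching TruncatedSVD (unlike PCA).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Reduced representation U_k * s_k, which equals X @ Vt_k.T.
    return U_k * s_k, Vt_k


rng = np.random.default_rng(0)
X = rng.random((6, 4))
X_reduced, components = truncated_svd(X, k=2)
print(X_reduced.shape)  # (6, 2)
```

The reduced data `U_k * s_k` equals `X @ components.T`, the same form of output that `TruncatedSVD.fit_transform` returns (up to the sign of each component).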

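from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

n_comp = 25
tfidf_vec = TfidfVectorizer(analyzer="word", max_features=5000, ngram_range=(1,2))
svd = TruncatedSVD(n_components=n_comp, algorithm='arpack')
tfidf_df = tfidf_vec.fit_transform(values)  # values: iterable of raw text documents; result is a sparse matrix
df = svd.fit_transform(tfidf_df)  # dense array of shape (n_documents, n_comp)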

How do I choose the number of components?

asked Dec 14 '22 18:12 by J. Doe



1 Answer

var_explained = svd.explained_variance_ratio_.sum()

The line above gives the fraction of the variance in your data that the 25 components capture, which helps you decide whether 25 components are enough. A common rule of thumb is to keep the smallest number of components for which var_explained >= 0.9 (or >= 0.95); this reduces how many variables you need to carry forward in your analysis.
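One way to apply that threshold is to fit with a generous number of components, then take the smallest k whose cumulative explained variance ratio reaches it. Below is a minimal sketch; `choose_n_components` is a hypothetical helper, and the ratios are made-up numbers standing in for `svd.explained_variance_ratio_`.

```python
import numpy as np


def choose_n_components(ratios, threshold=0.9):
    """Hypothetical helper: smallest k whose first k components reach `threshold`.

    `ratios` is assumed to be something like svd.explained_variance_ratio_
    from a TruncatedSVD fitted with a generous n_components. Returns None
    if even all fitted components fall short of the threshold.
    """
    cumulative = np.cumsum(ratios)
    # Positions where the running total meets or exceeds the threshold.
    hits = np.nonzero(cumulative >= threshold)[0]
    if hits.size == 0:
        return None  # refit with a larger n_components
    return int(hits[0]) + 1  # convert the 0-based index into a component count


# Made-up ratios, standing in for svd.explained_variance_ratio_.
ratios = np.array([0.5, 0.2, 0.15, 0.08, 0.04, 0.02])
print(choose_n_components(ratios, 0.9))    # 4
print(choose_n_components(ratios, 0.95))   # 5
print(choose_n_components(ratios, 0.999))  # None
```

In practice you would pass `svd.explained_variance_ratio_` from a `TruncatedSVD` fitted with more components than you expect to need (with `algorithm='arpack'` this must stay strictly below the number of TF-IDF features, 5000 here), then refit with `n_components` set to the returned k. A `None` result means the fitted components fall short of the threshold, so refit with a larger `n_components`.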

answered Dec 17 '22 23:12 by engAnt
