I have 14784 text documents that I am trying to vectorize so I can run some analysis. I used the CountVectorizer in sklearn to convert the documents to feature vectors. I did this by calling:
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(examples)
where examples is an array of all the text documents.
Now, I am trying to use additional features. For this, I am storing the features in a pandas dataframe. At present, my pandas dataframe (without the text features) has the shape (14784, 5). The shape of my feature vector is (14784, 21343).
What would be a good way to insert the vectorized features into the pandas dataframe?
Word Counts with CountVectorizer. You can use it as follows:
1. Create an instance of the CountVectorizer class.
2. Call the fit() function to learn a vocabulary from one or more documents.
3. Call the transform() function on one or more documents as needed to encode each as a vector.
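A minimal sketch of those three steps (docs here is just a placeholder list of strings):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the dog barked"]   # placeholder documents
vect = CountVectorizer()                   # 1. create an instance
vect.fit(docs)                             # 2. learn the vocabulary
X = vect.transform(docs)                   # 3. encode each document as a count vector
print(X.shape)                             # (2, number of terms in the vocabulary)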
The append() function is used to append the rows of another dataframe to the end of the given dataframe, returning a new dataframe object. Columns not in the original dataframe are added as new columns and the new cells are populated with NaN values. A rough illustration is shown below.
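Note that DataFrame.append() was removed in pandas 2.0; pd.concat() is the current equivalent of row-wise appending, as in this sketch:

import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3], "b": [4]})

# df1.append(df2) in older pandas; pd.concat does the same thing in current versions.
combined = pd.concat([df1, df2], ignore_index=True)
# Column "b" is new, so the rows coming from df1 get NaN in that column.
print(combined)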
You can use the assign() function to add a new column to the end of a pandas DataFrame: df = df.assign(col_name=[value1, value2, value3, ...])
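A small example, with a hypothetical doc_length column standing in for one of your extra features:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df = df.assign(doc_length=[120, 87, 334])   # hypothetical extra feature column
print(df)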
Pandas DataFrame add() method: the add() method adds a specified value to each value in the DataFrame. The specified value must be an object that can be added to the values of the DataFrame.
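Note that add() is element-wise arithmetic, so it will not attach new feature columns; a quick illustration:

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(df.add(10))   # adds 10 to every value; does not add new columns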
Return the term-document matrix after learning the vocabulary dictionary from the raw documents:
X = vect.fit_transform(docs)
Convert the sparse CSR matrix to a dense array and use the vectorizer's vocabulary (the mapping from feature integer indices to feature names) as the column labels:
count_vect_df = pd.DataFrame(X.toarray(), columns=vect.get_feature_names_out())  # use get_feature_names() on scikit-learn < 1.0
Concatenate the original df and the count_vect_df column-wise:
pd.concat([df, count_vect_df], axis=1)
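Putting it together, a complete sketch (docs, df, and the column names are placeholders for your own 14784 documents and 5 existing features):

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder data standing in for your documents and existing features.
docs = ["first document text", "second document text", "third one"]
df = pd.DataFrame({"label": [0, 1, 0], "doc_id": [101, 102, 103]})

vect = CountVectorizer()
X = vect.fit_transform(docs)                      # sparse (n_docs, n_terms) matrix

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
count_vect_df = pd.DataFrame(X.toarray(), columns=vect.get_feature_names_out())

# Both frames share the same default RangeIndex, so the rows line up.
result = pd.concat([df, count_vect_df], axis=1)
print(result.shape)                               # (3, 2 + number of distinct terms)

With 14784 x 21343 counts the dense frame can get large; if memory becomes an issue, pd.DataFrame.sparse.from_spmatrix(X, columns=vect.get_feature_names_out()) builds the same DataFrame while keeping the counts sparse.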