
python - how to append numpy array to a pandas dataframe

I have trained a Logistic Regression classifier to predict whether a review is positive or negative. Now I want to append the predicted probabilities returned by the predict_proba function to my pandas DataFrame containing the reviews. I tried something like:

test_data['prediction'] = sentiment_model.predict_proba(test_matrix)

Obviously, that doesn't work, since predict_proba returns a 2D NumPy array with one column per class. So, what is the most efficient way of doing this? I created test_matrix with scikit-learn's CountVectorizer:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))

Sample data looks like:

| Review                                      | Prediction |
| ------------------------------------------- | ---------- |
| "Toy was great! Our six-year old loved it!" | 0.986      |
DBE7 asked Feb 18 '17 11:02



1 Answer

Assign the predictions to a variable, then assign each of its columns to its own column in the pandas DataFrame. If x is the 2D NumPy array of predictions,

x = sentiment_model.predict_proba(test_matrix)

then you can do,

test_data['prediction0'] = x[:,0]
test_data['prediction1'] = x[:,1]
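
For readers who want something they can run without a trained model, here is a minimal sketch of the same idea. The small `test_data` frame and the `proba` array are hypothetical stand-ins for the real reviews and for the output of `sentiment_model.predict_proba(test_matrix)`, and the column names `proba_0` and `proba_1` are made up for illustration. It also assumes column 1 holds the positive class, which is worth checking against `sentiment_model.classes_`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test_data; the real frame holds the reviews.
test_data = pd.DataFrame(
    {"review_clean": ["great toy", "broke in a day", "it was okay"]}
)

# Hypothetical stand-in for sentiment_model.predict_proba(test_matrix):
# one row per review, one column per class.
proba = np.array([
    [0.014, 0.986],
    [0.900, 0.100],
    [0.450, 0.550],
])

# Option 1: keep only one class probability as a single column.
# Assumes column 1 is the positive class (check sentiment_model.classes_).
test_data["prediction"] = proba[:, 1]

# Option 2: add all class columns at once. Reusing test_data.index keeps
# the rows aligned even if the frame's index is not 0..n-1.
proba_df = pd.DataFrame(
    proba, columns=["proba_0", "proba_1"], index=test_data.index
)
result = pd.concat([test_data, proba_df], axis=1)
```

Passing `index=test_data.index` matters when `test_data` comes from a train/test split, because pandas aligns on index labels rather than on row position.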
Karthik Arumugham answered Oct 02 '22 19:10