I'm a newbie working on my first 'real' ML algorithm. Apologies if this is duplicated but I can't find the answer on SO.
I've got the following dataframe (df):
index  Feature1  Feature2  Feature3  Target
001    01        01        03        0
002    03        03        01        1
003    03        02        02        1
My code looks something like this:
data = df[['Feature1', 'Feature2', 'Feature3']]
labels = df['Target']
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size = 0.8)
clf = RandomForestClassifier().fit(X_train, y_train)
prediction_of_probability = clf.predict_proba(X_test)
What I'm struggling with is how I can get the 'prediction_of_probability' back into the dataframe df.
I understand the predictions would not be for all items in the original dataframe.
Thank you in advance for helping a newbie like me!
Note that predict_proba gives a probability for every class, not only the probability of class 1.
model.predict_proba(): for classification problems, some estimators also provide this method, which returns the probability that a new observation belongs to each categorical label. The label with the highest probability is what model.predict() returns.
The predict_proba() method accepts a single argument corresponding to the data over which the probabilities are computed, and returns an array of lists containing the class probabilities for the input data points.
A Random Forest Classifier is an ensemble of Decision Trees. Each tree predicts a single class: that class gets probability 1 and the other classes get probability 0. The forest then votes among the trees, and predict_proba() returns the number of votes for each class divided by the number of trees in the forest.
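To make the shape of that output concrete, here is a minimal sketch on a tiny made-up dataset (the numbers are hypothetical, chosen only to show the structure):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny hypothetical dataset, just to inspect predict_proba's output
X = np.array([[1, 1, 3], [3, 3, 1], [3, 2, 2], [1, 2, 3]])
y = np.array([0, 1, 1, 0])

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
proba = clf.predict_proba(X)

print(proba.shape)        # (n_samples, n_classes): one column per class
print(clf.classes_)       # tells you which column is which class
print(proba.sum(axis=1))  # each row sums to 1
```

The column order of the returned array follows clf.classes_, so for a 0/1 target, column 1 is the probability of class 1.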
What you did is train the model: with the features and labels you have, you fit a model for future data. To test the quality of the model (the choice of features, for example), it is evaluated on X_test and y_test. In this case you don't have future data, so you are not applying your model, only training and validating it. You can measure the quality of your model with ROC curves or the AUC score.
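As a sketch of that evaluation step (on synthetic data of my own making, not the question's df), roc_auc_score from sklearn.metrics takes the true test labels and the predicted probability of the positive class:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 3 features, target depends on the first two
rng = np.random.RandomState(42)
X = rng.rand(200, 3)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# roc_auc_score wants the probability of the positive class (column 1)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(auc)
```

An AUC near 0.5 means the model is no better than chance; closer to 1.0 means it separates the classes well.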
Anyway you can append the results to the dataframe in this way:
df_test = pd.DataFrame(X_test)
df_test['Target'] = y_test
df_test['prob_0'] = prediction_of_probability[:, 0]  # probability of class 0
df_test['prob_1'] = prediction_of_probability[:, 1]  # probability of class 1
You can keep the indices of the train and test split and then put everything back together this way:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
data = df[['Feature1', 'Feature2', 'Feature3']]
labels = df['Target']
indices = df.index.values
# use the indices instead of the labels to preserve the order of the split
X_train, X_test, indices_train, indices_test = train_test_split(data, indices, test_size=0.33, random_state=42)
y_train, y_test = labels[indices_train], labels[indices_test]
clf = RandomForestClassifier().fit(X_train, y_train)
prediction_of_probability = clf.predict_proba(X_test)
Then you can put the probabilities in the new df_new:
>>> df_new = df.copy()
>>> df_new.loc[indices_test, 'pred_test'] = prediction_of_probability[:, 1]  # probability of class 1
>>> print(df_new)
Feature1 Feature2 Feature3 Target pred_test
1 3 3 1 1 NaN
2 3 2 2 1 NaN
0 1 1 3 0 1.0
And even the predictions for the train:
>>> df_new.loc[indices_train, 'pred_train'] = clf.predict_proba(X_train)[:, 1]
>>> print(df_new)
Feature1 Feature2 Feature3 Target pred_test pred_train
1 3 3 1 1 NaN 1.0
2 3 2 2 1 NaN 1.0
0 1 1 3 0 1.0 NaN
Or, if you want to mix the probabilities of train and test, just use the same column name (i.e. pred).
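A minimal end-to-end sketch of that shared-column idea, using a made-up toy dataframe in place of the question's df (values and stratify choice are mine, added so the tiny split keeps both classes):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical toy dataframe standing in for the question's df
df = pd.DataFrame({'Feature1': [1, 3, 3, 2, 1, 2, 3, 1],
                   'Feature2': [1, 3, 2, 2, 3, 1, 3, 2],
                   'Feature3': [3, 1, 2, 1, 2, 3, 1, 3],
                   'Target':   [0, 1, 1, 1, 0, 0, 1, 0]})

data = df[['Feature1', 'Feature2', 'Feature3']]
labels = df['Target']
indices = df.index.values

X_train, X_test, indices_train, indices_test = train_test_split(
    data, indices, test_size=0.33, random_state=42, stratify=labels)
y_train, y_test = labels[indices_train], labels[indices_test]

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# One shared 'pred' column: every row ends up with its probability of class 1
df_new = df.copy()
df_new.loc[indices_test, 'pred'] = clf.predict_proba(X_test)[:, 1]
df_new.loc[indices_train, 'pred'] = clf.predict_proba(X_train)[:, 1]
print(df_new)
```

With a single column there are no NaN gaps, but you lose the ability to tell train rows from test rows, so keep the indices around if that matters.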
You need something like this:
# Create new dataframe to store test data.
df1 = pd.DataFrame(X_test)
df1['Target'] = y_test
df1['prob'] = prediction_of_probability[:, 0]  # probability of class 0
# Create another dataframe to store train data
df2 = pd.DataFrame(X_train)
df2['Target'] = y_train
# Append both dataframes
df = pd.concat([df1, df2]).sort_index()  # DataFrame.append was removed in pandas 2.0