Data prints, but does not write to dataframe

I am trying to calculate the true positive rate, etc., of a binary confusion matrix and write the results to a CSV file.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import csv
from sklearn.metrics import confusion_matrix



AllBinary = pd.read_csv('BinaryData.csv')


y_test = AllBinary['Binary_ac']
y_pred = AllBinary['Binary_pred']

cm = confusion_matrix(y_test, y_pred)

stats = pd.DataFrame()

TP = cm[0][0]
FP = cm[0][1]
FN = cm[1][0]
TN = cm[1][1]

stats['TruePositive'] = TP
stats['TrueNegative'] = TN
stats['FalsePositive'] = FP
stats['FalseNegative'] = FN

print(TP)
print(TN)
print(FP)
print(FN)

stats.to_csv('C:/out/' + 'BinaryStats' + '.csv', header = True)

The print results show that the basic confusion matrix stats are calculated as follows:

210483
153902
32845
10788

The csv output creates the headings, but the results are blank. What am I doing incorrectly?

Update:

print(stats)

Empty DataFrame
Columns: [TruePositive, TrueNegative, FalsePositive, FalseNegative]
Index: []

kharn asked Oct 22 '25 06:10


1 Answer

The problem is that assigning a scalar value to a new column does not add a row to an empty df:

In [55]:
stats = pd.DataFrame()
stats['TruePositive'] = 210483
stats

Out[55]:
Empty DataFrame
Columns: [TruePositive]
Index: []

Instead, pass the desired values to the DataFrame constructor:

In [62]:
TP = 210483
FP = 153902
FN = 32845
TN = 10788
stats = pd.DataFrame({'TruePositive':[TP], 'TrueNegative':[TN], 'FalsePositive':[FP], 'FalseNegative':[FN]})
stats

Out[62]:
   FalseNegative  FalsePositive  TrueNegative  TruePositive
0          32845         153902         10788        210483
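
On recent pandas versions a dict keeps its insertion order, so the columns may not come out sorted alphabetically as in Out[62]. A minimal sketch, assuming you want a fixed column order, is to pass that order explicitly through columns=. The names counts and column_order are illustrative and not part of the original code:

```python
import pandas as pd

# Counts copied from the answer above; `counts` and `column_order` are hypothetical names.
counts = {'TruePositive': 210483, 'FalsePositive': 153902,
          'FalseNegative': 32845, 'TrueNegative': 10788}
column_order = ['TruePositive', 'TrueNegative', 'FalsePositive', 'FalseNegative']

# A list holding one dict yields a one-row frame; `columns` pins the column order.
stats = pd.DataFrame([counts], columns=column_order)
print(stats)
```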

Or add a dummy row first, and then your code will work:

In [71]:
stats = pd.DataFrame()
# DataFrame.append was removed in pandas 2.0; pd.concat adds the dummy row instead
stats = pd.concat([stats, pd.DataFrame(['dummy'])], ignore_index=True)
stats['TruePositive'] = TP
stats['TrueNegative'] = TN
stats['FalsePositive'] = FP
stats['FalseNegative'] = FN
stats

Out[71]:
       0  TruePositive  TrueNegative  FalsePositive  FalseNegative
0  dummy        210483         10788         153902          32845

You can then drop the dummy column by calling drop:

In [72]:
stats.drop(0, axis=1)

Out[72]:
   TruePositive  TrueNegative  FalsePositive  FalseNegative
0        210483         10788         153902          32845

Your attempt failed because your initial df was empty. Assigning a scalar to a new column sets that value on every existing row. As your df has no rows, the column is created but nothing is written, which is why you end up with an empty df.
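
The broadcasting behaviour can be checked directly. This is a small sketch rather than part of the original answer: it compares an empty frame with one that has a single row label and no columns. Creating the frame with index=[0] is an assumption about what is convenient here, and the names empty and one_row are illustrative:

```python
import pandas as pd

# Scalar assignment fills the new column for every row in the existing index.
empty = pd.DataFrame()                # no rows, no columns
empty['TruePositive'] = 210483        # column is created, but there is no row to fill
print(len(empty), list(empty.columns))

one_row = pd.DataFrame(index=[0])     # one row label, no columns yet
one_row['TruePositive'] = 210483      # the scalar is written into that single row
print(one_row)
```

Starting from a frame that already has one row also avoids having to drop a dummy column afterwards.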

Another way would be to create the df with a single row (here I put NaN):

In [77]:
stats = pd.DataFrame([np.nan])
stats['TruePositive'] = TP
stats['TrueNegative'] = TN
stats['FalsePositive'] = FP
stats['FalseNegative'] = FN
stats.dropna(axis=1)

Out[77]:
   TruePositive  TrueNegative  FalsePositive  FalseNegative
0        210483         10788         153902          32845
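
To finish the original task, the one-row frame can then be written out as CSV. A sketch, assuming the row index is not wanted in the file (hence index=False). Calling to_csv without a path returns the CSV text, which is used here only to inspect the result:

```python
import pandas as pd

# Values as used in the answer above.
stats = pd.DataFrame({'TruePositive': [210483], 'TrueNegative': [10788],
                      'FalsePositive': [153902], 'FalseNegative': [32845]})

# With no path argument, to_csv returns the CSV content as a string.
csv_text = stats.to_csv(index=False)
print(csv_text)

# To write a file instead, pass a path, e.g. the one from the question:
# stats.to_csv('C:/out/BinaryStats.csv', index=False)
```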

EdChum answered Oct 24 '25 19:10