I'm doing multiclass text classification in scikit-learn. The dataset has hundreds of labels and is being trained with the Multinomial Naive Bayes classifier. Here's an extract from the scikit-learn script for fitting the MNB model:
from __future__ import print_function

# Read file.csv into a pandas DataFrame
import pandas as pd
path = 'data/file.csv'
# Note: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0;
# on newer versions use on_bad_lines='skip' instead.
merged = pd.read_csv(path, error_bad_lines=False, low_memory=False)

# define X and y using the original DataFrame
X = merged.text
y = merged.grid

# split X and y into training and testing sets
# (sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# import and instantiate CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()

# create document-term matrices using CountVectorizer
X_train_dtm = vect.fit_transform(X_train)
X_test_dtm = vect.transform(X_test)

# import and instantiate MultinomialNB
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()

# fit a Multinomial Naive Bayes model
nb.fit(X_train_dtm, y_train)

# make class predictions
y_pred_class = nb.predict(X_test_dtm)

# generate classification report
from sklearn import metrics
print(metrics.classification_report(y_test, y_pred_class))
A simplified version of the metrics.classification_report output on the command line looks like this:
             precision    recall  f1-score   support

         12       0.84      0.48      0.61      2843
         13       0.00      0.00      0.00        69
         15       1.00      0.19      0.32       232
         16       0.75      0.02      0.05       965
         33       1.00      0.04      0.07       155
          4       0.59      0.34      0.43      5600
         41       0.63      0.49      0.55      6218
         42       0.00      0.00      0.00       102
         49       0.00      0.00      0.00        11
          5       0.90      0.06      0.12      2010
         50       0.00      0.00      0.00         5
         51       0.96      0.07      0.13      1267
         58       1.00      0.01      0.02       180
         59       0.37      0.80      0.51      8127
          7       0.91      0.05      0.10       579
          8       0.50      0.56      0.53      7555

avg/total         0.59      0.48      0.45     35919
I was wondering if there is any way to get the report output into a standard CSV file with regular column headers.
When I redirect the command-line output into a CSV file, or copy and paste the screen output into a spreadsheet (OpenOffice Calc or Excel), it lumps the results into one column.
As of scikit-learn v0.20, the easiest way to convert a classification report to a pandas DataFrame is to have the report returned as a dict:
report = classification_report(y_test, y_pred, output_dict=True)
and then construct a DataFrame and transpose it:
df = pandas.DataFrame(report).transpose()
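To make the shape of the result concrete, here is a minimal sketch on made-up toy labels (the lists `y_true` and `y_hat` are assumptions for illustration, not the data from the question). Each key of the dict becomes a row after the transpose, and each metric becomes a column. The summary rows (for example `macro avg`, `weighted avg`, and `accuracy` or `micro avg`) vary between scikit-learn versions, so the sketch only looks up the per-class rows.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels, made up purely for illustration.
y_true = ["4", "4", "8", "8", "59", "59"]
y_hat = ["4", "8", "8", "8", "59", "4"]

# output_dict=True returns a nested dict instead of a formatted string.
report = classification_report(y_true, y_hat, output_dict=True)

# Without the transpose, the class labels would be the columns;
# with it, each class is a row and each metric is a column.
df = pd.DataFrame(report).transpose()

# Per-class rows are keyed by the label as a string.
print(df.loc[["4", "8", "59"]])
```

Because the class labels end up in the index, individual values can be read with `df.loc[label, metric]`, for example `df.loc["59", "precision"]`.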
From here on, you are free to use the standard pandas methods to generate your desired output formats (CSV, HTML, LaTeX, ...).
See the documentation.
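For the CSV case asked about in the question, a minimal sketch could look like the following. The toy labels, the output file name `classification_report.csv`, and the column name `class` are all assumptions chosen for illustration. Rounding to two decimals is optional and only mimics the printed report.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels, made up purely for illustration.
y_true = ["4", "4", "8", "8", "59", "59"]
y_hat = ["4", "8", "8", "8", "59", "4"]

report = classification_report(y_true, y_hat, output_dict=True)
df = pd.DataFrame(report).transpose()

# Hypothetical output path. index_label names the first column,
# which would otherwise be written with an empty header.
out_path = "classification_report.csv"
df.round(2).to_csv(out_path, index_label="class")

# Read the file back to confirm there is one column per metric.
check = pd.read_csv(out_path, index_col="class")
print(check)
```

A file written this way opens in OpenOffice Calc or Excel with the class labels in the first column and precision, recall, f1-score, and support each in a column of their own.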