scikit learn output metrics.classification_report into CSV/tab-delimited format

Question

I'm doing multiclass text classification in scikit-learn. The model is a Multinomial Naive Bayes classifier trained on a dataset with hundreds of labels. Here's an extract from the scikit-learn script that fits the MNB model:

    from __future__ import print_function

    # read file.csv into a pandas DataFrame
    import pandas as pd
    path = 'data/file.csv'
    merged = pd.read_csv(path, error_bad_lines=False, low_memory=False)

    # define X and y using the original DataFrame
    X = merged.text
    y = merged.grid

    # split X and y into training and testing sets
    from sklearn.cross_validation import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # import and instantiate CountVectorizer
    from sklearn.feature_extraction.text import CountVectorizer
    vect = CountVectorizer()

    # create document-term matrices using CountVectorizer
    X_train_dtm = vect.fit_transform(X_train)
    X_test_dtm = vect.transform(X_test)

    # import and instantiate MultinomialNB
    from sklearn.naive_bayes import MultinomialNB
    nb = MultinomialNB()

    # fit a Multinomial Naive Bayes model
    nb.fit(X_train_dtm, y_train)

    # make class predictions
    y_pred_class = nb.predict(X_test_dtm)

    # generate classification report
    from sklearn import metrics
    print(metrics.classification_report(y_test, y_pred_class))

A simplified version of the metrics.classification_report output on the command line looks like this:

                   precision     recall   f1-score    support

           12       0.84       0.48       0.61       2843
           13       0.00       0.00       0.00         69
           15       1.00       0.19       0.32        232
           16       0.75       0.02       0.05        965
           33       1.00       0.04       0.07        155
            4       0.59       0.34       0.43       5600
           41       0.63       0.49       0.55       6218
           42       0.00       0.00       0.00        102
           49       0.00       0.00       0.00         11
            5       0.90       0.06       0.12       2010
           50       0.00       0.00       0.00          5
           51       0.96       0.07       0.13       1267
           58       1.00       0.01       0.02        180
           59       0.37       0.80       0.51       8127
            7       0.91       0.05       0.10        579
            8       0.50       0.56       0.53       7555

    avg/total       0.59       0.48       0.45      35919

I was wondering if there is any way to get the report output into a standard CSV file with regular column headers.

When I redirect the command-line output into a CSV file, or copy and paste the screen output into a spreadsheet (OpenOffice Calc or Excel), it lumps the results into one column.

janus235 · Accepted Answer

As of scikit-learn v0.20, the easiest way to convert a classification report to a pandas DataFrame is to have the report returned as a dict:

    from sklearn.metrics import classification_report
    report = classification_report(y_test, y_pred, output_dict=True)

and then construct a DataFrame and transpose it:

    import pandas
    df = pandas.DataFrame(report).transpose()

From here on, you are free to use the standard pandas methods to generate your desired output formats (CSV, HTML, LaTeX, ...).
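To tie this back to the CSV/tab-delimited output asked about in the question, here is a minimal end-to-end sketch. The toy labels are invented for illustration, and the explicit column reordering is my own addition (an assumption that you want the same column order as the printed report, regardless of pandas version); it is not part of the original answer.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels, invented for illustration only.
y_test = ["12", "12", "4", "4", "41", "41"]
y_pred = ["12", "4", "4", "4", "41", "12"]

# Ask for a dict instead of the preformatted text report.
report = classification_report(y_test, y_pred, output_dict=True)

# One row per label (plus the summary rows), one column per metric.
df = pd.DataFrame(report).transpose()

# Pin the column order so it matches the printed report.
df = df[["precision", "recall", "f1-score", "support"]]

# Serialize with a header row; "label" names the index column.
csv_text = df.to_csv(index_label="label")
tsv_text = df.to_csv(sep="\t", index_label="label")

print(tsv_text)
```

Passing a file path as the first argument, for example `df.to_csv("report.csv", index_label="label")`, writes the file directly instead of returning a string. Spreadsheet programs should then split the values into separate columns.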

See the documentation.
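If you would rather not pull in pandas for this step, the same dict can be written with the standard-library csv module. The sketch below assumes the layout that `output_dict=True` produces: one nested dict per label or average, plus a scalar `accuracy` entry in recent scikit-learn versions. The `report` values are hand-written placeholders, and `report_to_rows` is a hypothetical helper name, not a scikit-learn function.

```python
import csv
import io

COLUMNS = ["precision", "recall", "f1-score", "support"]

# Hand-written placeholder in the shape returned by
# classification_report(..., output_dict=True); values are illustrative only.
report = {
    "12": {"precision": 1.0, "recall": 0.5, "f1-score": 0.6667, "support": 2},
    "4": {"precision": 0.6667, "recall": 1.0, "f1-score": 0.8, "support": 2},
    "accuracy": 0.75,
    "macro avg": {"precision": 0.8333, "recall": 0.75, "f1-score": 0.7333, "support": 4},
}


def report_to_rows(report):
    """Yield one flat row per label or average, skipping scalar entries."""
    for name, scores in report.items():
        if isinstance(scores, dict):
            yield {"label": name, **scores}


# Write tab-delimited text to an in-memory buffer; use open(path, "w", newline="")
# in place of the buffer to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["label"] + COLUMNS, delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerows(report_to_rows(report))
tsv_text = buffer.getvalue()

print(tsv_text)
```

The scalar `accuracy` entry is skipped here because it has no per-metric breakdown; whether to keep it as its own row is a design choice.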

Tags: python, text, csv, classification, scikit-learn

Seun AJAO