I'm doing multiclass text classification in scikit-learn. I'm training a Multinomial Naive Bayes classifier on a dataset with hundreds of labels. Here's an extract from the scikit-learn script that fits the MNB model:
    from __future__ import print_function

    # read file.csv into a pandas DataFrame
    import pandas as pd
    path = 'data/file.csv'
    # on_bad_lines='skip' needs pandas >= 1.3; older versions use error_bad_lines=False
    merged = pd.read_csv(path, on_bad_lines='skip', low_memory=False)

    # define X and y using the original DataFrame
    X = merged.text
    y = merged.grid

    # split X and y into training and testing sets
    # (sklearn.cross_validation was removed in scikit-learn 0.20)
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # import and instantiate CountVectorizer
    from sklearn.feature_extraction.text import CountVectorizer
    vect = CountVectorizer()

    # create document-term matrices using CountVectorizer
    X_train_dtm = vect.fit_transform(X_train)
    X_test_dtm = vect.transform(X_test)

    # import and instantiate MultinomialNB
    from sklearn.naive_bayes import MultinomialNB
    nb = MultinomialNB()

    # fit a Multinomial Naive Bayes model
    nb.fit(X_train_dtm, y_train)

    # make class predictions
    y_pred_class = nb.predict(X_test_dtm)

    # generate classification report
    from sklearn import metrics
    print(metrics.classification_report(y_test, y_pred_class))

A simplified version of the metrics.classification_report output on the command line looks like this:
               precision    recall  f1-score   support

           12       0.84      0.48      0.61      2843
           13       0.00      0.00      0.00        69
           15       1.00      0.19      0.32       232
           16       0.75      0.02      0.05       965
           33       1.00      0.04      0.07       155
            4       0.59      0.34      0.43      5600
           41       0.63      0.49      0.55      6218
           42       0.00      0.00      0.00       102
           49       0.00      0.00      0.00        11
            5       0.90      0.06      0.12      2010
           50       0.00      0.00      0.00         5
           51       0.96      0.07      0.13      1267
           58       1.00      0.01      0.02       180
           59       0.37      0.80      0.51      8127
            7       0.91      0.05      0.10       579
            8       0.50      0.56      0.53      7555

    avg/total       0.59      0.48      0.45     35919

I was wondering if there is any way to get the report output into a standard CSV file with regular column headers.
When I redirect the command-line output into a CSV file, or copy and paste the screen output into a spreadsheet (OpenOffice Calc or Excel), it lumps all the results into a single column.

As of scikit-learn v0.20, the easiest way to convert a classification report to a pandas DataFrame is to have the report returned as a dict:

    report = classification_report(y_test, y_pred, output_dict=True)

and then construct a DataFrame and transpose it:

    df = pandas.DataFrame(report).transpose()

From here on, you are free to use the standard pandas methods to generate your desired output format (CSV, HTML, LaTeX, ...).
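For the CSV case asked about in the question, a minimal end-to-end sketch might look like the following. The toy label lists and the output file name `classification_report.csv` are assumptions made up for illustration; in the question's script you would pass `y_test` and `y_pred_class` instead.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels standing in for y_test and the model's predictions (made-up data).
y_true = [4, 4, 8, 8, 12, 12]
y_pred = [4, 8, 8, 8, 12, 4]

# output_dict=True (scikit-learn >= 0.20) returns nested dicts instead of text.
report = classification_report(y_true, y_pred, output_dict=True)

# After transposing: one row per class or summary entry, one column per metric.
df = pd.DataFrame(report).transpose()

# Write a regular CSV; index_label names the column that holds the row labels.
out_path = "classification_report.csv"  # hypothetical output file name
df.to_csv(out_path, index_label="class")

# Read the file back to check that it has proper column headers.
print(pd.read_csv(out_path).head())
```

The resulting file opens in Calc or Excel with separate `class`, `precision`, `recall`, `f1-score` and `support` columns. Besides the per-class rows, the report also contains summary rows (averages, and an accuracy row in newer releases); their exact names depend on the scikit-learn version, so filter them out by index if you only want the classes.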
See the documentation.