
Getting TypeError: reduction operation 'argmax' not allowed for this dtype when trying to use idxmax()


When using the idxmax() function in Pandas, I keep receiving this error.

Traceback (most recent call last):
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/main.py", line 20, in <module>
    best_c_param = classify.print_kfold_scores(X_training_undersampled, y_training_undersampled)
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/Classification.py", line 39, in print_kfold_scores
    best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter']
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/series.py", line 1369, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype

The Pandas version I am using is 0.22.0

main.py

import ExploratoryDataAnalysis as eda
import Preprocessing as processor
import Classification as classify
import pandas as pd


data_path = '/Users/username/college/year-4/fyp-credit-card-fraud/data/'

if __name__ == '__main__':
    df = pd.read_csv(data_path + 'creditcard.csv')
    # eda.init(df)
    # eda.check_null_values()
    # eda.view_data()
    # eda.check_target_classes()
    df = processor.noramlize(df)

    X_training, X_testing, y_training, y_testing, X_training_undersampled, X_testing_undersampled, \
    y_training_undersampled, y_testing_undersampled = processor.resample(df)

    best_c_param = classify.print_kfold_scores(X_training_undersampled, y_training_undersampled)

Classification.py

from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import KFold, cross_val_score
from sklearn.metrics import confusion_matrix, precision_recall_curve, auc, \
    roc_auc_score, roc_curve, recall_score, classification_report
import pandas as pd
import numpy as np


def print_kfold_scores(X_training, y_training):
    print('\nKFold\n')

    fold = KFold(len(y_training), 5, shuffle=False)

    c_param_range = [0.01, 0.1, 1, 10, 100]

    results = pd.DataFrame(index=range(len(c_param_range), 2), columns=['C_parameter', 'Mean recall score'])
    results['C_parameter'] = c_param_range

    j = 0
    for c_param in c_param_range:
        print('-------------------------------------------')
        print('C parameter: ', c_param)
        print('\n-------------------------------------------')

        recall_accs = []
        for iteration, indices in enumerate(fold, start=1):
            lr = LogisticRegression(C=c_param, penalty='l1')
            lr.fit(X_training.iloc[indices[0], :], y_training.iloc[indices[0], :].values.ravel())

            y_prediction_undersampled = lr.predict(X_training.iloc[indices[1], :].values)
            recall_acc = recall_score(y_training.iloc[indices[1], :].values, y_prediction_undersampled)
            recall_accs.append(recall_acc)
            print('Iteration ', iteration, ': recall score = ', recall_acc)

        results.ix[j, 'Mean recall score'] = np.mean(recall_accs)
        j += 1
        print('\nMean recall score ', np.mean(recall_accs))
        print('\n')

    best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter'] # Error occurs on this line

    print('*****************************************************************')
    print('Best model to choose from cross validation is with C parameter = ', best_c_param)
    print('*****************************************************************')

    return best_c_param

The line causing the problem is this one:

best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter']

The output of the program is below:

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/username/College/year-4/fyp-credit-card-fraud/code/main.py
/Users/username/Library/Python/3.6/lib/python/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Dataset Ratios

Percentage of genuine transactions:  0.5
Percentage of fraudulent transactions 0.5
Total number of transactions in resampled data:  984


Whole Dataset Split

Number of transactions in training dataset:  199364
Number of transactions in testing dataset:  85443
Total number of transactions in dataset:  284807


Undersampled Dataset Split

Number of transactions in training dataset 688
Number of transactions in testing dataset:  296
Total number of transactions in dataset:  984

KFold

-------------------------------------------
C parameter:  0.01

-------------------------------------------
Iteration  1 : recall score =  0.931506849315
Iteration  2 : recall score =  0.917808219178
Iteration  3 : recall score =  1.0
Iteration  4 : recall score =  0.959459459459
Iteration  5 : recall score =  0.954545454545

Mean recall score  0.9526639965


-------------------------------------------
C parameter:  0.1

-------------------------------------------
Iteration  1 : recall score =  0.849315068493
Iteration  2 : recall score =  0.86301369863
Iteration  3 : recall score =  0.915254237288
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.909090909091

Mean recall score  0.89652397189


-------------------------------------------
C parameter:  1

-------------------------------------------
Iteration  1 : recall score =  0.86301369863
Iteration  2 : recall score =  0.86301369863
Iteration  3 : recall score =  0.983050847458
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.924242424242

Mean recall score  0.915853322981


-------------------------------------------
C parameter:  10

-------------------------------------------
Iteration  1 : recall score =  0.849315068493
Iteration  2 : recall score =  0.876712328767
Iteration  3 : recall score =  0.983050847458
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.939393939394

Mean recall score  0.918883626012


-------------------------------------------
C parameter:  100

-------------------------------------------
Iteration  1 : recall score =  0.86301369863
Iteration  2 : recall score =  0.876712328767
Iteration  3 : recall score =  0.983050847458
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.924242424242

Mean recall score  0.918593049009


Traceback (most recent call last):
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/main.py", line 20, in <module>
    best_c_param = classify.print_kfold_scores(X_training_undersampled, y_training_undersampled)
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/Classification.py", line 39, in print_kfold_scores
    best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter']
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/series.py", line 1369, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype

Process finished with exit code 1
cod3min3 asked Feb 10 '18 10:02


1 Answer

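A DataFrame created with only an index and column names, as results is here, holds object-dtype columns by default, so 'Mean recall score' is non-numeric even after floats are assigned into it. argmin(), idxmin(), argmax(), idxmax() and other similar functions need the dtype to be numeric.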

The easiest solution is to use pd.to_numeric() in order to convert your series (or columns) to numeric types. An example with a data frame df with a column 'a' would be:

df['a'] = pd.to_numeric(df['a']) 
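
Applied to the results frame from the question, the conversion might look like the sketch below. It is not the asker's exact code: it assumes a plain 0..4 index, uses .loc in place of .ix (which was removed in pandas 1.0), and fills in the mean recall scores printed in the program output instead of running the cross-validation.

```python
import pandas as pd

c_param_range = [0.01, 0.1, 1, 10, 100]
# Mean recall scores copied from the program output in the question.
mean_recalls = [0.9526639965, 0.89652397189, 0.915853322981,
                0.918883626012, 0.918593049009]

# Assumption: a simple 0..4 index. A frame built without data
# starts out with object-dtype columns.
results = pd.DataFrame(index=range(len(c_param_range)),
                       columns=['C_parameter', 'Mean recall score'])
results['C_parameter'] = c_param_range

# Cell-by-cell assignment, as in the question's loop (.loc instead of .ix).
for j, score in enumerate(mean_recalls):
    results.loc[j, 'Mean recall score'] = score

# Inspect the dtype before converting.
print(results['Mean recall score'].dtype)

# Convert the column to a numeric dtype so idxmax() can reduce over it.
results['Mean recall score'] = pd.to_numeric(results['Mean recall score'])

best_c_param = results.loc[results['Mean recall score'].idxmax(), 'C_parameter']
print(best_c_param)
```

Under the same assumptions, passing dtype=float to the pd.DataFrame constructor should also work, since the columns then start out numeric and no conversion is needed afterwards.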

A more complete answer on type casting in pandas can be found here.

Hope that helps :)

Lucas Azevedo answered Sep 21 '22 14:09