
AttributeError: 'numpy.int64' object has no attribute 'startswith'

I'm getting this error when I try to create a factorplot with seaborn in an ipython notebook.

Here's the end of the stack trace:

/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/axes.pyc in get_legend_handles_labels(self, legend_handler_map)
   4317             label = handle.get_label()
   4318             #if (label is not None and label != '' and not label.startswith('_')):
-> 4319             if label and not label.startswith('_'):
   4320                 handles.append(handle)
   4321                 labels.append(label)

AttributeError: 'numpy.int64' object has no attribute 'startswith'

Here are my imports:

import numpy as np
import pandas as pd
from pandas import Series,DataFrame

import math

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import train_test_split

from sklearn import metrics

import statsmodels.api as sm

And here's my code:

df = sm.datasets.fair.load_pandas().data
df['had_affair'] = df.affairs.apply(lambda x: 1 if x != 0 else 0)
sns.factorplot('age', data=df, hue='had_affair', palette='coolwarm')

The problem seems to be that the column I'm using for the hue is an integer and not a string. Creating a new column using something like df['had_affair_str'] = df.had_affair.apply(str) and then using had_affair_str as my hue makes the error go away, but the online tutorial I'm following uses this exact code without getting any errors. Is this a known issue with matplotlib or seaborn? Is one of my packages out of date?

Here are the versions for my python packages:

ipython==3.1.0
numpy==1.9.2
pandas==0.16.1
matplotlib==1.4.3
seaborn==0.5.1
scikit-learn==0.16.1
statsmodels==0.6.1

edit:

Output from df.info():

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6366 entries, 0 to 6365
Data columns (total 11 columns):
rate_marriage      6366 non-null float64
age                6366 non-null float64
yrs_married        6366 non-null float64
children           6366 non-null float64
religious          6366 non-null float64
educ               6366 non-null float64
occupation         6366 non-null float64
occupation_husb    6366 non-null float64
affairs            6366 non-null float64
had_affair         6366 non-null int64
had_affair_str     6366 non-null object
dtypes: float64(9), int64(1), object(1)
memory usage: 596.8+ KB
asked Oct 14 '25 16:10 by mplis


1 Answer

matplotlib expects every legend label to be a string, but the values in your hue column had_affair are numpy.int64, which has no startswith method.
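
Here is a minimal sketch of the failure outside of seaborn. The helper name is_visible_label is hypothetical; it only mirrors the condition shown at line 4319 of the traceback, and np.int64(1) stands in for a label taken from the integer hue column.

```python
import numpy as np

def is_visible_label(label):
    # Mirrors the check in the traceback: `label and not label.startswith('_')`
    return bool(label) and not label.startswith('_')

label = np.int64(1)  # assumed stand-in for a label from the int64 hue column

try:
    is_visible_label(label)
    error_message = None
except AttributeError as exc:
    # numpy integers do not implement str methods such as startswith
    error_message = str(exc)

print(error_message)

# The same check passes once the label is a string
print(is_visible_label(str(label)))
```

The first call fails because the label is a numpy integer; converting it with str() is enough for the check to go through.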

You can explicitly convert the numpy.int64 column to strings like this:

df['had_affair'] = df['had_affair'].astype(str)
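
A small self-contained sketch of the same conversion, assuming a made-up four-row frame in place of the statsmodels dataset (the affairs values are illustrative only). It writes to a separate column, as in the question's had_affair_str workaround, so the integer column stays available; that is a choice, not a requirement.

```python
import pandas as pd

# Illustrative data only; not taken from the real dataset
df = pd.DataFrame({'affairs': [0.0, 0.5, 0.0, 3.2]})

# Integer flag, equivalent in spirit to the lambda in the question
df['had_affair'] = (df['affairs'] != 0).astype(int)

# String version of the flag, suitable for use as a hue/legend label
df['had_affair_str'] = df['had_affair'].astype(str)

labels = df['had_affair_str'].unique().tolist()
print(labels)

# Every label is now a plain str, so label.startswith('_') works
print(all(isinstance(label, str) for label in labels))
```

Passing hue='had_affair_str' to the plotting call should then avoid the AttributeError, while had_affair remains an integer for any later modeling step.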
answered Oct 17 '25 04:10 by kingbase


