I'm getting this error when I try to create a factorplot with seaborn in an IPython notebook.
Here's the end of the stack trace:
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/axes.pyc in get_legend_handles_labels(self, legend_handler_map)
4317 label = handle.get_label()
4318 #if (label is not None and label != '' and not label.startswith('_')):
-> 4319 if label and not label.startswith('_'):
4320 handles.append(handle)
4321 labels.append(label)
AttributeError: 'numpy.int64' object has no attribute 'startswith'
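The failing check from the traceback can be reproduced without matplotlib. The sketch below uses a hypothetical helper, keep_label, that copies the condition on line 4319 (if label and not label.startswith('_')). It is an illustration of why an integer label breaks, not matplotlib's own code. Because numpy.int64(0) is falsy, the short-circuit never reaches startswith for that level. Only the truthy level 1 triggers the error.

```python
import numpy as np


def keep_label(label):
    """Hypothetical helper mirroring the legend filter shown in the traceback:
    keep a handle only if its label is non-empty and does not start with '_'."""
    # `bool(label) and ...` short-circuits on falsy labels, so startswith
    # is only called when the label is truthy.
    return bool(label) and not label.startswith('_')


# String labels behave as the legend code expects.
print(keep_label('1'))           # True
print(keep_label('_nolegend_'))  # False

# numpy.int64(0) is falsy, so startswith is never reached.
print(keep_label(np.int64(0)))   # False

# numpy.int64(1) is truthy, so startswith is called and fails.
try:
    keep_label(np.int64(1))
except AttributeError as exc:
    print('AttributeError:', exc)
```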
Here are my imports:
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
import math
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import train_test_split
from sklearn import metrics
import statsmodels.api as sm
And here's my code:
df = sm.datasets.fair.load_pandas().data
df['had_affair'] = df.affairs.apply(lambda x: 1 if x != 0 else 0)
sns.factorplot('age', data=df, hue='had_affair', palette='coolwarm')
The problem seems to be that the column I'm using for the hue is an integer and not a string. Creating a new column with something like df['had_affair_str'] = df.had_affair.apply(str) and then using had_affair_str as my hue makes the error go away, but the online tutorial I'm following uses this exact code without getting any errors. Is this a known issue with matplotlib or seaborn? Is one of my packages out of date?
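One way to confirm the diagnosis is to look at the dtype of the derived column and the type of its distinct values, since those values end up as the hue levels. So that it runs on its own, the sketch below uses a small hand-made affairs column instead of the statsmodels dataset. The values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for sm.datasets.fair.load_pandas().data (made-up values).
df = pd.DataFrame({'affairs': [0.0, 0.4, 0.0, 2.1]})

# Same derivation as in the question: 1 if there were any affairs, else 0.
df['had_affair'] = df.affairs.apply(lambda x: 1 if x != 0 else 0)

# The column has an integer dtype, so its distinct values are numpy integers, not str.
print(df['had_affair'].dtype)
levels = df['had_affair'].unique()
print([type(v).__name__ for v in levels])

# The workaround from the question: a parallel column holding Python strings.
df['had_affair_str'] = df.had_affair.apply(str)
print(sorted(df['had_affair_str'].unique()))  # ['0', '1']
```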
Here are the versions for my python packages:
ipython==3.1.0
numpy==1.9.2
pandas==0.16.1
matplotlib==1.4.3
seaborn==0.5.1
scikit-learn==0.16.1
statsmodels==0.6.1
Edit:
Output from df.info():
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6366 entries, 0 to 6365
Data columns (total 11 columns):
rate_marriage 6366 non-null float64
age 6366 non-null float64
yrs_married 6366 non-null float64
children 6366 non-null float64
religious 6366 non-null float64
educ 6366 non-null float64
occupation 6366 non-null float64
occupation_husb 6366 non-null float64
affairs 6366 non-null float64
had_affair 6366 non-null int64
had_affair_str 6366 non-null object
dtypes: float64(9), int64(1), object(1)
memory usage: 596.8+ KB
matplotlib expects the legend labels to be strings. Here the labels are the values of your hue column had_affair, and they are numpy.int64 instead of object/string.
You can force the conversion of the numpy.int64 values to strings with:
df['had_affair'] = df['had_affair'].astype(str)
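As a sanity check, here is a minimal sketch of that conversion on a small made-up frame. After astype(str), every hue level is a Python str, so the startswith call from the traceback works. The result also matches the apply(str) workaround from the question.

```python
import pandas as pd

# Made-up stand-in for the integer had_affair column from the question.
df = pd.DataFrame({'had_affair': [0, 1, 0, 1]})

# The question's workaround, kept for comparison.
via_apply = df['had_affair'].apply(str)

# The fix from this answer: overwrite the column with its string form.
df['had_affair'] = df['had_affair'].astype(str)

levels = sorted(df['had_affair'].unique())
print(levels)  # ['0', '1']

# Each level now has the str method that failed in the traceback.
print([lvl.startswith('_') for lvl in levels])  # [False, False]

# Both approaches produce the same string values.
print(df['had_affair'].tolist() == via_apply.tolist())  # True
```

astype(str) replaces the integer column. If you still need the numeric version elsewhere, write the strings to a separate column instead, as the question does with had_affair_str.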