I've got a DataFrame that's over 100k rows long and a few columns wide, nothing crazy. I'm trying to subset the rows based on a list of some 4000 strings, but am struggling to figure out how to do so. Is there a way to do this?
The DataFrame looks something like this:
dog_name count
===================
Jenny 2
Fido 4
Joey 7
Yeller 2
and the list of strings is stored in the variable dog_name_list = ['Fido', 'Yeller']
I've tried something along the lines of
df[df['dog_name'].isin(dog_name_list)]
but am getting a fun error: unhashable type: 'list'
I've checked a similar question, the docs and this rundown for subsetting data frames by seeing whether a value is present in a list, but that's got me right about nowhere, and I'm a little confused by what I'm missing. Would really appreciate someone's advice!
Use pandas.DataFrame.isin() (or Series.isin() for a single column, as in your attempt) to filter a DataFrame using a list.
I believe at least one of the values in your dog_name column is itself a list.
This works fine:
>>> df[df['dog_name'].isin(['Fido', 'Yeller'])]
dog_name count
1 Fido 4
3 Yeller 2
But if you add a list:
>>> df.loc[4] = (['a'], 2)  # originally df.ix[4]; .ix was removed in pandas 1.0
>>> df
dog_name count
0 Jenny 2
1 Fido 4
2 Joey 7
3 Yeller 2
4 [a] 2
>>> df[df['dog_name'].isin(['Fido', 'Yeller'])]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-1b68dd948f39> in <module>()
----> 1 df[df['dog_name'].isin(['Fido', 'Yeller'])]
...
pandas/lib.pyx in pandas.lib.ismember (pandas/lib.c:5014)()
TypeError: unhashable type: 'list'
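The traceback suggests the membership check hashes each value in the column, which is why a single list in the column is enough to break it. A minimal plain-Python sketch of that underlying behaviour (the helper name describe_hashability is made up for illustration and is not part of pandas):

```python
def describe_hashability(value):
    """Report whether a value can be hashed (hypothetical helper)."""
    try:
        hash(value)
    except TypeError as exc:
        # Lists are mutable, so Python refuses to hash them.
        return f"not hashable ({exc})"
    return "hashable"


print(describe_hashability("Fido"))  # a str can be hashed
print(describe_hashability(["a"]))   # a list cannot
```

Strings pass and lists fail, which matches the error you see as soon as one cell holds a list.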
To find those bad dogs:
>>> df[[isinstance(dog, list) for dog in df.dog_name]]
dog_name count
4 [a] 2
To find all the data types in the column:
>>> set((type(dog) for dog in df.dog_name))
{list, str}
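Once you know which rows are the problem, one possible workaround is to run isin() only on the rows whose dog_name is a plain string. This is a rough sketch, assuming the non-string rows can simply be set aside. The data is rebuilt here to mirror the example above, and the names is_str, bad_rows, clean and result are only illustrative:

```python
import pandas as pd

# Hypothetical data mirroring the example above, with one list-valued entry.
df = pd.DataFrame({
    "dog_name": ["Jenny", "Fido", "Joey", "Yeller", ["a"]],
    "count": [2, 4, 7, 2, 2],
})
dog_name_list = ["Fido", "Yeller"]

# True only where the cell holds a plain string.
is_str = df["dog_name"].map(lambda v: isinstance(v, str))

# Rows whose dog_name is not a string (the ones that break isin).
bad_rows = df[~is_str]

# Filter on the clean rows only, so isin never sees an unhashable value.
clean = df[is_str]
result = clean[clean["dog_name"].isin(dog_name_list)]

print(bad_rows)
print(result)
```

Whether to drop the bad rows, or repair them (for example by unpacking single-element lists), depends on how they got into your data in the first place.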