I have the following dataframe and would like to print the unique values of the colors
column.
df = pd.DataFrame({'colors': ['green', 'green', 'purple', ['yellow , red'], 'orange'], 'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']})
Output:
           colors   names
0           green   Terry
1           green     Nor
2          purple  Franck
3  [yellow , red]    Pete
4          orange   Agnes
df.colors.unique() would work fine if it weren't for the [yellow , red] row. As it is, I keep getting the error TypeError: unhashable type: 'list', which is understandable.
Is there a way to still get the unique values without taking this row into account?
I tried the following, but none of it worked:
df = df[~df.colors.str.contains(',', na=False)] # Nothing happens
df = df[~df.colors.str.contains('[', na=False)] # Output: error: unterminated character set at position 0
df = df[~df.colors.str.contains(']', na=False)] # Nothing happens
If some of the values are lists, you can detect them with isinstance and drop those rows:
# changed sample data: the list now holds two separate strings
df = pd.DataFrame({'colors': ['green', 'green', 'purple', ['yellow' , 'red'], 'orange'],
'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']})
df = df[~df.colors.map(lambda x : isinstance(x, list))]
print (df)
   colors   names
0   green   Terry
1   green     Nor
2  purple  Franck
4  orange   Agnes
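Since the question asks for the unique values rather than the filtered frame, here is a minimal sketch that applies the same isinstance mask and then calls unique() on what is left. The names is_list and unique_colors are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'colors': ['green', 'green', 'purple', ['yellow', 'red'], 'orange'],
    'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes'],
})

# True for every cell that holds a list (is_list is an illustrative name)
is_list = df['colors'].map(lambda x: isinstance(x, list))

# unique() no longer fails once the unhashable list rows are masked out
unique_colors = df.loc[~is_list, 'colors'].unique().tolist()
print(unique_colors)  # ['green', 'purple', 'orange']
```

Using a mask this way leaves df itself untouched, which helps if the list rows are still needed later.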
Your own attempt works once you cast the column to strings and pass the regex=False
parameter:
df = df[~df.colors.astype(str).str.contains('[', na=False, regex=False)]
print (df)
   colors   names
0   green   Terry
1   green     Nor
2  purple  Franck
4  orange   Agnes
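As far as I can tell, the original attempts did nothing because .str.contains does not treat a list cell as text: the match yields a missing value there, and na=False turns it into False. The sketch below compares the masks on a small Series. The escaped pattern r'\[' is an alternative I am assuming you might prefer if you want to keep regex matching; it is not part of the answer above.

```python
import pandas as pd

colors = pd.Series(['green', ['yellow', 'red'], 'orange'])

# Without the cast, the list cell is never flagged, so nothing gets filtered
mask_raw = colors.str.contains(',', na=False)

# After astype(str) the list becomes the text "['yellow', 'red']"
mask_literal = colors.astype(str).str.contains('[', na=False, regex=False)

# Assumed alternative: keep regex matching but escape the bracket
mask_escaped = colors.astype(str).str.contains(r'\[', na=False)

print(mask_raw.tolist())
print(mask_literal.tolist())
print(mask_escaped.tolist())
```

Note that the string cast would also flag any plain string that happens to contain a bracket, so the isinstance check is the stricter test.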
If you also want the unique values from inside the lists (requires pandas 0.25+ for explode):
s = df.colors.map(lambda x : x if isinstance(x, list) else [x]).explode().unique().tolist()
print (s)
['green', 'purple', 'yellow', 'red', 'orange']
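The line above relies on the changed sample data. With the data from the question, the list holds a single string 'yellow , red', so explode alone would return that string as one value. A possible sketch, assuming the comma is always the separator inside such strings, splits and strips after the first explode:

```python
import pandas as pd

# Sample data as posted in the question: the list holds ONE string
df = pd.DataFrame({
    'colors': ['green', 'green', 'purple', ['yellow , red'], 'orange'],
    'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes'],
})

s = (
    df['colors']
    .map(lambda x: x if isinstance(x, list) else [x])  # wrap scalars in lists
    .explode()        # one row per list element
    .str.split(',')   # assumption: a comma separates colors inside a string
    .explode()        # one row per color
    .str.strip()      # drop the spaces around the comma
    .unique()
    .tolist()
)
print(s)  # ['green', 'purple', 'yellow', 'red', 'orange']
```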
Alternatively, compare the type directly to build a boolean mask:
df.colors.apply(lambda x : type(x)!=list)
0 True
1 True
2 True
3 False
4 True
Name: colors, dtype: bool
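The mask can then be used for boolean indexing. This is a small sketch of that step; mask and filtered are illustrative names. One difference from isinstance worth knowing: type(x) != list only matches exact lists, so an instance of a list subclass would be kept.

```python
import pandas as pd

df = pd.DataFrame({
    'colors': ['green', 'green', 'purple', ['yellow', 'red'], 'orange'],
    'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes'],
})

# True where the cell is NOT an exact list
mask = df['colors'].apply(lambda x: type(x) != list)

# Keep only the rows where the mask is True
filtered = df[mask]
print(filtered)
print(filtered['colors'].unique().tolist())
```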