Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get unique values of a dataframe column when there are lists - python

I have the following dataframe where I would like to print the unique values of the color column.

df = pd.DataFrame({'colors': ['green', 'green', 'purple', ['yellow , red'], 'orange'], 'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']})

Output:
           colors   names
0           green   Terry
1           green     Nor
2          purple  Franck
3  [yellow , red]    Pete
4          orange   Agnes

df.colors.unique() would work fine if there wasn't the [yellow , red] row. As it is I keep getting the TypeError: unhashable type: 'list' error which is understandable.

Is there a way to still get the unique values without taking this row into account?

I tried the followings but none worked:

df = df[~df.colors.str.contains(',', na=False)] # Nothing happens
df = df[~df.colors.str.contains('[', na=False)] # Output: error: unterminated character set at position 0
df = df[~df.colors.str.contains(']', na=False)] # Nothing happens
like image 557
I.M. Avatar asked Oct 17 '19 13:10

I.M.


2 Answers

If values are lists check it by isinstance method:

#changed sample data
df = pd.DataFrame({'colors': ['green', 'green', 'purple', ['yellow' , 'red'], 'orange'], 
                   'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']})

df = df[~df.colors.map(lambda x : isinstance(x, list))]
print (df)
   colors   names
0   green   Terry
1   green     Nor
2  purple  Franck
4  orange   Agnes

Your solution should be changed with casting to strings and regex=False parameter:

df = df[~df.colors.astype(str).str.contains('[', na=False, regex=False)] 
print (df)
   colors   names
0   green   Terry
1   green     Nor
2  purple  Franck
4  orange   Agnes

Also if want all unique values included lists for pandas 0.25+:

s = df.colors.map(lambda x : x if isinstance(x, list) else [x]).explode().unique().tolist()
print (s)
['green', 'purple', 'yellow', 'red', 'orange']
like image 123
jezrael Avatar answered Nov 01 '22 12:11

jezrael


Let us using type

df.colors.apply(lambda x : type(x)!=list)
0     True
1     True
2     True
3    False
4     True
Name: colors, dtype: bool
like image 39
BENY Avatar answered Nov 01 '22 11:11

BENY