
pandas: TypeError: unhashable type: 'list'

I have the following df:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [["John Muller"], "person", [8866155845]],
        [["Innovation Division"], "company", np.nan],
        [["Carol Sway"], "person", [8866155845]],
    ],
    columns=["name", "kind", "phone"],
)

# Out:
#                     name     kind         phone
# 0          [John Muller]   person  [8866155845]
# 1  [Innovation Division]  company           NaN
# 2           [Carol Sway]   person  [8866155845]

and I want to find the rows that share a phone number. But the values in the phone column are lists, so calling:

df.duplicated('phone') 

will generate the error:

TypeError: unhashable type: 'list'
Lucia Commerz asked Apr 25 '18 10:04


3 Answers

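You can also use the applymap function, which is handy here: it unwraps every list cell to its first element, so the values become hashable (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):

# keep the rows whose phone value repeats an earlier row
df2 = df[df.applymap(lambda x: x[0] if isinstance(x, list) else x).duplicated('phone')]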

print(df2)

           name    kind         phone
2  [Carol Sway]  person  [8866155845]
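
A variation on the same idea, as a rough sketch: taking x[0] only looks at the first element of each list, so [1, 2] and [1, 3] would count as equal. Converting each list to a tuple compares the whole list instead. The helper name to_hashable is made up for this illustration, and keep=False is the standard duplicated option that marks every member of a duplicate group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [["John Muller"], "person", [8866155845]],
        [["Innovation Division"], "company", np.nan],
        [["Carol Sway"], "person", [8866155845]],
    ],
    columns=["name", "kind", "phone"],
)


def to_hashable(cell):
    """Hypothetical helper: turn a list into a tuple, leave anything else alone."""
    return tuple(cell) if isinstance(cell, list) else cell


# Build a hashable key from the phone column without modifying df itself.
phone_key = df["phone"].apply(to_hashable)

# Rows that repeat an earlier phone value.
dupes = df[phone_key.duplicated()]

# Every row that shares its phone value with some other row.
all_members = df[phone_key.duplicated(keep=False)]

print(dupes)
print(all_members)
```

Because the key is a separate Series, the original list cells in df stay untouched.
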
YOLO answered Nov 18 '22 12:11


Perhaps surprisingly, pd.DataFrame.duplicated behaves differently from pd.Series.duplicated here. You are right that df.duplicated("phone") throws a TypeError, but calling df.phone.duplicated() on the column directly succeeds.

df[df.phone.duplicated()]  # or df[df["phone"].duplicated()]

#            name    kind         phone
# 2  [Carol Sway]  person  [8866155845]

Another simple way to deal with list objects in a DataFrame is the explode method (available since pandas 0.25), which turns each element of a list-like cell into its own row. Be aware that it replicates the index value for every row it creates. You could use it as follows:

df_exploded = df.explode("phone")
df_exploded[df_exploded.duplicated("phone")]

#            name    kind       phone
# 2  [Carol Sway]  person  8866155845
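
To make the index caveat concrete, here is a small sketch on a made-up frame (the multi data below is hypothetical, not taken from the question) in which one row holds two numbers. After explode that row appears twice under the same index label, and the replicated labels can then be used to get back to the original rows.

```python
import pandas as pd

# Hypothetical data: row 0 carries two phone numbers.
multi = pd.DataFrame(
    {
        "name": ["A", "B", "C"],
        "phone": [[111, 222], [333], [222]],
    }
)

exploded = multi.explode("phone")

# Index label 0 now appears twice, once per number in the original list.
print(exploded.index.tolist())

# keep=False marks every member of a duplicate group, not only the later ones.
mask = exploded.duplicated("phone", keep=False)

# Map the flagged exploded rows back to the original, unexploded rows.
shared_rows = multi.loc[exploded.index[mask.to_numpy()].unique()]
print(shared_rows)
```

If a unique index is needed on the exploded frame itself, calling reset_index(drop=True) on it is one option, at the cost of losing the link back to the original rows.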

Or, if you are only interested in which phone numbers are duplicated, you can do something like df["phone"].explode().value_counts() to see how many times each number occurs.
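
A minimal sketch of that counting approach, using the frame from the question. The variable names counts and repeated are only illustrative; the threshold of more than one occurrence is an assumption about what "duplicated" should mean here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [["John Muller"], "person", [8866155845]],
        [["Innovation Division"], "company", np.nan],
        [["Carol Sway"], "person", [8866155845]],
    ],
    columns=["name", "kind", "phone"],
)

# One phone number per row; value_counts drops NaN by default.
counts = df["phone"].explode().value_counts()

# Keep only the numbers that occur more than once.
repeated = counts[counts > 1]
print(repeated)
```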

Nerxis answered Nov 18 '22 10:11


You can use the hashable_df package:

from hashable_df import hashable_df
hashable_df(df).duplicated('phone')

This makes all unhashable cell values hashable, so operations of this kind work.

user582175 answered Nov 18 '22 10:11