I have the following df:
import pandas as pd
import numpy as np

df = pd.DataFrame(
    [
        [["John Muller"], "person", [8866155845]],
        [["Innovation Division"], "company", np.nan],
        [["Carol Sway"], "person", [8866155845]],
    ],
    columns=["name", "kind", "phone"],
)
# Out:
# name kind phone
# 0 [John Muller] person [8866155845]
# 1 [Innovation Division] company NaN
# 2 [Carol Sway] person [8866155845]
and I want to find duplicate phone numbers. But the objects in df are lists, so calling:
df.duplicated('phone')
raises:
TypeError: unhashable type: 'list'
You can use the applymap function, which is quite handy for this problem:
# unwrap one-element list cells so the values are hashable, then flag duplicates
df2 = df[df.applymap(lambda x: x[0] if isinstance(x, list) else x).duplicated('phone')]
print(df2)
name kind phone
2 [Carol Sway] person [8866155845]
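Note that in pandas 2.1+ DataFrame.applymap is deprecated and has been renamed to DataFrame.map; the same unwrapping trick works there unchanged:
# pandas >= 2.1: DataFrame.map is the new name for applymap
df2 = df[df.map(lambda x: x[0] if isinstance(x, list) else x).duplicated('phone')]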
You may be surprised that pd.DataFrame.duplicated works differently from pd.Series.duplicated. You are right that df.duplicated("phone") throws TypeError, but calling df.phone.duplicated() directly succeeds.
df[df.phone.duplicated()] # or df[df["phone"].duplicated()]
# name kind phone
# 2 [Carol Sway] person [8866155845]
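If you want every row that shares a duplicated number, not just the later occurrences, Series.duplicated also accepts keep=False:
df[df["phone"].duplicated(keep=False)]
# name kind phone
# 0 [John Muller] person [8866155845]
# 2 [Carol Sway] person [8866155845]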
Another simple and useful way to deal with list objects in a DataFrame is the explode method, which transforms each list-like element into its own row (but be aware that it replicates the index). You could use it as follows:
df_exploded = df.explode("phone")
df_exploded[df_exploded.duplicated("phone")]
# name kind phone
# 2 [Carol Sway] person 8866155845
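Because the original index is replicated, you can also compute the duplicate mask on the exploded column alone and map it back to the intact rows:
exploded = df["phone"].explode()  # Series with the original index repeated per element
df.loc[exploded[exploded.duplicated()].index]
# name kind phone
# 2 [Carol Sway] person [8866155845]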
Or, if you are only interested in the duplicated phone numbers themselves, you can do something like df["phone"].explode().value_counts() to see how many times each number occurs.
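For example, to list only the numbers that actually repeat:
counts = df["phone"].explode().value_counts()
counts[counts > 1]
# 8866155845    2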
You can use the hashable_df package:
from hashable_df import hashable_df
hashable_df(df).duplicated('phone')
This will make all unhashable cell values hashable, so these kinds of operations work.
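If you would rather avoid the extra dependency, you can get a similar effect in plain pandas by converting the list cells to hashable tuples yourself. A minimal sketch, assuming every non-null phone cell is a list:
# tuples are hashable; na_action='ignore' leaves the NaN cell untouched
df[df["phone"].map(tuple, na_action="ignore").duplicated()]
# name kind phone
# 2 [Carol Sway] person [8866155845]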