I have a DataFrame containing a column, props, which contains lists of strings.
Ideally, I'd like to group by this column, but I predictably get an error when I do:
TypeError: unhashable type: 'list'
Is there a sensible way to re-arrange my DataFrame so I can work with these values?
Groupby is a very powerful pandas method. You can group by one column and then use value_counts to count the values of another column within each group. For example, if each row records one activity a person did, grouping by person and calling value_counts on the activity column counts how many times each person did each activity.
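A minimal sketch of groupby plus value_counts, assuming a small made-up table whose column names `person` and `activity` are hypothetical (they stand in for the unnamed example the paragraph refers to):

```python
import pandas as pd

# Hypothetical data: one row per activity a person did
df = pd.DataFrame({
    "person": ["alice", "alice", "bob", "alice", "bob"],
    "activity": ["run", "swim", "run", "run", "run"],
})

# Group by person, then count each distinct activity within every group.
# The result is a Series indexed by (person, activity).
counts = df.groupby("person")["activity"].value_counts()
print(counts)
```

Individual counts can then be read with a tuple key, e.g. `counts.loc[("alice", "run")]`.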
You can collect grouped DataFrame rows into lists: call pandas.DataFrame.groupby() on the column of interest, select the column whose values you want as a list, and then call apply(list) on it to get one list per group.
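A short sketch of that recipe, assuming hypothetical `team` and `player` columns:

```python
import pandas as pd

# Hypothetical data: which player belongs to which team
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "player": ["x", "y", "z", "w", "v"],
})

# Group by "team", select "player", and turn each group's values into a list.
# Row order within each group is preserved.
lists = df.groupby("team")["player"].apply(list)
print(lists)
```

This is roughly the inverse of the problem in the question: it produces a column of lists, which is exactly the kind of column that cannot itself be used as a groupby key.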
Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns. groupby combined with agg is pandas' closest equivalent to dplyr's group_by + summarise logic.
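As an illustration, here is a sketch using named aggregation in `agg` (available in modern pandas), assuming hypothetical `city` and `sales` columns; each keyword becomes an output column, much like the arguments to dplyr's summarise:

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    "city": ["paris", "paris", "rome", "rome", "rome"],
    "sales": [10, 20, 5, 15, 40],
})

# Similar in spirit to: df %>% group_by(city) %>% summarise(total = sum(sales), ...)
summary = df.groupby("city").agg(
    total=("sales", "sum"),     # sum of sales per city
    average=("sales", "mean"),  # mean of sales per city
    n=("sales", "count"),       # number of non-null sales rows per city
)
print(summary)
```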
You can convert the lists of strings into tuples of strings. Tuples are hashable because they are immutable. This assumes, of course, that you don't need to add to or remove from those lists after creation.
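The hashability difference can be checked directly in plain Python. The helper `is_hashable` below is a hypothetical name written for this illustration:

```python
def is_hashable(obj):
    """Return True if obj can be hashed, i.e. used as a dict key or group label."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

# A tuple of strings is hashable; a list of the same strings is not.
# Note: a tuple is only hashable if everything inside it is hashable too.
print(is_hashable(("red", "large")))  # True
print(is_hashable(["red", "large"]))  # False
```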
You can use the immutable counterpart to lists, which are tuples:
>>> import pandas as pd
>>> df = pd.DataFrame([[[1, 2], 'ab'], [[2, 3], 'bc']])
>>> df.groupby(0).groups
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'
You can then apply the conversion to the appropriate column:
>>> df[0] = df[0].apply(tuple)
>>> df.groupby(0).groups
{(1, 2): [0], (2, 3): [1]}
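Applied to a frame shaped like the one in the question, a sketch might look like this. The `value` column and its numbers are hypothetical, added only so there is something to aggregate:

```python
import pandas as pd

# Hypothetical frame: "props" holds lists of strings, as in the question
df = pd.DataFrame({
    "props": [["a", "b"], ["c", "d"], ["a", "b"]],
    "value": [1, 2, 3],
})

# Lists are unhashable, so convert each one to a tuple before grouping
df["props"] = df["props"].apply(tuple)

# Rows with the same sequence of strings now land in the same group
totals = df.groupby("props")["value"].sum()
print(totals)
```

Because tuples compare element by element, ("a", "b") and ("b", "a") form separate groups; the order of strings inside each list matters.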