Group operations on Pandas column containing lists

Tags: python, pandas

I have a DataFrame containing a column, props, which contains lists of strings.

Ideally, I'd like to group by this column, but I predictably get an error when I do:

TypeError: unhashable type: 'list'

Is there a sensible way to re-arrange my DataFrame so I can work with these values?

urschrei asked Oct 28 '13 12:10



2 Answers

You can convert the lists of strings into tuples of strings. Tuples are hashable because they are immutable (provided their elements are hashable too, which strings are). This assumes, of course, that you don't need to add to or remove from those lists after creation.
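As a minimal sketch of why the conversion helps, the plain-Python snippet below checks hashability directly. The variable names and the sample values are hypothetical and not taken from the question.

```python
# Hypothetical sample value standing in for one cell of the `props` column.
props = ["a", "b"]

# hash() raises TypeError for a list, which is the error groupby surfaces.
try:
    hash(props)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# The tuple version can be hashed, so it works as a dict (or group) key.
key = tuple(props)
lookup = {key: "group 1"}

print(list_is_hashable, lookup[("a", "b")])
```

Grouping needs hashable keys, so anything that can serve as a dictionary key here should also work as a group key.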

Matti Lyra answered Nov 16 '22 16:11


You can use tuples, the immutable counterpart to lists. Grouping on a column of lists fails:

>>> import pandas as pd
>>> df = pd.DataFrame([[[1, 2], 'ab'], [[2, 3], 'bc']])
>>> df.groupby(0).groups
...
... TypeError: unhashable type: 'list'

Applying the conversion to the relevant column fixes this:

>>> df[0] = df[0].apply(tuple)
>>> df.groupby(0).groups
{(1, 2): [0], (2, 3): [1]}
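
One caveat worth sketching: tuples compare element by element, so `("a", "b")` and `("b", "a")` end up in different groups. If element order does not matter for your data (an assumption, since the question does not say), sorting before the conversion gives one key per combination. The column names `props_key` and `value`, and the sample data, are hypothetical.

```python
import pandas as pd

# Hypothetical data; the question only says that `props` holds lists of strings.
df = pd.DataFrame({
    "props": [["a", "b"], ["b", "a"], ["c", "d"]],
    "value": [1, 2, 3],
})

# Sorting first makes ["a", "b"] and ["b", "a"] map to the same key.
# Drop sorted() if element order is meaningful for your data.
df["props_key"] = df["props"].apply(lambda items: tuple(sorted(items)))

# The tuple column is hashable, so it can be used as the grouping key.
totals = df.groupby("props_key")["value"].sum()
print(totals)
```

Keeping the original `props` column and grouping on a separate key column leaves the lists available if you still need to modify them later.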
miku answered Nov 16 '22 16:11