Sort lists in a Pandas Dataframe column

I have a DataFrame column whose values are lists:

    a
['a', 'b']
['b', 'a']
['a', 'c']
['c', 'a']

I would like to group by the unique values of this column, ignoring the order within each list (so only ['a', 'b'] and ['a', 'c'] remain). However, this raises an error:

TypeError: unhashable type: 'list'

Is there any way around this? Ideally I would like to sort the values in place and create an additional column containing the concatenated string.

asked Oct 06 '16 15:10 by Jack Cooper


1 Answer

You can sort the values of the column with sort_values.

Example:

import pandas

x = [['a', 'b'], ['b', 'a'], ['a', 'c'], ['c', 'a']]
df = pandas.DataFrame({'a': pandas.Series(x)})
df.a.sort_values()

     a
0   [a, b]
2   [a, c]
1   [b, a]
3   [c, a]

However, from what I understand, you want to sort [b, a] into [a, b] and [c, a] into [a, c], and then keep only the unique values [a, b] and [a, c].

I'd recommend using apply with a lambda.

Try:

result = df.a.sort_values().apply(lambda x: sorted(x))
result = pandas.DataFrame(result).reset_index(drop=True)

It returns:

0    [a, b]
1    [a, c]
2    [a, b]
3    [a, c]

Then get unique values:

# Lists are unhashable, so convert each one to a tuple before de-duplicating with set().
unique = sorted(set(result['a'].apply(tuple)))
newdf = pandas.DataFrame({'a': pandas.Series(unique)})
newdf

     a
0   (a, b)
1   (a, c)
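
The question also asks for an extra column holding a concatenated string that can be used for grouping. A minimal sketch of that, assuming every list element is a string; the column name 'key' and the variable 'counts' are hypothetical names chosen for this example:

```python
import pandas

# Same sample data as in the question.
df = pandas.DataFrame({'a': [['a', 'b'], ['b', 'a'], ['a', 'c'], ['c', 'a']]})

# Sort each list so that ['b', 'a'] and ['a', 'b'] become identical.
df['a'] = df['a'].apply(sorted)

# Join each sorted list into one string. Strings are hashable,
# so this column can serve as a groupby key ('key' is a made-up name).
df['key'] = df['a'].apply(''.join)

# Count how many rows fall into each group.
counts = df.groupby('key').size()
print(counts)
```

If the list elements can be longer than one character, joining with a separator such as ','.join is probably safer, since plain concatenation could make different lists collide on the same key.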

answered Sep 23 '22 02:09 by estebanpdl