Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Flatten a Series in pandas, i.e. a series whose elements are lists

I have a series of the form:

s = Series([['a','a','b'],['b','b','c','d'],[],['a','b','e']])

which looks like

0       [a, a, b]
1    [b, b, c, d]
2              []
3       [a, b, e]
dtype: object

I would like to count how many elements I have in total. My naive tentatives like

s.values.hist()

or

s.values.flatten()

didn't work. What am I doing wrong?

like image 831
meto Avatar asked Jun 04 '14 01:06

meto


2 Answers

If we stick with the pandas Series as in the original question, one neat option from the Pandas version 0.25.0 onwards is the Series.explode() routine. It returns an exploded list to rows, where the index will be duplicated for these rows.

The original Series from the question:

s = pd.Series([['a','a','b'],['b','b','c','d'],[],['a','b','e']])

Let's explode it and we get a Series, where the index is repeated. The index indicates the index of the original list.

>>> s.explode()
Out:
0      a
0      a
0      b
1      b
1      b
1      c
1      d
2    NaN
3      a
3      b
3      e
dtype: object

>>> type(s.explode())
Out:
pandas.core.series.Series

To count the number of elements we can now use the Series.value_counts():

>>> s.explode().value_counts()
Out:
b    4
a    3
d    1
c    1
e    1
dtype: int64

To include also NaN values:

>>> s.explode().value_counts(dropna=False)
Out:
b      4
a      3
d      1
c      1
e      1
NaN    1
dtype: int64

Finally, plotting the histogram using Series.plot():

>>> s.explode().value_counts(dropna=False).plot(kind = 'bar')

Histogram of the Series

like image 127
heilala Avatar answered Oct 01 '22 23:10

heilala


s.map(len).sum()

does the trick. s.map(len) applies len() to each element and returns a series of all the lengths, then you can just use sum on that series.

like image 33
Marius Avatar answered Oct 01 '22 23:10

Marius