Remove empty lists in pandas series

I have a long series like the following:

import pandas as pd
series = pd.Series([[(1,2)],[(3,5)],[],[(3,5)]])

In [151]: series
Out[151]:
0    [(1, 2)]
1    [(3, 5)]
2          []
3    [(3, 5)]
dtype: object

I want to remove all entries with an empty list. For some reason, boolean indexing does not work.

The following tests both give the same error:

series == [[(1,2)]]
series == [(1,2)]

ValueError: Arrays were different lengths: 4 vs 1

This is strange, because the same kind of comparison works in the simple example below:

In [146]: pd.Series([1,2,3]) == [3]
Out[146]:
0    False
1    False
2     True
dtype: bool

P.S. Ideally, I'd also like to split the tuples in the series into a two-column DataFrame.

The Unfun Cat asked Mar 17 '15 13:03


1 Answer

You could check to see if the lists are empty using str.len():

series.str.len() == 0

and then use this boolean series to remove the rows containing empty lists.
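
As a minimal sketch of that filtering step, using the series from the question (the names is_empty and non_empty are only illustrative):

```python
import pandas as pd

series = pd.Series([[(1, 2)], [(3, 5)], [], [(3, 5)]])

# Boolean mask: True where the list stored in that row has length 0.
is_empty = series.str.len() == 0

# Invert the mask to keep only the rows whose list is non-empty.
non_empty = series[~is_empty]
```

The original index labels are kept, so the row that held the empty list simply disappears from the result. Chain .reset_index(drop=True) if you want a fresh 0..n-1 index.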

If each of your entries is a list containing a two-tuple (or else empty), you could create a two-column DataFrame by using the str accessor twice (once to select the first element of the list, then to access the elements of the tuple):

pd.DataFrame({'a': series.str[0].str[0], 'b': series.str[0].str[1]})

Missing entries default to NaN with this method.
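
Putting the two steps together might look like the sketch below. Calling dropna() at the end is an assumption about what you want: it discards the rows that came from empty lists, rather than keeping them as NaN. The names first, df and df_clean are illustrative.

```python
import pandas as pd

series = pd.Series([[(1, 2)], [(3, 5)], [], [(3, 5)]])

# First element of each list; rows with an empty list become NaN.
first = series.str[0]

# Split each tuple into two columns; NaN rows stay NaN in both columns.
df = pd.DataFrame({'a': first.str[0], 'b': first.str[1]})

# Assumption: rows that came from empty lists are not wanted, so drop them.
df_clean = df.dropna()
```

Filtering the series first with the str.len() mask and then building the DataFrame should give the same rows. That order avoids creating the NaN entries in the first place.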

Alex Riley answered Sep 22 '22 06:09