Remove rows containing empty list of tuples in pandas

Tags: python, pandas

I have a DataFrame like the following:

name     foo_list
'foo'    [('bleh'), ('blah')]
'bar'    [(), 'boo']
'foobar'  [(), (), ()]

I want to remove all the empty tuples and, if every value in a list is an empty tuple, drop the row entirely. I also want to convert each list of tuples into a flat list. So the output would be:

name     foo_list
'foo'    ['bleh', 'blah']
'bar'    ['boo']

How do I do this in pandas?

frazman asked Mar 08 '23 01:03


1 Answer

Try this:

Data Input:

import pandas as pd
df = pd.DataFrame({'name': ['A', 'B', 'C'], 'foo_list': [[('bleh'), ('blah')], [(), 'boo'], [(), (), ()]]})

Solution:

# Drop the empty tuples from each list
df['foo_list'] = df['foo_list'].apply(lambda x: [t for t in x if t != ()])
# Show only the rows whose list is still non-empty
df.loc[df['foo_list'].apply(len) > 0, :]

Out[20]: 
       foo_list name
0  [bleh, blah]    A
1         [boo]    B
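
Note that ('bleh') in the input above is just the string 'bleh', because parentheses without a trailing comma do not create a tuple. If the real data holds one-element tuples such as ('bleh',), the lists also need flattening. The following is a minimal sketch under that assumption; the sample data and the helper name flatten are made up for illustration and are not part of the original question or answer.

```python
from itertools import chain

import pandas as pd

# Hypothetical data: real one-element tuples, plus a bare string as in the question
df = pd.DataFrame({
    'name': ['foo', 'bar', 'foobar'],
    'foo_list': [[('bleh',), ('blah',)], [(), 'boo'], [(), (), ()]],
})


def flatten(items):
    """Flatten tuples into one list; empty tuples contribute nothing."""
    # Wrap bare strings in a tuple so chain does not split them into characters
    return list(chain.from_iterable(
        item if isinstance(item, tuple) else (item,) for item in items
    ))


df['foo_list'] = df['foo_list'].apply(flatten)
# Keep only the rows whose flattened list is non-empty
result = df[df['foo_list'].map(len) > 0].reset_index(drop=True)
print(result)
```

Flattening and filtering happen in one pass here, since an empty tuple simply adds no elements to the chained result.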

Timing (small size):

%timeit df['foo_list'].apply(lambda x : [t for t in x if t != ()])#Wen
10000 loops, best of 3: 117 µs per loop

%timeit df.foo_list.apply(lambda x: filter(None, x)) # John
10000 loops, best of 3: 121 µs per loop

For a large DataFrame, John's solution (filter) is faster, so I would recommend it:

df = pd.concat([df] * 10000, axis=0)

%timeit df.foo_list.apply(lambda x: filter(None, x))
100 loops, best of 3: 10.2 ms per loop
%timeit df['foo_list'].apply(lambda x : [t for t in x if t != ()])
100 loops, best of 3: 17.1 ms per loop
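
A caveat when reproducing these numbers: the timings above appear to come from Python 2, where filter returns a list. On Python 3 it returns a lazy iterator, so the result needs wrapping in list() to get the same column contents (and a fair timing). Also, filter(None, ...) drops every falsy value, not only empty tuples. A small sketch of both points, where the sample list is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C'],
                   'foo_list': [[('bleh'), ('blah')], [(), 'boo'], [(), (), ()]]})

# On Python 3, filter() is lazy, so materialize it with list()
filtered = df['foo_list'].apply(lambda x: list(filter(None, x)))

# The list comprehension removes only values equal to the empty tuple
compared = df['foo_list'].apply(lambda x: [t for t in x if t != ()])

# Hypothetical sample showing where the two approaches differ
sample = [(), '', 0, 'boo']
by_filter = list(filter(None, sample))          # drops (), '' and 0
by_comparison = [t for t in sample if t != ()]  # drops only ()
```

For the data in the question the two approaches give the same lists; they only diverge if a list can contain other falsy values such as empty strings or zeros.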

BENY answered Mar 10 '23 14:03