Python - is a list comprehension efficient in this case?

Tags:

python

This is the input "dirty" list in Python:

input_list = ['  \n  ','  data1\n ','   data2\n','  \n','data3\n'.....]

Each list element contains either only whitespace and newline characters, or data padded with spaces and newline characters.

I cleaned it up using the code below:

cleaned_up_list = [data.strip() for data in input_list if data.strip()]

which gives:

cleaned_up_list = ['data1','data2','data3','data4'..]

Does Python internally call strip() twice per element in the list comprehension above? Or would I have to use a for loop and call strip() just once if I cared about efficiency?

cleaned_up_list = []
for data in input_list:
    clean_data = data.strip()
    if clean_data:
        cleaned_up_list.append(clean_data)
wolfgang asked Jul 26 '15 18:07


1 Answer

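With your list comprehension, strip() is called twice for every element that passes the filter: once in the if condition and once in the output expression. If you want to call strip() only once and keep the comprehension, use a generator expression: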

input_list[:] = [x for x in (s.strip() for s in input_list) if x]
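To see the difference concretely, here is a minimal sketch that counts the calls by routing them through a hypothetical helper, counting_strip (not part of the original answer), which records each call and then delegates to str.strip. The counts follow from the five-element sample input used below.

```python
from collections import Counter

calls = Counter()

def counting_strip(s):
    # Hypothetical helper: count the call, then delegate to str.strip.
    calls["strip"] += 1
    return s.strip()

input_list = ['  \n  ', '  data1\n ', '   data2\n', '  \n', 'data3\n']

# Original comprehension: strip runs in the filter for every element, and
# again in the output expression for every element that passes the filter.
calls.clear()
twice = [counting_strip(d) for d in input_list if counting_strip(d)]
twice_calls = calls["strip"]

# Generator-expression version: strip runs exactly once per element.
calls.clear()
once = [x for x in (counting_strip(s) for s in input_list) if x]
once_calls = calls["strip"]

print(twice, twice_calls)  # ['data1', 'data2', 'data3'] 8
print(once, once_calls)    # ['data1', 'data2', 'data3'] 5
```

With 5 elements of which 3 are non-empty after stripping, the original form makes 5 + 3 = 8 calls, the generator form 5.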

Input:

input_list = ['  \n  ','  data1\n ','   data2\n','  \n','data3\n']

Output:

 ['data1', 'data2', 'data3']

Assigning to input_list[:] changes the original list in place, which may or may not be what you want; if you actually want to create a new list, just use cleaned_up_list = ....
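A short sketch of that difference: slice assignment replaces the contents of the existing list object, so every other name bound to that list sees the cleaned data, while plain assignment only binds a new name to a new list. The names alias and raw are illustrative.

```python
input_list = ['  \n  ', '  data1\n ', '   data2\n', '  \n', 'data3\n']
alias = input_list          # a second reference to the same list object
original_id = id(input_list)

# In place: the same list object is kept, only its contents are replaced.
input_list[:] = [x for x in (s.strip() for s in input_list) if x]
print(id(input_list) == original_id)  # True
print(alias)                          # ['data1', 'data2', 'data3']

# New list: only cleaned_up_list refers to the result; the source is untouched.
raw = ['  a\n', '\n']
cleaned_up_list = [x for x in (s.strip() for s in raw) if x]
print(raw)              # ['  a\n', '\n']
print(cleaned_up_list)  # ['a']
```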

I have always found itertools.imap in Python 2, or map in Python 3, to be more efficient than the generator expression for larger inputs:

from itertools import imap
input_list[:] = [x for x in imap(str.strip, input_list) if x]
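The import above only works in Python 2: itertools.imap (and itertools.ifilter, used in the timings) no longer exist in Python 3, where the built-in map and filter already return lazy iterators. A Python 3 sketch of the same idea follows; the function names are illustrative, and the last variant uses an assignment expression (Python 3.8+), which is an alternative not mentioned in the original answer.

```python
def clean_map(items):
    # map replaces itertools.imap: str.strip is applied lazily, once per element.
    return [x for x in map(str.strip, items) if x]

def clean_filter_map(items):
    # filter(None, ...) drops falsy (empty) strings; list() materializes the result.
    return list(filter(None, map(str.strip, items)))

def clean_walrus(items):
    # Python 3.8+: bind the stripped value once and reuse it in the output.
    return [y for x in items if (y := x.strip())]

input_list = ['  \n  ', '  data1\n ', '   data2\n', '  \n', 'data3\n']
print(clean_map(input_list))         # ['data1', 'data2', 'data3']
print(clean_filter_map(input_list))  # ['data1', 'data2', 'data3']
print(clean_walrus(input_list))      # ['data1', 'data2', 'data3']
```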

Some timings of the different approaches (IPython, Python 2):

In [17]: input_list = [choice(input_list) for _ in range(1000000)]   

In [19]: timeit filter(None, imap(str.strip, input_list))
10 loops, best of 3: 115 ms per loop

In [20]: timeit list(ifilter(None,imap(str.strip,input_list)))
10 loops, best of 3: 110 ms per loop

In [21]: timeit [x for x in imap(str.strip,input_list) if x]
10 loops, best of 3: 125 ms per loop

In [22]: timeit [x for x in (s.strip() for s in input_list) if x]  
10 loops, best of 3: 145 ms per loop

In [23]: timeit [data.strip() for data in input_list if data.strip()]
10 loops, best of 3: 160 ms per loop

In [24]: %%timeit                                                
   ....:     cleaned_up_list = []
   ....:     for data in input_list:
   ....:          clean_data = data.strip()
   ....:          if clean_data:
   ....:              cleaned_up_list.append(clean_data)
   ....: 

10 loops, best of 3: 150 ms per loop

In [25]: %%timeit                                                    
   ....:     cleaned_up_list = []
   ....:     append = cleaned_up_list.append
   ....:     for data in input_list:
   ....:          clean_data = data.strip()
   ....:          if clean_data:
   ....:              append(clean_data)
   ....: 

10 loops, best of 3: 123 ms per loop

The fastest approach is actually itertools.ifilter combined with itertools.imap, closely followed by filter with imap.
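The timings above come from IPython on Python 2. A rough Python 3 port of the comprehension and filter/map comparison, using the standard timeit module, might look like the sketch below. The sample data mirrors In [17] (1,000,000 random picks from the five-element list); no results are claimed here, since the ranking can differ between Python versions and machines.

```python
import random
import timeit

base = ['  \n  ', '  data1\n ', '   data2\n', '  \n', 'data3\n']
random.seed(0)  # fixed seed so repeated runs use the same data
input_list = [random.choice(base) for _ in range(1_000_000)]

# Python 3 counterparts of In [19] to In [23] (map/filter replace imap/ifilter).
approaches = {
    "filter + map": lambda: list(filter(None, map(str.strip, input_list))),
    "listcomp over map": lambda: [x for x in map(str.strip, input_list) if x],
    "listcomp over genexp": lambda: [x for x in (s.strip() for s in input_list) if x],
    "listcomp, strip twice": lambda: [d.strip() for d in input_list if d.strip()],
}

if __name__ == "__main__":
    for name, fn in approaches.items():
        # Best of 3 single runs of each approach, reported in milliseconds.
        best = min(timeit.repeat(fn, number=1, repeat=3))
        print(f"{name:24s} {best * 1000:7.1f} ms")
```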

Binding the method to a local name removes the need to re-evaluate the attribute lookup cleaned_up_list.append on every iteration, which is more efficient; if you are stuck with a loop and want the most efficient approach, it is a viable alternative.
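A Python 3 sketch that puts the two loop variants from In [24] and In [25] side by side (the function names are hypothetical). The only difference is whether .append is looked up on every iteration or once before the loop; on recent CPython versions the gap may be small.

```python
import timeit

def clean_plain_append(items):
    cleaned_up_list = []
    for data in items:
        clean_data = data.strip()
        if clean_data:
            # The .append attribute is looked up on every iteration.
            cleaned_up_list.append(clean_data)
    return cleaned_up_list

def clean_bound_append(items):
    cleaned_up_list = []
    append = cleaned_up_list.append  # looked up once, stored in a local name
    for data in items:
        clean_data = data.strip()
        if clean_data:
            append(clean_data)
    return cleaned_up_list

input_list = ['  \n  ', '  data1\n ', '   data2\n', '  \n', 'data3\n'] * 200_000

if __name__ == "__main__":
    for fn in (clean_plain_append, clean_bound_append):
        # Best of 3 single runs, reported in milliseconds.
        best = min(timeit.repeat(lambda: fn(input_list), number=1, repeat=3))
        print(f"{fn.__name__:20s} {best * 1000:7.1f} ms")
```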

Padraic Cunningham answered Nov 15 '22 16:11