This is the input "dirty" list in Python:
input_list = [' \n ',' data1\n ',' data2\n',' \n','data3\n'.....]
Each list element contains either only whitespace and newline characters, or data surrounded by whitespace and newline characters.
I cleaned it up using the code below:
cleaned_up_list = [data.strip() for data in input_list if data.strip()]
which gives:
cleaned_up_list = ['data1','data2','data3','data4'..]
Does Python internally call strip() twice during the above list comprehension? Or would I have to use a for loop and call strip() just once if I cared about efficiency?
cleaned_up_list = []
for data in input_list:
    clean_data = data.strip()
    if clean_data:
        cleaned_up_list.append(clean_data)
With your list comprehension, strip() is called twice for every element that passes the filter. If you want to call strip() only once and still keep the comprehension, use a generator expression:
input_list[:] = [x for x in (s.strip() for s in input_list) if x]
Input:
input_list = [' \n ',' data1\n ',' data2\n',' \n','data3\n']
Output:
['data1', 'data2', 'data3']
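To make the double call visible, here is a minimal sketch that counts calls to strip(). CountingStr is a hypothetical helper written only for this illustration; it is not part of the question or of any library. Note that the second call only happens for elements that pass the filter, since the output expression is skipped for the rest.

```python
class CountingStr(str):
    """Hypothetical str subclass that counts how often strip() is called."""
    calls = 0

    def strip(self, *args):
        CountingStr.calls += 1
        return str.strip(self, *args)


raw = [' \n ', ' data1\n ', ' data2\n', ' \n', 'data3\n']
input_list = [CountingStr(s) for s in raw]

# Original comprehension: strip() runs in the filter and again in the
# output expression for every element that survives the filter.
CountingStr.calls = 0
twice = [data.strip() for data in input_list if data.strip()]
calls_twice = CountingStr.calls

# Generator-expression version: strip() runs exactly once per element.
CountingStr.calls = 0
once = [x for x in (s.strip() for s in input_list) if x]
calls_once = CountingStr.calls

print(twice == once, calls_twice, calls_once)
```

With five elements, three of which are non-empty after stripping, the first version makes 5 + 3 calls and the second makes 5.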
Assigning to input_list[:] changes the original list in place, which may or may not be what you want. If you actually want to create a new list, just use cleaned_up_list = ....
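The difference between the two forms can be sketched as follows. The names alias and source are made up for this illustration.

```python
input_list = [' \n ', ' data1\n ', ' data2\n', ' \n', 'data3\n']
alias = input_list  # a second name bound to the same list object

# Slice assignment mutates the existing list, so every reference sees it.
input_list[:] = [x for x in (s.strip() for s in input_list) if x]
same_object = alias is input_list
alias_after_slice = list(alias)

# Plain assignment builds a new list and leaves the source untouched.
source = [' \n ', ' data1\n ']
cleaned_up_list = [x for x in (s.strip() for s in source) if x]

print(same_object, alias_after_slice, source, cleaned_up_list)
```

So slice assignment is the form to use when other code holds a reference to the same list and should see the cleaned data; otherwise a new name is usually the safer choice.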
I have always found that using itertools.imap in Python 2, or map in Python 3, instead of the generator expression is the most efficient option for larger inputs:
from itertools import imap
input_list[:] = [x for x in imap(str.strip, input_list) if x]
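For Python 3, where the built-in map and filter are already lazy, a rough equivalent might look like the sketch below. The last line uses an assignment expression, which is not part of the original answer and assumes Python 3.8 or newer; it is shown only as another way to strip once per element.

```python
input_list = [' \n ', ' data1\n ', ' data2\n', ' \n', 'data3\n']

# Python 3: map() returns an iterator, so no itertools import is needed.
via_map = [x for x in map(str.strip, input_list) if x]

# filter(None, ...) drops falsy values, i.e. the empty strings.
via_filter = list(filter(None, map(str.strip, input_list)))

# Python 3.8+ only (assumption): bind the stripped value inside the filter.
via_walrus = [clean for s in input_list if (clean := s.strip())]

print(via_map, via_filter, via_walrus)
```

Passing str.strip directly only works when every element is a str; for mixed input the generator expression with s.strip() is more forgiving.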
Some timings with different approaches:
In [17]: input_list = [choice(input_list) for _ in range(1000000)]
In [19]: timeit filter(None, imap(str.strip, input_list))
10 loops, best of 3: 115 ms per loop
In [20]: timeit list(ifilter(None,imap(str.strip,input_list)))
10 loops, best of 3: 110 ms per loop
In [21]: timeit [x for x in imap(str.strip,input_list) if x]
10 loops, best of 3: 125 ms per loop
In [22]: timeit [x for x in (s.strip() for s in input_list) if x]
10 loops, best of 3: 145 ms per loop
In [23]: timeit [data.strip() for data in input_list if data.strip()]
10 loops, best of 3: 160 ms per loop
In [24]: %%timeit
....: cleaned_up_list = []
....: for data in input_list:
....:     clean_data = data.strip()
....:     if clean_data:
....:         cleaned_up_list.append(clean_data)
....:
10 loops, best of 3: 150 ms per loop
In [25]: %%timeit
....: cleaned_up_list = []
....: append = cleaned_up_list.append
....: for data in input_list:
....:     clean_data = data.strip()
....:     if clean_data:
....:         append(clean_data)
....:
10 loops, best of 3: 123 ms per loop
The fastest approach is actually itertools.ifilter combined with itertools.imap, closely followed by filter with imap.
Binding list.append to a local name avoids looking up the method on every iteration, which is more efficient (123 ms versus 150 ms above). If you are stuck with a loop and want the most efficient version of it, this is a viable alternative.
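If you want to repeat the comparison on Python 3, a small harness along these lines should work. It is only a sketch: the function names are made up for the illustration, the list is smaller than the 1,000,000 elements used above so it finishes quickly, and the numbers will vary with the machine and Python version, so do not expect them to match the timings shown here.

```python
import timeit
from random import choice

base = [' \n ', ' data1\n ', ' data2\n', ' \n', 'data3\n']
# Assumption: 100,000 elements instead of 1,000,000 to keep the run short.
input_list = [choice(base) for _ in range(100_000)]


def double_strip(items):
    return [data.strip() for data in items if data.strip()]


def gen_exp(items):
    return [x for x in (s.strip() for s in items) if x]


def map_filter(items):
    return list(filter(None, map(str.strip, items)))


def loop_bound_append(items):
    cleaned = []
    append = cleaned.append  # look the method up once, outside the loop
    for data in items:
        clean_data = data.strip()
        if clean_data:
            append(clean_data)
    return cleaned


candidates = [double_strip, gen_exp, map_filter, loop_bound_append]

# Sanity check first: every approach should produce the same list.
results = [func(input_list) for func in candidates]

for func in candidates:
    # Best of 3 repeats, 5 calls each, reported per call.
    seconds = min(timeit.repeat(lambda: func(input_list), number=5, repeat=3))
    print(f"{func.__name__}: {seconds / 5 * 1000:.1f} ms per call")
```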