This is the input "dirty" list in Python:
input_list = [' \n ',' data1\n ',' data2\n',' \n','data3\n'.....]
Each list element contains either whitespace with newline characters, or data with newline characters.
I cleaned it up using the code below:
cleaned_up_list = [data.strip() for data in input_list if data.strip()]
which gives:
cleaned_up_list = ['data1','data2','data3','data4'..]
Does Python internally call strip() twice during the above list comprehension? Or would I have to use a for loop and call strip() just once if I cared about efficiency?
cleaned_up_list = []
for data in input_list:
    clean_data = data.strip()
    if clean_data:
        cleaned_up_list.append(clean_data)
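Yes. In your list comprehension, strip() is called once in the if clause for every element, and then a second time in the output expression for every element that passes the filter. Use a generator expression if you want to call strip() only once and keep the comprehension: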
input_list[:] = [x for x in (s.strip() for s in input_list) if x]
Input:
input_list = [' \n ',' data1\n ',' data2\n',' \n','data3\n']
Output:
['data1', 'data2', 'data3']
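To check the "called twice" claim directly, here is a small sketch that counts calls. It assumes a hypothetical helper, counting_strip (not part of the original answer), which behaves like str.strip() but increments a counter each time it runs, so it can stand in for data.strip() in both versions.

```python
# Sample data from the question
input_list = [' \n ', ' data1\n ', ' data2\n', ' \n', 'data3\n']

calls = 0

def counting_strip(s):
    """Hypothetical helper: same result as s.strip(), but counts each call."""
    global calls
    calls += 1
    return s.strip()

# Original comprehension: strip runs in the filter AND in the output expression
calls = 0
twice = [counting_strip(d) for d in input_list if counting_strip(d)]
calls_twice = calls

# Generator-expression version: strip runs exactly once per element
calls = 0
once = [x for x in (counting_strip(s) for s in input_list) if x]
calls_once = calls

print(twice, calls_twice)  # ['data1', 'data2', 'data3'] 8
print(once, calls_once)    # ['data1', 'data2', 'data3'] 5
```

With 5 input elements and 3 survivors, the original comprehension makes 5 + 3 = 8 calls, while the generator version makes exactly 5.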
Assigning to input_list[:] changes the original list in place, which may or may not be what you want; if you actually want to create a new list, just use cleaned_up_list = ... instead.
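The difference between slice assignment and rebinding matters when something else holds a reference to the same list. A minimal sketch, where alias is a hypothetical second reference added only for illustration:

```python
input_list = [' \n ', ' data1\n ', ' data2\n', ' \n', 'data3\n']
alias = input_list  # hypothetical second reference to the same list object

# Rebinding a new name: builds a brand-new list, the shared object is untouched
cleaned_up_list = [x for x in (s.strip() for s in input_list) if x]
print(alias)  # still the original, uncleaned strings

# Slice assignment: replaces the contents of the existing list object
input_list[:] = [x for x in (s.strip() for s in input_list) if x]
print(alias)                # ['data1', 'data2', 'data3']
print(alias is input_list)  # True
```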
I have always found using itertools.imap in Python 2 (or map in Python 3) instead of the generator expression to be the most efficient option for larger inputs:
from itertools import imap
input_list[:] = [x for x in imap(str.strip, input_list) if x]
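The snippet above is Python 2 only, since itertools.imap and itertools.ifilter were removed in Python 3, where the built-in map and filter are already lazy. A sketch of the Python 3 equivalents follows; the last line uses an assignment expression (Python 3.8+), a different technique not mentioned in the original answer that also calls strip() only once per element.

```python
# Python 3: map and filter return lazy iterators, so no itertools import is needed
input_list = [' \n ', ' data1\n ', ' data2\n', ' \n', 'data3\n']

# Built-in map replaces itertools.imap
via_map = [x for x in map(str.strip, input_list) if x]

# filter(None, ...) drops falsy (empty) strings; wrap in list() on Python 3
via_filter = list(filter(None, map(str.strip, input_list)))

# Python 3.8+: bind the stripped value once and reuse it in the output
via_walrus = [y for x in input_list if (y := x.strip())]

print(via_map)  # ['data1', 'data2', 'data3']
```

Note that map(str.strip, ...) assumes every element is a str; the s.strip() generator form works for any object with a strip() method.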
Some timings of the different approaches (Python 2, IPython):
In [17]: input_list = [choice(input_list) for _ in range(1000000)]
In [19]: timeit filter(None, imap(str.strip, input_list))
10 loops, best of 3: 115 ms per loop
In [20]: timeit list(ifilter(None,imap(str.strip,input_list)))
10 loops, best of 3: 110 ms per loop
In [21]: timeit [x for x in imap(str.strip,input_list) if x]
10 loops, best of 3: 125 ms per loop
In [22]: timeit [x for x in (s.strip() for s in input_list) if x]
10 loops, best of 3: 145 ms per loop
In [23]: timeit [data.strip() for data in input_list if data.strip()]
10 loops, best of 3: 160 ms per loop
In [24]: %%timeit
....: cleaned_up_list = []
....: for data in input_list:
....:     clean_data = data.strip()
....:     if clean_data:
....:         cleaned_up_list.append(clean_data)
....:
10 loops, best of 3: 150 ms per loop
In [25]: %%timeit
....: cleaned_up_list = []
....: append = cleaned_up_list.append
....: for data in input_list:
....:     clean_data = data.strip()
....:     if clean_data:
....:         append(clean_data)
....:
10 loops, best of 3: 123 ms per loop
The fastest approach is actually itertools.ifilter combined with itertools.imap, closely followed by filter with imap.
Binding cleaned_up_list.append to a local name removes the need to look up the append method on every iteration, which makes the loop noticeably faster (123 ms vs 150 ms above). If you were stuck with a loop and wanted the most efficient approach, it is a viable alternative.
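If you want to rerun the comparison on Python 3 (where imap and ifilter no longer exist), here is a minimal sketch using the standard timeit module. The function names are hypothetical labels for the approaches timed above, and the random input mirrors the answer's 1,000,000-element setup; expect your numbers to differ from the Python 2 figures.

```python
import random
import timeit

# Hypothetical benchmark input mirroring the answer: 1,000,000 random picks
base = [' \n ', ' data1\n ', ' data2\n', ' \n', 'data3\n']
random.seed(0)
input_list = [random.choice(base) for _ in range(1_000_000)]

def comp_twice(items):
    return [d.strip() for d in items if d.strip()]

def comp_genexp(items):
    return [x for x in (s.strip() for s in items) if x]

def comp_map(items):
    return [x for x in map(str.strip, items) if x]

def filter_map(items):
    return list(filter(None, map(str.strip, items)))

def loop_plain(items):
    result = []
    for data in items:
        clean_data = data.strip()
        if clean_data:
            result.append(clean_data)  # attribute lookup on every iteration
    return result

def loop_bound_append(items):
    result = []
    append = result.append  # look up the bound method once, outside the loop
    for data in items:
        clean_data = data.strip()
        if clean_data:
            append(clean_data)
    return result

if __name__ == "__main__":
    for func in (comp_twice, comp_genexp, comp_map, filter_map,
                 loop_plain, loop_bound_append):
        # Best of 3 single runs, similar to IPython's "best of 3"
        best = min(timeit.repeat(lambda: func(input_list), number=1, repeat=3))
        print(f"{func.__name__:18} {best * 1000:.0f} ms")
```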