I am trying to isolate specific items in a list (e.g. [0, 1, 1] will return [0, 1]). I managed to get through this, but I noticed something strange.
When I tried to append to list it ran about 7 times slower then when I was concatenating strings and then splitting it.
This is my code:
import time
start = time.time()
first = [x for x in range(99999) if x % 2 == 0]
second = [x for x in range(99999) if x % 4 == 0]
values = first + second
distinct_string = ""
for i in values:
if not str(i) in distinct_string:
distinct_string += str(i) + " "
print(distinct_string.split())
print(" --- %s sec --- " % (start - time.time()))
This result end in about 5 seconds... Now for the lists:
import time
start = time.time()
first = [x for x in range(99999) if x % 2 == 0]
second = [x for x in range(99999) if x % 4 == 0]
values = first + second
distinct_list = []
for i in values:
if not i in distinct_list:
distinct_list.append(i)
print(distinct_list)
print(" --- %s sec --- " % (start - time.time()))
Runs at around 40 seconds.
What makes string faster even though I am converting a lot of values to strings?
Note that it's generally better to use timeit to compare functions, which runs the same thing multiple times to get average performance, and to factor out repeated code to focus on the performance that matters. Here's my test script:
first = [x for x in range(999) if x % 2 == 0]
second = [x for x in range(999) if x % 4 == 0]
values = first + second
def str_method(values):
distinct_string = ""
for i in values:
if not str(i) in distinct_string:
distinct_string += str(i) + " "
return [int(s) for s in distinct_string.split()]
def list_method(values):
distinct_list = []
for i in values:
if not i in distinct_list:
distinct_list.append(i)
return distinct_list
def set_method(values):
seen = set()
return [val for val in values if val not in seen and seen.add(val) is None]
if __name__ == '__main__':
assert str_method(values) == list_method(values) == set_method(values)
import timeit
funcs = [func.__name__ for func in (str_method, list_method, set_method)]
setup = 'from __main__ import {}, values'.format(', '.join(funcs))
for func in funcs:
print(func)
print(timeit.timeit(
'{}(values)'.format(func),
setup=setup,
number=1000
))
I've added int conversion to make sure that the functions return the same thing, and get the following results:
str_method
1.1685157899992191
list_method
2.6124089090008056
set_method
0.09523714500392089
Note that it is not true that searching in a list is faster than searching in a string if you have to convert the input:
>>> timeit.timeit('1 in l', setup='l = [9, 8, 7, 6, 5, 4, 3, 2, 1]')
0.15300405000016326
>>> timeit.timeit('str(1) in s', setup='s = "9 8 7 6 5 4 3 2 1"')
0.23205067300295923
Repeated appending to a list is not very efficient, as it means frequent resizing of the underlying object - the list comprehension, as shown in the set version, is more efficient.
searching in strings:
if not str(i) in distinct_string:
is much faster
then searching in lists
if not i in distinct_list:
here are lprofile lines for string search in OP code
Line # Hits Time Per Hit % Time Line Contents
17 75000 80366013 1071.5 92.7 if not str(i) in distinct_string:
18 50000 2473212 49.5 2.9 distinct_string += str(i) + " "
and for list search in OP code
39 75000 769795432 10263.9 99.1 if not i in distinct_list:
40 50000 2813804 56.3 0.4 distinct_list.append(i)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With