Lets assume I have one list and another tuple both of them are already sorted:
A = [10, 20, 30, 40]
B = (20, 60, 81, 90)
What I would need is to add all the elements from B into A in such a way that A remains sorted.
Solution I could come with was:
for item in B:
for i in range(0, len(A)):
if item > A[i]:
i += 1
else:
A.insert(i, item)
assuming A of size m, and B of size n; this solution would take O(mx
n) in worst case, how can I make it perform better ?
A simple way would be heapq.merge:
A = [10, 20, 30, 40]
B = (20, 60, 81, 90)
from heapq import merge
for ele in merge(A,B):
print(ele)
Output:
10
20
20
30
40
60
81
90
Some timings using the other O(n)
solution:
In [53]: A = list(range(10000))
In [54]: B = list(range(1,20000,10))
In [55]: timeit list(merge(A,B))
100 loops, best of 3: 2.52 ms per loop
In [56]: %%timeit
C = []
i = j = 0
while i < len(A) and j < len(B):
if A[i] < B[j]:
C.append(A[i])
i += 1
else:
C.append(B[j])
j += 1
C += A[i:] + B[j:]
....:
100 loops, best of 3: 4.29 ms per loop
In [58]: m =list(merge(A,B))
In [59]: m == C
Out[59]: True
If you wanted to roll your own this is a bit faster than merge:
def merger_try(a, b):
if not a or not b:
yield chain(a, b)
iter_a, iter_b = iter(a), iter(b)
prev_a, prev_b = next(iter_a), next(iter_b)
while True:
if prev_a >= prev_b:
yield prev_b
try:
prev_b = next(iter_b)
except StopIteration:
yield prev_a
break
else:
yield prev_a
try:
prev_a = next(iter_a)
except StopIteration:
yield prev_b
break
for ele in chain(iter_b, iter_a):
yield ele
Some timings:
In [128]: timeit list(merge(A,B))
1 loops, best of 3: 771 ms per loop
In [129]: timeit list(merger_try(A,B))
1 loops, best of 3: 581 ms per loop
In [130]: list(merger_try(A,B)) == list(merge(A,B))
Out[130]: True
In [131]: %%timeit
C = []
i = j = 0
while i < len(A) and j < len(B):
if A[i] < B[j]:
C.append(A[i])
i += 1
else:
C.append(B[j])
j += 1
C += A[i:] + B[j:]
.....:
1 loops, best of 3: 919 ms per loop
bisect
module "provides support for maintaining a list in sorted order without having to sort the list after each insertion":
import bisect
for b in B:
bisect.insort(A, b)
This solution does not create a new list.
Please note that bisect.insort(A, b)
is equivalent to
A.insert(bisect.bisect_right(A, b), b)
Even though the search is fast (O(log n)), the insertion is slow (O(n)).
Lots of good discussion in this post! Arguing about timing is hard, so I wrote some timing script. It's quite rudimentary but I think it will do for now. I've attached the results too.
import timeit
import math
import matplotlib.pyplot as plt
from collections import defaultdict
setup = """
import bisect
import heapq
from random import randint
A = sorted((randint(1, 10000) for _ in range({})))
B = sorted((randint(1, 10000) for _ in range({})))
def bisect_sol(A, B):
for b in B:
bisect.insort(A, b)
def merge_sol(A, B):
ia = ib = 0
while ib < len(B):
if ia < len(A) and A[ia] < B[ib]:
if ia < len(A):
ia += 1
else:
A.insert(ia + 1, B[ib])
ib += 1
def heap_sol(A, B):
return heapq.merge(A, B)
def sorted_sol(A, B):
return sorted(A + B)
"""
sols = ["bisect", "merge", "heap", "sorted"]
times = defaultdict(list)
iters = [100, 1000, 2000, 5000, 10000, 20000, 50000, 100000]
for n in iters:
for sol in sols:
t = min(timeit.repeat(stmt="{}_sol(A, B)".format(sol), setup=setup.format(n, n), number=1, repeat=5))
print("({}, {}) done".format(n, sol))
times[sol].append(math.log(t))
for sol in sols:
plt.plot(iters, times[sol])
plt.xlabel("iterations")
plt.ylabel("log time")
plt.legend(sols)
plt.show()
This is the result:
It's clear that modifying the list is the major bottleneck, so creating a new list is the way to go.
Here is a solution in O(n)
:
A = [10, 20, 30, 40]
B = [20, 60, 81, 90]
C = []
i = j = 0
while i < len(A) and j < len(B):
if A[i] < B[j]:
C.append(A[i])
i += 1
else:
C.append(B[j])
j += 1
C += A[i:] + B[j:]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With