Q. Write an algorithm that returns the second largest number in an array
a = [1, 2, 3, 4, 5]
print(max([x for x in a if x != max(a)]))
>> 4
I'm trying to figure out how this algorithm works and whether or not pythons internal magic will make this as efficient as writing a linear algorithm which just loops over the list a once and stores the highest and second highest values.
Correct me if I'm wrong:
The call to max(a) would be O(n)
[x for x in a] would also be O(n)
Would python be smart enough to cache the value of max(a) or would this mean that the list comprehension part the algorithm is O(n^2)?
And then the final max([listcomp]) would be another O(n), but this would only run once after the comprehension is finished, so the final algorithm would be O(n^2)?
Is there any fancy business going on internally that would cache the max(a) value and result in this algorithm working out quicker than O(n^2)?
The easy way to find out is timing it. Consider this timing code:
for i in range(1, 5):
a = list(range(10**i))
%timeit max([x for x in a if x != max(a)])
17.6 µs ± 178 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 698 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 61.6 ms ± 340 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 6.31 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Each time it multiplied the number of elements by 10 the runtime increased by 100. That's almost certainly O(n**2). For an O(n) algorithm the runtime would increase linearly with the number of elements:
for i in range(1, 6):
a = list(range(10**i))
max_ = max(a)
%timeit max([x for x in a if x != max_])
4.82 µs ± 27.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 29 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 262 µs ± 3.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 2.42 ms ± 13 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 24.9 ms ± 231 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
But I'm not certain the algorithm really does what is asked. Consider the list a=[1,3,3] even the heapq module tells me that the second largest element is 3 (not 1 - what your algorithm returns):
import heapq
>>> heapq.nlargest(2, [1,3,3])[0]
3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With