I looked this up online and learned that list.pop() has O(1) time complexity, but list.pop(i) has O(n) time complexity. While solving LeetCode problems, I see many people use pop(i) in a for loop; they say it is O(n) time complexity, and in fact their code runs faster than mine, which uses only one loop but many lines inside that loop. I wonder why this happens, and whether I should use pop(i) instead of many lines to avoid it?
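For example, the difference between pop() and pop(0) (the worst case of pop(i)) is easy to see with a quick timing test (a minimal sketch; exact numbers depend on the machine):

from timeit import timeit

# pop() removes from the end of the list: O(1).
# pop(0) removes from the front and shifts every remaining element left: O(n).
t_end = timeit('lst.pop()', setup='lst = list(range(10**6))', number=1000)
t_front = timeit('lst.pop(0)', setup='lst = list(range(10**6))', number=1000)
print('pop():  %.4f s for 1000 calls' % t_end)
print('pop(0): %.4f s for 1000 calls' % t_front)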
Example: Leetcode 26. Remove Duplicates from Sorted Array
My code (faster than 75%):

class Solution(object):
    def removeDuplicates(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        left, right = 0, 0
        count = 1
        while right < len(nums)-1:
            if nums[right] == nums[right+1]:
                right += 1
            else:
                nums[left+1] = nums[right+1]
                left += 1
                right += 1
                count += 1
        return count
And other people's code, faster than 90% (the author does not claim O(n), but why is an O(n²) solution faster than my O(n) one?):
https://leetcode.com/problems/remove-duplicates-from-sorted-array/discuss/477370/python-3%3A-straight-forward-6-lines-solution-90-faster-100-less-memory
My optimized code (faster than 89%):

class Solution(object):
    def removeDuplicates(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        left, right = 0, 0
        while right < len(nums)-1:
            if nums[right] != nums[right+1]:
                nums[left+1] = nums[right+1]
                left += 1
            right += 1
        return left + 1
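For example, running the optimized version on the problem's sample input (a quick demonstration I added; the expected answer is 5 unique values):

nums = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
k = Solution().removeDuplicates(nums)
print(k, nums[:k])  # 5 [0, 1, 2, 3, 4]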
Your algorithm genuinely does take O(n) time, and the "pop in reverse order" algorithm genuinely does take O(n²) time. However, LeetCode isn't reporting that your time complexity is better than 89% of submissions; it is reporting that your actual running time is better than 89% of submissions'. The actual running time depends on what inputs the algorithms are tested with: not just the sizes, but also the number of duplicates.
It also depends on how the running times across multiple test cases are averaged; if most of the test cases use small inputs where the quadratic solution is faster, then the quadratic solution may come out ahead overall, even though its time complexity is higher. @Heap Overflow also points out in the comments that the overhead of LeetCode's judging system is proportionally large and quite variable compared to the time the algorithms themselves take, so the discrepancy could simply be due to random variation in that overhead.
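To make the averaging point concrete, here is a toy calculation with entirely made-up times (not LeetCode's actual numbers): nine small test cases that favor the quadratic solution's lower constant factor outweigh one large test case that favors the linear solution.

# Hypothetical per-test running times in ms, purely for illustration.
linear_ms = [1.0] * 9 + [4.0]      # steady; wins only on the one large case
quadratic_ms = [0.2] * 9 + [9.0]   # fast on small cases, slow on the large one

print(sum(linear_ms) / len(linear_ms))        # 1.3
print(sum(quadratic_ms) / len(quadratic_ms))  # 1.08 -> lower average overall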
To shed some light on this, I measured running times using timeit. The graph below shows my results; the shapes are exactly what you'd expect given the time complexities, and the crossover point is somewhere between n = 8000 and n = 9000 on my machine. This is based on sorted lists where each distinct element appears on average twice. The code I used to generate the times is given below.

[Graph: average running time (ms) vs. n for the linear and quadratic solutions; the quadratic curve starts lower but crosses above the linear one between n = 8000 and n = 9000.]
Timing code:
def linear_solution(nums):
    # Two pointers: 'right' scans the list, 'left' marks the end of the
    # deduplicated prefix. Each element is visited once, so this is O(n).
    left, right = 0, 0
    while right < len(nums)-1:
        if nums[right] != nums[right+1]:
            nums[left+1] = nums[right+1]
            left += 1
        right += 1
    return left + 1

def quadratic_solution(nums):
    # Walk backwards and pop duplicates. Each pop(i) shifts the elements
    # after index i, so the total work is O(n^2) in the worst case.
    prev_obj = []
    for i in range(len(nums)-1, -1, -1):
        if prev_obj == nums[i]:
            nums.pop(i)
        prev_obj = nums[i]
    return len(nums)
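Before timing, a quick sanity check (my addition, not part of the measurements) that the two functions agree:

# Both functions mutate their input, so give each one its own copy.
sample1 = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
sample2 = list(sample1)
k1 = linear_solution(sample1)
k2 = quadratic_solution(sample2)
assert k1 == k2 == 5
assert sample1[:k1] == sample2[:k2] == [0, 1, 2, 3, 4]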
from random import randint
from timeit import timeit

def gen_list(n):
    max_n = n // 2
    return sorted(randint(0, max_n) for i in range(n))

# I used a step size of 1000 up to 15000, then a step size of 5000 up to 50000
step = 1000
max_n = 15000
reps = 100

print('n', 'linear time (ms)', 'quadratic time (ms)', sep='\t')
for n in range(step, max_n+1, step):
    # generate input lists
    lsts1 = [gen_list(n) for i in range(reps)]
    # copy the lists by value, since the algorithms will mutate them
    lsts2 = [list(g) for g in lsts1]
    # use iterators to supply the input lists one-by-one to timeit
    iter1 = iter(lsts1)
    iter2 = iter(lsts2)
    t1 = timeit(lambda: linear_solution(next(iter1)), number=reps)
    t2 = timeit(lambda: quadratic_solution(next(iter2)), number=reps)
    # timeit reports the total time in seconds across all reps
    print(n, 1000*t1/reps, 1000*t2/reps, sep='\t')
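If you want to reproduce the graph rather than just the printed table, a plotting sketch along these lines works (this assumes matplotlib is installed and the definitions above are in scope; I use fewer reps here to keep it quick):

import matplotlib.pyplot as plt

ns, t_lin, t_quad = [], [], []
for n in range(1000, 15001, 1000):
    lsts1 = [gen_list(n) for i in range(10)]
    lsts2 = [list(g) for g in lsts1]
    it1, it2 = iter(lsts1), iter(lsts2)
    ns.append(n)
    t_lin.append(1000 * timeit(lambda: linear_solution(next(it1)), number=10) / 10)
    t_quad.append(1000 * timeit(lambda: quadratic_solution(next(it2)), number=10) / 10)

plt.plot(ns, t_lin, label='linear solution')
plt.plot(ns, t_quad, label='quadratic solution')
plt.xlabel('n (list length)')
plt.ylabel('average time per call (ms)')
plt.legend()
plt.show()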
The conclusion is that your algorithm is indeed faster than the quadratic solution for large enough inputs, but the inputs LeetCode is using to measure running times are not "large enough" to overcome the variation in the judging overhead, and the fact that the average includes times measured on smaller inputs where the quadratic algorithm is faster.