How can retrieve the two highest item from a list containing 100,000 integers without having to sort the entire list first?
Use heapq.nlargest
. This is the most flexible approach in case you ever want to handle more than just the top two elements.
Here's an example.
>>> import heapq >>> import random >>> x = range(100000) >>> random.shuffle(x) >>> heapq.nlargest(2, x) [99999, 99998]
JacobM's answer is absolutely the way to go. However, there are a few things to keep in mind while implementing what he described. Here's a little play-along-at-home tutorial to guide you through the trickier parts of solving this problem.
If this code is meant for production use, please use one of the more efficient/concise answers listed. This answer is targetted at someone new to programming.
The idea is simple.
largest
and second_largest
.largest
, assign it to largest
.second_largest
, but less than largest
, assign it to second_largest
.Let's start.
def two_largest(inlist): """Return the two largest items in the sequence. The sequence must contain at least two items.""" for item in inlist: if item > largest: largest = item elif largest > item > second_largest: second_largest = item # Return the results as a tuple return largest, second_largest # If we run this script, it will should find the two largest items and # print those if __name__ == "__main__": inlist = [3, 2, 1] print two_largest(inlist)
Okay, we now have JacobM's answer as a Python function. What happens when we try to run it?
Traceback (most recent call last): File "twol.py", line 10, in <module> print two_largest(inlist) File "twol.py", line 3, in two_largest if item > largest: UnboundLocalError: local variable 'largest' referenced before assignment
Apparently, we need to set largest
before we start the loop. This probably means we should set second_largest
too.
Let's set largest
and second_largest
to 0.
def two_largest(inlist): """Return the two largest items in the sequence. The sequence must contain at least two items.""" largest = 0 # NEW! second_largest = 0 # NEW! for item in inlist: if item > largest: largest = item elif largest > item > second_largest: second_largest = item # Return the results as a tuple return largest, second_largest # If we run this script, it will should find the two largest items and # print those if __name__ == "__main__": inlist = [3, 2, 1] print two_largest(inlist)
Good. Let's run it.
(3, 2)
Great! Now let's test with inlist
being [1, 2, 3]
inlist = [1, 2, 3] # CHANGED!
Let's try it.
(3, 0)
...Uh oh.
The largest value (3) seems correct. The second-largest value is completely wrong though. What's going on?
Let's work through what the function is doing.
largest
is 0 and second_largest
is also 0.largest
becomes 1.largest
becomes 2.But what about second_largest
?
When we assign a new value to largest
, the largest value actually becomes second-largest. We need to show that in the code.
def two_largest(inlist): """Return the two largest items in the sequence. The sequence must contain at least two items.""" largest = 0 second_largest = 0 for item in inlist: if item > largest: second_largest = largest # NEW! largest = item elif largest > item > second_largest: second_largest = item # Return the results as a tuple return largest, second_largest # If we run this script, it will should find the two largest items and # print those if __name__ == "__main__": inlist = [1, 2, 3] print two_largest(inlist)
Let's run it.
(3, 2)
Fantastic.
Now let's try it with a list of negative numbers.
inlist = [-1, -2, -3] # CHANGED!
Let's run it.
(0, 0)
That's not right at all. Where did these zeroes come from?
It turns out that the starting values for largest
and second_largest
were actually larger than all the items in the list. The first thing you might consider is setting largest
and second_largest
to the lowest values possible in Python. Unfortunately, Python doesn't have a smallest possible value. That means that, even if you set both of them to -1,000,000,000,000,000,000, you can have a list of values smaller than that.
So what's the best thing to do? Let's try setting largest
and second_largest
to the first and second items in the list. Then, to avoid double-counting any items in the list, we only look at the part of the list after the second item.
def two_largest(inlist): """Return the two largest items in the sequence. The sequence must contain at least two items.""" largest = inlist[0] # CHANGED! second_largest = inlist[1] # CHANGED! # Only look at the part of inlist starting with item 2 for item in inlist[2:]: # CHANGED! if item > largest: second_largest = largest largest = item elif largest > item > second_largest: second_largest = item # Return the results as a tuple return largest, second_largest # If we run this script, it will should find the two largest items and # print those if __name__ == "__main__": inlist = [-1, -2, -3] print two_largest(inlist)
Let's run it.
(-1, -2)
Great! Let's try with another list of negative numbers.
inlist = [-3, -2, -1] # CHANGED!
Let's run it.
(-1, -3)
Wait, what?
Let's step through our logic again.
largest
is set to -3second_largest
is set to -2Wait right there. Already, this seems wrong. -2 is larger than -3. Is this what caused the problem? Let's continue.
largest
is set to -1; second_largest
is set to the old value of largest
, which is -3Yes, that looks to be the problem. We need to ensure that largest
and second_largest
are set correctly.
def two_largest(inlist): """Return the two largest items in the sequence. The sequence must contain at least two items.""" if inlist[0] > inlist[1]: # NEW largest = inlist[0] second_largest = inlist[1] else: # NEW largest = inlist[1] # NEW second_largest = inlist[0] # NEW # Only look at the part of inlist starting with item 2 for item in inlist[2:]: if item > largest: second_largest = largest largest = item elif largest > item > second_largest: second_largest = item # Return the results as a tuple return largest, second_largest # If we run this script, it will should find the two largest items and # print those if __name__ == "__main__": inlist = [-3, -2, -1] print two_largest(inlist)
Let's run it.
(-1, -2)
Excellent.
So here's the code, nicely commented and formatted. It's also had all the bugs I could find beaten from it. Enjoy.
However, assuming this really is a homework question, I hope you get some useful experience from seeing an imperfect piece of code slowly improved. I hope some of these techniques will be useful in future programming assignments.
Not very efficient. But for most purposes, it should be okay: on my computer (Core 2 Duo), a list of 100 000 items can be processed in 0.27 seconds (using timeit
, averaged over 100 runs).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With