 

Sort the top ten results

I have a list in which I am saving results like this:

City     Percentage
Mumbai   98.30
London   23.23
Agra     12.22
.....

The list structure is [["Mumbai", 98.30], ["London", 23.23]..]

I am saving these records in a list. I need to sort the list and keep only the top ten records. Getting just the top ten cities would also be fine.

I am trying to use the following logic, but it does not give accurate results:

if (condition):
    if b not in top_ten:
        top_ten.append(b)   
        top_ten.remove(tmp)

Any other solution or approach is also welcome.
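The replace-an-entry idea in the attempt above can work, provided the entry that gets evicted is always the current minimum and the new record actually beats it. A minimal sketch, assuming each record is a [city, percentage] pair; the function name top_n_by_replacement is hypothetical. It rescans the kept entries for every record, so it is only reasonable for small n.

```python
def top_n_by_replacement(records, n=10):
    """Keep the n records with the highest percentage.

    Each record is assumed to be a [city, percentage] pair.
    """
    top = []
    for record in records:
        if len(top) < n:
            # Still filling up: accept every record.
            top.append(record)
            continue
        # Find the weakest record currently kept.
        weakest = min(top, key=lambda r: r[1])
        if record[1] > weakest[1]:
            # The new record beats the weakest one, so swap them.
            top.remove(weakest)
            top.append(record)
    # Present the survivors from highest to lowest percentage.
    return sorted(top, key=lambda r: r[1], reverse=True)


cities = [["Mumbai", 98.30], ["London", 23.23], ["Agra", 12.22], ["Pune", 45.0]]
print(top_n_by_replacement(cities, n=2))
# [['Mumbai', 98.3], ['Pune', 45.0]]
```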

EDIT 1

for a in sc_percentage:
    print(a)

Output I am getting:

(<ServiceCenter: DELHI-DLC>, 100.0)
(<ServiceCenter: DELHI-DLE>, 75.0)
(<ServiceCenter: DELHI-DLN>, 90.909090909090907)
(<ServiceCenter: DELHI-DLS>, 83.333333333333343)
(<ServiceCenter: DELHI-DLW>, 92.307692307692307)
asked Dec 07 '22 07:12 by onkar



1 Answer

If the list is fairly short then, as others have suggested, you can sort it and slice it. If the list is very large, you may be better off using heapq.nlargest():

>>> import heapq
>>> lis = [['Mumbai', 98.3], ['London', 23.23], ['Agra', 12.22]]
>>> heapq.nlargest(2, lis, key=lambda x:x[1])
[['Mumbai', 98.3], ['London', 23.23]]
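For comparison, here is a minimal sketch of the sort-and-slice approach applied to tuples shaped like the (ServiceCenter, percentage) pairs in the question's edit. The ServiceCenter namedtuple is a hypothetical stand-in for the real objects; only the percentage in position 1 is used for ordering, via operator.itemgetter(1).

```python
from collections import namedtuple
from operator import itemgetter

# Hypothetical stand-in for the question's ServiceCenter objects;
# only the second element of each tuple (the percentage) is used for sorting.
ServiceCenter = namedtuple("ServiceCenter", ["code"])

sc_percentage = [
    (ServiceCenter("DELHI-DLC"), 100.0),
    (ServiceCenter("DELHI-DLE"), 75.0),
    (ServiceCenter("DELHI-DLN"), 90.909090909090907),
    (ServiceCenter("DELHI-DLS"), 83.333333333333343),
    (ServiceCenter("DELHI-DLW"), 92.307692307692307),
]

# Sort by percentage, highest first, then keep at most the first ten entries.
top_ten = sorted(sc_percentage, key=itemgetter(1), reverse=True)[:10]

for center, percentage in top_ten:
    print(center.code, round(percentage, 2))
```

The same key works unchanged with heapq.nlargest(10, sc_percentage, key=itemgetter(1)).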

The difference is that nlargest() makes only a single pass through the list. In fact, if you are reading from a file or another generated source, the data need not all be in memory at the same time.
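A minimal sketch of that streaming use, assuming the data arrives as "city,percentage" lines. The CSV text is hypothetical, and io.StringIO stands in for a real open file. The generator hands nlargest() one row at a time, so only the current n candidates are held in memory.

```python
import csv
import heapq
import io

# Hypothetical CSV content; in practice this could be an open file object.
data = io.StringIO(
    "Mumbai,98.30\n"
    "London,23.23\n"
    "Agra,12.22\n"
    "Pune,45.00\n"
)

# A generator yields one (city, percentage) pair at a time, so the full
# dataset is never built as a list.
rows = ((city, float(pct)) for city, pct in csv.reader(data))

top_two = heapq.nlargest(2, rows, key=lambda row: row[1])
print(top_two)
# [('Mumbai', 98.3), ('Pune', 45.0)]
```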

You might also want to look at the source of nlargest(), as it works in much the same way that you were trying to solve the problem: it keeps only the desired number of elements in a data structure called a heap, and as each new value arrives it is pushed onto the heap and the smallest value is popped off.
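A simplified sketch of that heap-based idea, written with the standard heapq.heappush() and heapq.heapreplace() functions. It is an illustration, not the actual CPython source, and the function name top_n is hypothetical. Each record's position is stored next to its key so that two records with equal percentages never have to be compared directly, which matters for objects like the ServiceCenter entries in the question.

```python
import heapq


def top_n(records, n=10, key=lambda r: r[1]):
    """Return the n records with the largest key, largest first.

    Simplified illustration of the idea behind heapq.nlargest().
    """
    if n <= 0:
        return []
    heap = []  # min-heap of (key, index, record); heap[0] is the smallest kept
    for index, record in enumerate(records):
        entry = (key(record), index, record)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # Pop the current smallest and push the new entry in one step.
            heapq.heapreplace(heap, entry)
    # Order the survivors from largest to smallest key.
    return [record for _, _, record in sorted(heap, reverse=True)]


cities = [["Mumbai", 98.30], ["London", 23.23], ["Agra", 12.22], ["Pune", 45.0]]
print(top_n(cities, n=2))
# [['Mumbai', 98.3], ['Pune', 45.0]]
```

The heap never grows beyond n entries, so each new record costs O(log n) instead of the O(n) rescan a plain list needs.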

Edit to show comparative timing:

>>> import random
>>> import timeit
>>> records = []
>>> for i in range(100000):
...     value = random.random() * 100
...     records.append(('city {:2.4f}'.format(value), value))
...
>>> import heapq
>>> heapq.nlargest(10, records, key=lambda x:x[1])
[('city 99.9995', 99.99948904248298), ('city 99.9974', 99.99738898315216), ('city 99.9964', 99.99642759230214), ('city 99.9935', 99.99345173704319), ('city 99.9916', 99.99162694442714), ('city 99.9908', 99.99075084123544), ('city 99.9887', 99.98865134685201), ('city 99.9879', 99.98792632193258), ('city 99.9872', 99.98724339718686), ('city 99.9854', 99.98540548350132)]
>>> timeit.timeit('sorted(records, key=lambda x:x[1])[-10:]', setup='from __main__ import records', number=10)
1.388942152229788
>>> timeit.timeit('heapq.nlargest(10, records, key=lambda x:x[1])', setup='import heapq;from __main__ import records', number=10)
0.5476185073315492

On my system getting the top 10 from 100 records is fastest by sorting and slicing, but with 1,000 or more records it is faster to use nlargest.
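To find the crossover point on your own machine, here is a minimal benchmark sketch. The helper names are hypothetical, and the printed timings will vary with hardware and Python version, so no numbers are shown. The sorting variant uses reverse=True so that both functions return the ten largest records; the heapq documentation describes nlargest() as equivalent to sorted(iterable, key=key, reverse=True)[:n], so the two results should match exactly.

```python
import heapq
import random
import timeit


def make_records(count, seed=0):
    """Build (label, value) pairs like the interactive timing example."""
    rng = random.Random(seed)
    records = []
    for _ in range(count):
        value = rng.random() * 100
        records.append(("city {:2.4f}".format(value), value))
    return records


def by_sorting(records, n=10):
    # Sort everything by value, largest first, then keep the first n.
    return sorted(records, key=lambda x: x[1], reverse=True)[:n]


def by_heap(records, n=10):
    # Single pass keeping only n candidates in a heap.
    return heapq.nlargest(n, records, key=lambda x: x[1])


for size in (100, 1000, 10000, 100000):
    records = make_records(size)
    # Both approaches should agree on the answer before comparing speed.
    assert by_sorting(records) == by_heap(records)
    sort_time = timeit.timeit(lambda: by_sorting(records), number=10)
    heap_time = timeit.timeit(lambda: by_heap(records), number=10)
    print("{:>7} records: sort+slice {:.4f}s, nlargest {:.4f}s".format(
        size, sort_time, heap_time))
```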

answered Dec 20 '22 23:12 by Duncan
