I'm working on a python project that runs for a couple hours before it's finished all it's calculations. I would like to hold the top 10 results of the calculation as it progresses.
There's the obvious way of doing:
if calc > highest_calc:
second_calc = highest_calc
highest_calc = calc
if calc < highest_calc and calc > second_calc:
third_calc = second_calc
second_calc = calc
if calc < second_calc and calc > third_calc:
fourth_calc = third_calc
third_calc = calc
etc.
But is there a better, more dynamic and pythonic way?
Bonus
For my project, each calcultion has three coresponding names with it: name_a
name_b
name_c
. What I don't want is more then one of the top 10 values to have the same three names. But, if the last calc
has the same names, I want to keep the highest of the two. What's the best way to do this?
For example, lets say 2.3
is the value of calc
, using MCD
SBUX
and CAT
to calculate calc
. But what if I had already made a calc
using MCD
SBUX
and CAT
and it made it to the top to? How do I find the value of this calc
so I can see if it is less than or greater then the new calc
. If it is greater than, remove the old calc with the same and add the new calc
. If it is less than, pass
new calc. Hopefully that makes sense:
If name_a in top10 and name_b in top10 and name_c in top10:
if calc > old_calc_with_same_names:
add_calc = calc, name_a, name_b, name_c
top10.insert(bisect.bisect(calc, top10[0]), add_calc)
else:
add to top10
Finished Code
csc = []
top_ports = []
add_sharpe = [sharpe, name_a, weight_a, exchange_a, name_b, weight_b, exchange_b, name_c, weight_c, exchange_c]
if init__calc == 0:
csc.append(add_sharpe)
if init__calc > 1:
if name_a == prev_name_a and name_b == prev_name_b and name_c == prev_name_c:
csc.append(add_sharpe)
if name_a != prev_name_a or name_b != prev_name_b or name_c != prev_name_c:
if csc:
hs = max(csc, key=lambda x: x[0])
if top_ports:
ls = min(top_ports, key=lambda x: x[0])
if hs[0] > ls[0]:
hsi = csc.index(hs)
top_ports.append(csc[hsi])
else:
hsi = csc.index(hs)
top_ports.append(csc[hsi])
csc = []
csc.append(add_sharpe)
Later on in the script...
top_ports = sorted(top_ports, key=itemgetter(0), reverse=True)
print "The highest sharpe is: {0}".format(top_ports[0])
print " ==============================================="
print " ==============================================="
print datetime.now() - startTime
print "Second: {0}".format(top_ports[1])
print "Third: {0}".format(top_ports[2])
print "Fourth: {0}".format(top_ports[3])
print "Fifth: {0}".format(top_ports[4])
etc.
The benefit of this practice is eliminating the risk of accidentally interfering with either of these other use cases. Use the Python list * operator to create simple lists and nested lists as well: Sometimes we need to search through a collection. Let’s look at two options: lists and sets.
Python offers a bunch of tools and techniques you can use to approach this problem. However, Python’s Counter from collections provides a clean, efficient, and Pythonic solution. This dictionary subclass provides efficient counting capabilities out of the box.
Method #1 : Using sorted () + lambda. The combination of above functionality can be used to perform this particular task. In this, we just employ sorted function with reverse flag true, and print the top N elements using list slicing. test_list = [ ('Manjeet', 10), ('Akshat', 4), ('Akash', 2), ('Nikhil', 8)] N = 2.
(Once the project is in production, always make sure that you remove the commented code.) Run tests/add tests. Make sure that you have at least 80% test coverage. Use pytest or similar Python packages to track test coverage. Use Pylint /Vulture. Always consider running some type of linter over the code to see how “healthy” it is. Try to look for:
Use the heapq
module. Instead of needlessly storing all results, at every step it adds the new result and then efficiently removes the lowest—which may be the one just added—effectively keeping the top 10. Storing all results is not necessarily bad though; it can be valuable to collect statistics, and make it easier to determine what to keep afterwards.
from heapq import heappush, heappushpop
heap = []
for x in [18, 85, 36, 57, 2, 45, 55, 1, 28, 73, 95, 38, 89, 15, 7, 61]:
calculation_result = x + 1 # Dummy calculation
if len(heap) < 10:
heappush(heap, calculation_result)
else:
heappushpop(heap, calculation_result)
top10 = sorted(heap, reverse=True) # [96, 90, 86, 74, 62, 58, 56, 46, 39, 37]
Note that this module has more useful functions to only request the highest/lowest value, et cetera. This may help you to add the behavior concerning names.
Actually this construct is so common that it is available as heapq.nlargest
. However, to not store all your results after all, you'd have to model the calculator as a generator, which is a bit more advanced.
from heapq import nlargest
def calculate_gen():
for x in [18, 85, 36, 57, 2, 45, 55, 1, 28, 73, 95, 38, 89, 15, 7, 61]:
yield x + 1 # Dummy calculation
top10 = nlargest(10, calculate_gen()) # [96, 90, 86, 74, 62, 58, 56, 46, 39, 37]
Bonus
Here is some idea to make the results unique for each combination of associated names.
Using a heap is not going to cut it anymore, because a heap is not good at locating any item that is not the absolute minimum/maximum, and what we are interested in here is some kind of local minimum given the criteria of a name combination.
Instead, you can use a dict
to keep the highest value for each name combination. First you need to encode the name combination as an immutable value for it to work as a key, and because the order of the names shouldn't matter, decide on some order and stick with it. I'm going with alphabetical strings to keep it simple.
In the code below, each result is put in the dict
at a place that is unique for its name combination—therefore normalization might be needed—as long as there isn't a better result already. Later the top n is compiled from the highest results for each combination.
from heapq import nlargest
calculations = [('ABC', 18), ('CDE', 85), ('BAC', 36), ('CDE', 57),
('ECD', 2), ('BAD', 45), ('EFG', 55), ('DCE', 1)]
highest_per_name_combi = dict()
for name_combi, value in calculations:
normal_name_combi = ''.join(sorted(name_combi)) # Slow solution
current = highest_per_name_combi.get(normal_name_combi, float('-inf'))
highest_per_name_combi[normal_name_combi] = max(value, current)
top3 = nlargest(3, highest_per_name_combi.iteritems(), key=lambda x: x[1])
The only problem with this approach might be the amount of memory used. Since with 150 names there can be 551300 (150 choose 3) combinations, you may have to decide to clean up the dict
every now and then, which is simple. In the loop, check for the size of the dict
and if it exceeds some (still large) number, compose the current top n and create a new, minimal dict
from it. Also, some micro optimizations could be applied by reducing the number of lookups/calls, e.g. not using get
and/or max
.
All of this would be a lot easier if you'd have control over the order in which calculations are performed. If you'd know that the next 1000 calculations are all for the same name combination, you could just find the best of those first before adding it to the overall results.
Also, with a truly massive amount of results, the simplest way may actually be the best. Just write them to a file in a convenient format, sort them there (first by name combination, then reversely by value), take only the first occurrence for each name combination (easy when they are grouped) and sort the result again, just by value.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With