I need to use a big data structure, more specifically, a big dictionary to do the looking up job.
At the very first my code is like this:
#build the dictionary
blablabla
#look up some information in the ditionary
blablabla
As I need to look up many times, I begin to realize that it is a good idea to implement it as a function, say lookup(info).
Then here comes the problem, how should I deal with the big dictionary?
Should I use lookup(info, dictionary) to pass it as an argument, or should I just initialize the dictionary in main() and just use it as an global variable?
The first one seems more elegant because I think maintaining global variable is troublesome. But on the other hand, I'm not sure of the efficiency of passing a big dictionary to a function. It will be called many times and it will certainly be a nightmare if the argument passing is inefficient.
Thanks.
Edit1:
I just made an experiment of the above two ways:
Here's the snippet of the codes. lookup1 implements the argument passing looking up while lookup2 use global data structure "big_dict".
class CityDict():
def __init__():
self.code_dict = get_code_dict()
def get_city(city):
try:
return self.code_dict[city]
except Exception:
return None
def get_code_dict():
# initiate code dictionary from file
return code_dict
def lookup1(city, city_code_dict):
try:
return city_code_dict[city]
except Exception:
return None
def lookup2(city):
try:
return big_dict[city]
except Exception:
return None
t = time.time()
d = get_code_dict()
for i in range(0, 1000000):
lookup1(random.randint(0, 10000), d)
print "lookup1 is %f" % (time.time() - t)
t = time.time()
big_dict = get_code_dict()
for i in range(0, 1000000):
lookup2(random.randint(0, 1000))
print "lookup2 is %f" % (time.time() - t)
t = time.time()
cd = CityDict()
for i in range(0, 1000000):
cd.get_city(str(i))
print "class is %f" % (time.time() - t)
This is the output:
lookup1 is 8.410885
lookup2 is 8.157661
class is 4.525721
So it seems that the two ways are almost the same, and yes, the global variable method is a little bit more efficient.
Edit2:
Added the class version suggested by Amber, and then test the efficiency again. Then we could see from tthe results that Amber is right, we should use the class version.
So the average case efficiency is 13 tries and we will have O ( (13+2)n + 2) or O (15n + 2) Below are some examples of functions in Python. Look at each and take note of the time efficiency.
Practice 1: Try Not To Blow Off Memory! A simple Python program may not cause many problems when it comes to memory, but memory utilization becomes critical on high memory consuming projects. It's always advisable to keep memory utilization in mind from the very beginning when working on a big project.
As discussed above, the data structures are the organizers and storers of data in an efficient manner so that they can be modified and accessed in the future. To suit different uses, there are different data structures in Python. These can be mainly classified into two types: 1.
As we can see from the graph above, a tuple is the only data structure that is not over-allocating memory. As tuple is immutable in nature, once created, it can not be changed, nor resized. There is simply no need for Python to over-allocate memory for it. In order to append records into a tuple, we can concatenate two tuples together with:
Answering the core question, parameter passing is not inefficient, it's not like your values will get copied around. Python passed references around, which is not to say that the way parameters are passed fits the well-known schemes of "pass-by-value" or "pass-by-reference".
It's best imagined as initializing the value of a variable local to the called function with a reference value provided by the caller, which are passed by value.
Still, the suggestion to use a class is probably a good idea.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With