The efficiency when using a big data structure in a function in Python

Tags:

python

pass-by-reference

function-call

I need to use a big data structure, more specifically a big dictionary, to do lookups.

At first my code looked like this:

#build the dictionary
blablabla
#look up some information in the dictionary
blablabla

As I need to look things up many times, I realized it would be a good idea to implement the lookup as a function, say lookup(info).

Then here comes the problem: how should I deal with the big dictionary?

Should I use lookup(info, dictionary) and pass it as an argument, or should I initialize the dictionary in main() and use it as a global variable?

The first option seems more elegant, because I think maintaining global variables is troublesome. On the other hand, I'm not sure how efficient it is to pass a big dictionary to a function. The function will be called many times, and it would be a nightmare if argument passing were inefficient.

Thanks.

Edit1:

I just ran an experiment comparing the two approaches.

Here's a snippet of the code. lookup1 takes the dictionary as an argument, while lookup2 uses the global data structure "big_dict".

import random
import time

class CityDict:
    def __init__(self):
        # build the dictionary once and keep it on the instance
        self.code_dict = get_code_dict()
    def get_city(self, city):
        try:
            return self.code_dict[city]
        except Exception:
            return None

def get_code_dict():
    # build the code dictionary from a file (loading code not shown)
    return code_dict

def lookup1(city, city_code_dict):
    try:
        return city_code_dict[city]
    except Exception:
        return None

def lookup2(city):
    try:
        return big_dict[city]
    except Exception:
        return None


t = time.time()
d = get_code_dict()
for i in range(0, 1000000):
    lookup1(random.randint(0, 10000), d)

print("lookup1 is %f" % (time.time() - t))


t = time.time()
big_dict = get_code_dict()
for i in range(0, 1000000):
    # note: key range is 0-1000 here, 0-10000 in the lookup1 loop
    lookup2(random.randint(0, 1000))
print("lookup2 is %f" % (time.time() - t))


t = time.time()
cd = CityDict()
for i in range(0, 1000000):
    # note: no random.randint() call in this loop, unlike the two above
    cd.get_city(str(i))
print("class is %f" % (time.time() - t))

This is the output:

lookup1 is 8.410885
lookup2 is 8.157661
class is 4.525721

So the two approaches take almost the same time, with the global variable version slightly faster.

Edit2:

I added the class version suggested by Amber and tested the efficiency again. The results suggest that Amber is right and that we should use the class version.


asked Jan 12 '11 04:01

ibread

1 Answer

To answer the core question: parameter passing is not inefficient, because your values do not get copied around. Python passes references around, although that is not to say that the way parameters are passed fits the well-known schemes of "pass-by-value" or "pass-by-reference".

It's best imagined as initializing a variable local to the called function with a reference provided by the caller; that reference is itself passed by value.
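
To make that concrete, here is a minimal sketch. The names lookup, mutate and rebind and the sample dictionary are made up for illustration and are not from the question. It shows that the function sees the very same object (same id()), that a mutation made inside the function is visible to the caller, and that rebinding the local name has no effect on the caller.

```python
def lookup(key, table):
    # 'table' is a new local name bound to the caller's dict; nothing is copied.
    return table.get(key), id(table)

def mutate(table):
    # Mutating through the local name changes the one shared object.
    table["added"] = 1

def rebind(table):
    # Rebinding only points the local name at a new dict;
    # the caller's dict is left untouched.
    table = {}
    return table

# Hypothetical stand-in for the big dictionary from the question.
big_dict = {i: str(i) for i in range(100000)}

value, seen_id = lookup(42, big_dict)
same_object = seen_id == id(big_dict)

mutate(big_dict)
rebind(big_dict)
size_after = len(big_dict)

print(same_object)  # True: the function received the same object
print(value)        # 42
print(size_after)   # 100001: the mutation stuck, the rebinding did not
```

So the cost of the call does not depend on how big the dictionary is.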

Still, the suggestion to use a class is probably a good idea.
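
If you want to compare the three variants yourself, note that the timing loops in the question do not do the same work per iteration: two of them call random.randint() with different ranges and the class loop does not call it at all. A rough sketch of a more even comparison follows. It assumes a small made-up dictionary in place of your real data, generates the keys once up front, and uses dict.get instead of your try/except, so treat it as an illustration rather than a reproduction of your benchmark.

```python
import random
import timeit

class CityDict:
    def __init__(self, code_dict):
        self.code_dict = code_dict

    def get_city(self, city):
        return self.code_dict.get(city)

def lookup_arg(city, city_code_dict):
    # dictionary passed in as an argument
    return city_code_dict.get(city)

def lookup_global(city):
    # dictionary read from module-level (global) scope
    return big_dict.get(city)

# Hypothetical stand-in for the real city data.
big_dict = {i: "code%d" % i for i in range(10000)}
cd = CityDict(big_dict)

# Generate the keys once so random number generation is not timed.
# Roughly half of them miss, since keys go up to 20000.
random.seed(0)
keys = [random.randint(0, 20000) for _ in range(20000)]

def run_arg():
    return [lookup_arg(k, big_dict) for k in keys]

def run_global():
    return [lookup_global(k) for k in keys]

def run_class():
    return [cd.get_city(k) for k in keys]

for name, fn in [("argument", run_arg), ("global", run_global), ("class", run_class)]:
    # Take the best of three runs, each timing five passes over the same keys.
    best = min(timeit.repeat(fn, number=5, repeat=3))
    print("%-8s %.4f s" % (name, best))
```

All three loops now see the same keys and the same dictionary, so whatever difference remains comes from the call style itself. I would pick between them on readability rather than on speed.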

answered Oct 12 '22 23:10

Jim Brissom
