
Should I use numpy (or pylab) as a python environment by using `from numpy import *`?

I use pylab (more specifically numpy) in all of my Python programs. The exceptions are very rare, if any. So far, I have gotten into the habit of importing numpy in the following way:

from numpy import *

This has the advantage of making it look like numpy was part of Python from the beginning. Is there anything wrong with importing numpy like this in every script? I mean, apart from the fact that every script/program will use a little more memory and take slightly longer to load.

I think always having to write numpy or even np before every function call that comes from numpy (e.g., np.zeros(3)) is tedious, because it requires me to know which functions come from numpy and which don't. I don't really care whether the zeros function comes from numpy or from Python; I just want/need to use it.
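
To make the comparison concrete, here are the two styles side by side (zeros is just an arbitrary example):

# Style 1: star import -- zeros() reads like a builtin
from numpy import *
a = zeros(3)

# Style 2: namespaced import -- every numpy call carries the prefix
import numpy as np
b = np.zeros(3)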

Which notation do you think is better?

asked Apr 21 '11 by levesque


2 Answers

Just to elaborate on what other people have said, numpy is an especially bad module to use import * with.

pylab is meant for interactive use, and it's fine there. No one wants to type pylab.zeros over and over in a shell when they could just type zeros. However, as soon as you start writing code, everything changes. You're typing it once and it's staying around potentially forever, and other people (e.g. yourself a year down the road) are probably going to be trying to figure out what the heck you were doing.

In addition to what @unutbu already said about overriding Python's builtins sum, float, int, etc., and to what everyone has said about not knowing where a function came from, numpy and pylab are very large namespaces.

numpy has 566 functions, variables, classes, etc. within its namespace. That's a lot! pylab has 930! (And with pylab, these come from quite a few different modules.)
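
If you want to check the counts on your own install, something along these lines will do it; the exact numbers depend on your numpy and matplotlib versions:

import numpy
import pylab   # ships with matplotlib

# "from numpy import *" pulls in everything listed in numpy.__all__;
# pylab may not define __all__, so count its public names as an approximation.
print(len(numpy.__all__))
print(len([name for name in dir(pylab) if not name.startswith('_')]))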

Sure, it's easy enough to guess where zeros or ones or array is from, but what about source or DataSource or lib.utils? (All of these will be in your local namespace if you do from numpy import *.)

If you have an even slightly larger project, there's a good chance you'll have a local variable, or a variable in another file, named similarly to something in a big module like numpy. Suddenly, you start to care a lot more about exactly what it is that you're calling!
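
A contrived sketch of that kind of collision (the string helper here is made up purely for illustration):

from numpy import *

# Hundreds of lines later, someone adds an innocent-looking string helper...
def clip(text, width):
    return text[:width]

# ...and silently shadows numpy's clip() for the rest of the module.
clip(array([0.2, 1.7, -0.3]), 0, 1)
# TypeError: clip() takes 2 positional arguments but 3 were given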

As another example, how would you distinguish between pylab's fft function and numpy's fft module?

Depending on whether you do

from numpy import *
from pylab import *

or:

from pylab import *
from numpy import *

fft is a completely different thing with completely different behavior! (That is, trying to call fft after the second ordering raises an error, because fft is then the numpy.fft module rather than a function.)
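
A sketch of how that shadowing plays out (exactly what pylab re-exports has varied across matplotlib versions, so treat this as illustrative):

# Session 1: numpy first, then pylab -- fft ends up being pylab's fft *function*
from numpy import *    # fft is the numpy.fft module at this point
from pylab import *    # fft is rebound to an fft function
fft([0, 1, 0, 1])      # works: calls the function

# Session 2: pylab first, then numpy -- fft ends up being the numpy.fft *module*
from pylab import *    # fft is an fft function
from numpy import *    # fft is rebound to the numpy.fft module
fft([0, 1, 0, 1])
# TypeError: 'module' object is not callable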

All in all, you should always avoid from module import *, but it's an especially bad idea in the case of numpy, scipy, et al. because they're such large namespaces.

Of course, all that having been said, if you're just futzing around in a shell trying to quickly get a plot of some data before moving on to actually doing something with it, then sure, use pylab. That's what it's there for. Just don't write anything that way that anyone might try to read later on down the road!

</rant>

answered by Joe Kington


  1. Using from numpy import * changes the behavior of Python's builtin any, all, and sum. For example, with plain Python (no numpy imported):

    any([[False]])
    # True
    all([[True, False], [False, False]])
    # True
    sum([[1,2],[3,4]], 1) 
    # TypeError: unsupported operand type(s) for +: 'int' and 'list'
    

    Whereas, if you use from numpy import *, the results are completely different:

    from numpy import *
    any([[False]])
    # False
    all([[True, False], [False, False]])
    # False
    sum([[1,2],[3,4]], 1)
    # array([3, 7])
    

    The full set of name collisions can be found this way (thanks to @Joe Kington and @jolvi for pointing this out):

    import builtins
    import numpy as np
    # Names that "from numpy import *" would shadow among Python's builtins.
    collisions = set(np.__all__) & set(dir(builtins))
    print(sorted(name for name in collisions if not name.startswith('__')))
    # typically includes 'all', 'any', and 'sum'; the exact list depends on your numpy version
    
  2. This can lead to very confusing bugs since someone testing or using your code in a Python interpreter without from numpy import * may see completely different behavior than you do.

  3. Using multiple imports of the form from module import * can compound the problem with even more collisions of this sort. If you nip this bad habit in the bud, you'll never have to worry about this (potentially confounding) bug.

    The order of the imports could also matter if both modules redefine the same name.

    And it makes it harder to figure out where functions and values come from.

  4. While it is possible to use from numpy import * and still access Python's builtins, it is awkward (see the fuller sketch after this list):

    from numpy import *
    any([[False]])                 # False -- numpy's any
    __builtins__.any([[False]])    # True  -- Python's builtin any
    

    and less readable than:

    import numpy as np
    np.any([[False]])    # False -- explicitly numpy's version
    any([[False]])       # True  -- the builtin, left unshadowed
    
  5. As the Zen of Python says,

    Namespaces are one honking great idea -- let's do more of those!
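
If you do occasionally need both numpy's and Python's version of a name, here is a minimal sketch of the explicit style, using the standard builtins module (Python 3):

import builtins       # explicit handle on Python's own names
import numpy as np

data = [[False]]
print(np.any(data))        # False -- numpy treats this as a 2-D array holding one False
print(builtins.any(data))  # True  -- the inner list is truthy, so the builtin returns True
print(any(data))           # True  -- nothing was shadowed, so any is still the builtin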

My advice would be to never use from module import * in any script, period.

answered by unutbu