
Most Performant Way To Do Imports

Tags:

python

From a performance point of view (time or memory) is it better to do:

import pandas as pd

or

from pandas import DataFrame, TimeSeries

Does the best choice depend on how many classes I'm importing from the package?

Similarly, I've seen people do things like:

def foo(bar):
    from numpy import array

Why would I ever want to do an import inside a function or method definition? Wouldn't this mean that import is being performed every time that the function is called? Or is this just to avoid namespace collisions?

Batman asked Mar 10 '23 18:03


2 Answers

This is micro-optimising, and you should not worry about this.

Modules are loaded once per Python process. Any code that imports the module after that only needs to bind a name to the module or to objects defined in the module. That binding is extremely cheap.
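A minimal sketch of that caching, using the standard-library json module as a stand-in for pandas so it runs anywhere (the variable names are only for illustration). Note that a "from X import Y" statement still loads the whole module X; it just binds a different name afterwards.

```python
import sys

# json stands in for a heavier package such as pandas.
import json as first_binding
import json as second_binding
from json import dumps

# Both aliases are bound to the one module object cached in sys.modules.
same_object = first_binding is second_binding
cached = sys.modules["json"] is first_binding

# "from ... import" binds a name to an attribute of that same cached module.
same_function = dumps is first_binding.dumps

print(same_object, cached, same_function)
```

All three checks come out true, which is why the two import styles in the question should cost about the same in time and memory.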

Moreover, the top-level code in your module only runs once too, so the binding takes place just once. An import in a function does the binding each time the function is run, but again, this is so cheap as to be negligible.

Importing in a function makes a difference for two reasons: it won't put that name in the global namespace for the module (so no namespace pollution), and because the name is now local, using that name is slightly faster than using a global.
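To illustrate both points, here is a small sketch that uses the standard-library fractions module in place of numpy. The body of foo is a hypothetical example, not the asker's actual code.

```python
import sys


def foo(bar):
    # The import binds Fraction as a local name on every call,
    # but the module itself is only loaded the first time.
    from fractions import Fraction
    return Fraction(bar)


first = foo(3)
module_after_first_call = sys.modules["fractions"]
second = foo(4)
module_after_second_call = sys.modules["fractions"]

# The same cached module object is reused by the second call.
reused_module = module_after_first_call is module_after_second_call

# Fraction is a local variable of foo, not a module-level global.
is_local = "Fraction" in foo.__code__.co_varnames
leaked_into_globals = "Fraction" in foo.__globals__

print(reused_module, is_local, leaked_into_globals)
```

The name lives only inside foo, so it cannot collide with anything else defined at module level.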

If you want to improve performance, focus on code that is being repeated many, many times. Importing is not it.
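If you would rather measure than take this on trust, a rough sketch with timeit looks like the following. The repeat count is arbitrary and json is again just a stand-in module; the number you get depends on your machine and Python version.

```python
import timeit

REPEATS = 100_000  # arbitrary count chosen for illustration

# After the first execution json is cached, so this mostly times
# the cache lookup plus the name binding.
total_seconds = timeit.timeit("import json", number=REPEATS)
seconds_per_import = total_seconds / REPEATS

print(f"{seconds_per_import * 1e9:.0f} ns per repeated import")
```

Compare that figure with the time spent in the loops that do the real work before deciding where to optimise.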

Martijn Pieters answered Mar 20 '23 03:03


Answering the more general question of when to import: imports are dependencies. They refer to code that may or may not exist, but that the program needs in order to function. It is therefore a very good idea to import that code as early as possible, to prevent dumb errors from cropping up in the middle of execution.

This is particularly true as PyPy becomes more popular, since a package might be installed but not be usable under PyPy. Far better to fail early than potentially hours into the execution of the code.
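A small sketch of the failure mode being described. The module name not_a_real_dependency is made up and assumed not to be installed, and the list of steps stands in for hours of real work.

```python
def late_import_job():
    # Pretend this is the expensive part of the program.
    steps_done = ["load", "transform"]
    # The missing dependency is only discovered here, after the work above.
    import not_a_real_dependency  # hypothetical package, assumed absent
    return steps_done


try:
    late_import_job()
    outcome = "ok"
except ImportError as exc:
    # ModuleNotFoundError is a subclass of ImportError.
    outcome = f"failed late: {exc.name}"

print(outcome)
```

With the same import at the top of the file, the program would have stopped before doing any work at all.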

As for "import pandas as pd" vs "from pandas import DataFrame, TimeSeries", this question has multiple concerns (as all questions do), with some far more important than others. There's the question of namespace, there's the question of readability, and there's the question of performance. Performance, as Martijn states, should contribute about 0.0001% of the decision. Readability should contribute about 90%. Namespace only 10%, as it can be mitigated so easily.

In my opinion, both import X as Y and from X import Y are bad practice, because explicit is better than implicit. You don't want to be on line 2000 trying to remember which package "calculate_mean" comes from because it isn't referenced anywhere else in the code. When I first started using numpy I was copy/pasting code from the internet, and couldn't figure out why I couldn't pip install np. This obviously isn't a problem if you already know that "np" is the conventional alias for "numpy", but it's a stupid and pointless confusion for the 3 letters it saves. It came from numpy. Use numpy.

J.J answered Mar 20 '23 01:03