
How can I import submodules of pandas without importing matplotlib?

I am using pandas 0.14.1 on a webserver to process reports from a SQL database.

I do not need any plotting facilities, but matplotlib is always imported.

How can I import only the modules that I need to do the following?

df = pd.io.sql.frame_query(query, con=conn)
df['colname'] = df['colname'].apply(somefunc)
df = df.set_index('colname')
print df.to_html()

I constantly have to add the following hack to all of my report-generating scripts before I import pandas:

import os
os.environ['MPLCONFIGDIR'] = '/tmp/'

What can I do to avoid this?

Here's my webserver error log when I omit this hack:

File "/var/www/scripts/myscript.py", line 46, in index
    from pandas.io import sql
File "/usr/lib/python2.7/dist-packages/pandas/__init__.py", line 41, in <module>
    from pandas.core.api import *
File "/usr/lib/python2.7/dist-packages/pandas/core/api.py", line 9, in <module>
    from pandas.core.groupby import Grouper
File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 15, in <module>
    from pandas.core.frame import DataFrame
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 38, in <module>
    from pandas.core.series import Series
File "/usr/lib/python2.7/dist-packages/pandas/core/series.py", line 2524, in <module>
    import pandas.tools.plotting as _gfx
File "/usr/lib/python2.7/dist-packages/pandas/tools/plotting.py", line 26, in <module>
    import pandas.tseries.converter as conv
File "/usr/lib/python2.7/dist-packages/pandas/tseries/converter.py", line 7, in <module>
    import matplotlib.units as units
File "/usr/lib/pymodules/python2.7/matplotlib/__init__.py", line 774, in <module>
    rcParams = rc_params()
File "/usr/lib/pymodules/python2.7/matplotlib/__init__.py", line 692, in rc_params
    fname = matplotlib_fname()
File "/usr/lib/pymodules/python2.7/matplotlib/__init__.py", line 604, in matplotlib_fname
    fname = os.path.join(get_configdir(), 'matplotlibrc')
File "/usr/lib/pymodules/python2.7/matplotlib/__init__.py", line 253, in wrapper
    ret = func(*args, **kwargs)
File "/usr/lib/pymodules/python2.7/matplotlib/__init__.py", line 478, in _get_configdir
    raise RuntimeError("Failed to create %s/.matplotlib; consider setting MPLCONFIGDIR to a writable directory for matplotlib configuration data"%h)
RuntimeError: Failed to create /var/www/.matplotlib; consider setting MPLCONFIGDIR to a writable directory for matplotlib configuration data

Further detail: the platform is Ubuntu 12.04 LTS, which ships a fairly old version of matplotlib. Recent versions fix this error by creating a temporary config directory. However, it still sucks that matplotlib is loaded on my webserver when I don't need it.

Asked by Sekenre, Nov 19 '14 11:11



1 Answer

Unfortunately, the answer is to upgrade matplotlib to a version that creates a writable configuration directory on startup if the default locations are not available. This is a pain if you're using your Linux distribution's packages (matplotlib v1.1.1). Versions after 1.3.1 should be fine.

Neither of the suggestions in the comments fixes the problem.

Changing the mpl config to use a different backend such as Agg does not stop matplotlib from trying to create a config directory.

Adding an empty matplotlib.py file breaks pandas because it requires the matplotlib.units module for datatype conversion.

So until matplotlib can be upgraded, the os.environ['MPLCONFIGDIR'] = '/tmp/' hack works fine, but we have to remember to put it in every file that uses pandas on our webserver (or create our own custom module that hides all of this).
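A custom module along those lines could look roughly like the sketch below. The module name report_env and the function name ensure_mplconfigdir are made up for illustration, and it assumes the system temp directory is an acceptable place for matplotlib's config data. It only sets MPLCONFIGDIR when nothing has set it already, so an explicit server configuration still wins.

```python
# report_env.py (hypothetical module name, not part of pandas or matplotlib)
import os
import tempfile


def ensure_mplconfigdir(fallback=None):
    """Point MPLCONFIGDIR at a writable directory unless it is already set.

    Returns the directory that MPLCONFIGDIR ends up pointing to.
    """
    if "MPLCONFIGDIR" not in os.environ:
        # tempfile.gettempdir() picks a directory the current user can write to
        # (usually /tmp on Linux), which mirrors the hard-coded '/tmp/' hack.
        os.environ["MPLCONFIGDIR"] = fallback or tempfile.gettempdir()
    return os.environ["MPLCONFIGDIR"]


# Apply the setting as a side effect of importing this module, so it only
# has to be imported once, before pandas.
ensure_mplconfigdir()
```

Each report script would then start with import report_env, followed by the usual pandas import. The order matters: the environment variable has to be in place before pandas pulls in matplotlib.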

Answered by Sekenre, Oct 20 '22 00:10