I am using pandas 0.14.1 on a webserver to process reports from a SQL database.
I do not need any plotting facilities, but matplotlib is always imported.
How can I import only the modules that I need to do the following?
df = pd.io.sql.frame_query(query, con=conn)
df['colname'].apply(somefunc)
df.set_index('colname')
print df.to_html()
I constantly have to add the following hack to all of my report-generating scripts before I import pandas:
import os
os.environ['MPLCONFIGDIR'] = '/tmp/'
What can I do to avoid this?
Here's my webserver error log when I omit this hack:
File "/var/www/scripts/myscript.py", line 46, in index\n from pandas.io import sql
File "/usr/lib/python2.7/dist-packages/pandas/__init__.py", line 41, in <module>\n from pandas.core.api import *
File "/usr/lib/python2.7/dist-packages/pandas/core/api.py", line 9, in <module>\n from pandas.core.groupby import Grouper
File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 15, in <module>\n from pandas.core.frame import DataFrame
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 38, in <module>\n from pandas.core.series import Series
File "/usr/lib/python2.7/dist-packages/pandas/core/series.py", line 2524, in <module>\n import pandas.tools.plotting as _gfx
File "/usr/lib/python2.7/dist-packages/pandas/tools/plotting.py", line 26, in <module>\n import pandas.tseries.converter as conv
File "/usr/lib/python2.7/dist-packages/pandas/tseries/converter.py", line 7, in <module>\n import matplotlib.units as units
File "/usr/lib/pymodules/python2.7/matplotlib/__init__.py", line 774, in <module>\n rcParams = rc_params()
File "/usr/lib/pymodules/python2.7/matplotlib/__init__.py", line 692, in rc_params\n fname = matplotlib_fname()
File "/usr/lib/pymodules/python2.7/matplotlib/__init__.py", line 604, in matplotlib_fname\n fname = os.path.join(get_configdir(), 'matplotlibrc')
File "/usr/lib/pymodules/python2.7/matplotlib/__init__.py", line 253, in wrapper\n ret = func(*args, **kwargs)
File "/usr/lib/pymodules/python2.7/matplotlib/__init__.py", line 478, in _get_configdir\n raise RuntimeError("Failed to create %s/.matplotlib; consider setting MPLCONFIGDIR to a writable directory for matplotlib configuration data"%h)
RuntimeError: Failed to create /var/www/.matplotlib; consider setting MPLCONFIGDIR to a writable directory for matplotlib configuration data
Further detail: the platform is Ubuntu 12.04 LTS, which ships a fairly old version of matplotlib. Recent versions fix this error by creating a temp file. However, it still sucks that matplotlib is running in my webserver when I don't need it.
Unfortunately, the answer is to upgrade matplotlib to a version that creates a writable configuration directory on startup if the default locations are not available. This is a pain if you're using your Linux distribution's packages (matplotlib v1.1.1). Versions after 1.3.1 should be fine.
Neither suggestion in the comments fixes the problem.
Changing the mpl config to use a different backend such as Agg does not stop matplotlib from trying to create a config directory.
Adding an empty matplotlib.py file breaks pandas because it requires the matplotlib.units module for datatype conversion.
So, until matplotlib can be upgraded, the os.environ['MPLCONFIGDIR'] = '/tmp/' hack works fine, but we have to remember to put it in every file that uses pandas on our webserver (or create our own custom module that hides all of this).
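The custom-module idea could look roughly like the sketch below. The module name report_env and the function ensure_mpl_configdir are made up for illustration, and using the system temp directory as the default is an assumption; any directory the webserver user can write to should do.

```python
# report_env.py (hypothetical module name, not part of pandas or matplotlib)
import os
import tempfile


def ensure_mpl_configdir(path=None):
    """Point MPLCONFIGDIR at a writable directory unless it is already set.

    Returns the value that ends up in the environment.
    """
    if path is None:
        # Assumption: the system temp directory is writable by the webserver user.
        path = tempfile.gettempdir()
    # setdefault leaves an existing MPLCONFIGDIR untouched.
    return os.environ.setdefault('MPLCONFIGDIR', path)


# Run at import time, so importing this module is all a report script has to do.
ensure_mpl_configdir()
```

Each report script would then start with "import report_env" on the line before "import pandas", so the variable is in place by the time pandas pulls in matplotlib. This only centralizes the hack; matplotlib is still imported.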