
Matplotlib Ignoring Width on Sampled DataFrames

I thought for sure this would already have an answer, but I can't find it anywhere. I'm running into an issue when using matplotlib to make bar charts. Under most conditions the plot comes out correctly. However, when I remove some rows from the data before plotting, the bars become much wider than I want. Consider the following minimal reproducible example:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ex1 = pd.DataFrame({'x':[330,342,344,352,354,371,388,394,401,412,414,448,462,502,504,522,622],
                    'y':[2,9,0,2,2,1,0,4,7,6,8,4,2,6,3,5,7],
                    'ind':[0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0]})
ax.bar(ex1.x,ex1.y,width=0.9)
fig.savefig('some/path')

When I open up this plot I get the following:

[Image: bar chart of the full data]

This looks great. No issues. But now, suppose I only want to create a bar chart for part of the data. Essentially, all of the leading 0s in the "ind" column of my DF mark rows I don't care to plot. So I get rid of those and try again:

fig, ax = plt.subplots()
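# label of the first row where ind reaches its maximum
firstrow = ex1[ex1.ind==np.max(ex1.ind)].index.to_list()[0]
# keep that row and everything after it (labels equal positions on the default index)
ex1 = ex1[firstrow:]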
ax.bar(ex1.x,ex1.y,width=0.9)
fig.savefig('some/other/location')

When I open that one up, I expect a truncated version of the original plot, i.e. with thin bars of the correct height, just without the few bars that I cut out of the DF. Instead, I get this:

[Image: bar chart of the sliced data]

Huh? It starts in the right place, but that's about all the good I can say for it. It appears as if it's just ignoring the width parameter and running all of the bars together. I've played with several things and done some searches and couldn't figure out either what's going wrong or how to fix it. Any suggestions on how to make the second figure look like the first but without the data I don't want would be much appreciated!

Edit, to answer questions: the results of print(ex1.x); print(ex1.y) after slicing are:

print(ex1.x); print(ex1.y)
5     371
6     388
7     394
8     401
9     412
10    414
11    448
12    462
13    502
14    504
15    522
16    622
Name: x, dtype: int64
5     1
6     0
7     4
8     7
9     6
10    8
11    4
12    2
13    6
14    3
15    5
16    7
Name: y, dtype: int64
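The printout shows that the sliced Series keep their original index labels, starting at 5 rather than 0. Assuming the non-zero-based index is what trips up the plotting (as the first answer below suggests), one possible workaround is to renumber the rows with `reset_index(drop=True)` before plotting. A minimal sketch, rebuilding the example data; the names `sliced` and `renumbered` are illustrative only:

```python
import numpy as np
import pandas as pd

ex1 = pd.DataFrame({'x': [330, 342, 344, 352, 354, 371, 388, 394, 401, 412, 414, 448, 462, 502, 504, 522, 622],
                    'y': [2, 9, 0, 2, 2, 1, 0, 4, 7, 6, 8, 4, 2, 6, 3, 5, 7],
                    'ind': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]})

# Same row selection as in the question: start at the first row where ind is at its maximum.
firstrow = ex1[ex1.ind == np.max(ex1.ind)].index[0]
sliced = ex1[firstrow:]
print(sliced.index[0])  # the original labels survive slicing

# drop=True discards the old labels instead of adding them as a new column.
renumbered = sliced.reset_index(drop=True)
print(renumbered.index[0])  # labels now start at 0
```

Whether this alone avoids the wide bars depends on the pandas/matplotlib versions involved; plotting NumPy arrays, as suggested below, sidesteps the index entirely.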
cbw asked Nov 16 '25 at 11:11



2 Answers

While matplotlib tries to support direct plotting of pandas objects, this can break when pandas changes some of its internals. The dependable fallback for such problems is to plot NumPy arrays instead, for which all functionality is well tested.

Here, the problem is that with some combinations of pandas and matplotlib versions, plotting DataFrames or Series whose index does not start at 0 can cause hiccups.
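One way such a mismatch can surface (an illustration of the general pitfall, not a claim about matplotlib's exact code path) is that `[0]` on a pandas Series with an integer index looks up the label 0, not the first position. Once the rows before label 5 are sliced off, label 0 no longer exists, while the underlying NumPy array still has a position 0. A minimal sketch using the sliced `x` column from the question:

```python
import pandas as pd

# The sliced 'x' column from the question: index labels 5..16.
x = pd.Series([371, 388, 394, 401, 412, 414, 448, 462, 502, 504, 522, 622],
              index=range(5, 17), name='x')

# Positional access: .iloc on the Series, or [0] on the NumPy array.
first_by_position = x.iloc[0]
first_in_array = x.to_numpy()[0]

# Plain [0] is a label lookup on an integer index, so it fails here.
try:
    x[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(first_by_position, first_in_array, label_lookup_failed)
```

Any code that assumes `obj[0]` means "first element" behaves differently for such a Series than for an array, which is why converting to NumPy first is the safer input.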

Hence you would want to plot the NumPy arrays ex1.x.values and ex1.y.values instead of the pandas Series ex1.x and ex1.y:

ax.bar(ex1.x.values, ex1.y.values, width=0.9)
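A self-contained sketch of this fix, assuming the non-interactive Agg backend so it runs without a display. It rebuilds the question's data, slices it the same way, passes NumPy arrays to `ax.bar`, and reads the widths back from the returned bar rectangles. It uses `Series.to_numpy()`, which the pandas documentation points to in place of `.values`; both give a NumPy array here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ex1 = pd.DataFrame({'x': [330, 342, 344, 352, 354, 371, 388, 394, 401, 412, 414, 448, 462, 502, 504, 522, 622],
                    'y': [2, 9, 0, 2, 2, 1, 0, 4, 7, 6, 8, 4, 2, 6, 3, 5, 7],
                    'ind': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]})

# Keep rows from the first one where 'ind' reaches its maximum onwards.
firstrow = ex1[ex1.ind == np.max(ex1.ind)].index[0]
ex1 = ex1[firstrow:]

fig, ax = plt.subplots()
# Plain NumPy arrays carry no pandas index, so nothing label-based reaches matplotlib.
bars = ax.bar(ex1.x.to_numpy(), ex1.y.to_numpy(), width=0.9)

# Each bar is a Rectangle; its width should be (numerically close to) the requested 0.9.
widths = [rect.get_width() for rect in bars]
print(np.allclose(widths, 0.9))
plt.close(fig)
```

The widths are compared with a tolerance because matplotlib may compute them as a difference of converted coordinates, which can differ from 0.9 by floating-point rounding.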
ImportanceOfBeingErnest answered Nov 18 '25 at 23:11



I'm not completely sure what

ex1[ex1.ind==np.max(ex1.ind)].index.to_list()[0]

is doing since it throws an error for me, but using

ex1[ex1.ind==np.max(ex1.ind)].index.values[0]

instead gives

[Image: sliced output with correct bar widths]

Tested with Python 2.7 in Jupyter Notebook and with Python 2.7 and Python 3.6 on Ubuntu; all gave the same output.
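Both `index.to_list()[0]` and `index.values[0]` pick out the first index label where `ind` equals its maximum; `Index.to_list()` is an alias of `Index.tolist()` that older pandas releases lack, which may be why it raised an error here. A shorter way to express the same step, offered as an alternative rather than what either post used, is `Series.idxmax()`, which returns the label of the first occurrence of the maximum, followed by a label-based `.loc` slice. A minimal sketch; `trimmed` is an illustrative name:

```python
import pandas as pd

ex1 = pd.DataFrame({'x': [330, 342, 344, 352, 354, 371, 388, 394, 401, 412, 414, 448, 462, 502, 504, 522, 622],
                    'y': [2, 9, 0, 2, 2, 1, 0, 4, 7, 6, 8, 4, 2, 6, 3, 5, 7],
                    'ind': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]})

# Label of the first row where 'ind' hits its maximum (idxmax returns the first occurrence).
firstrow = ex1['ind'].idxmax()

# Label-based slice: includes the row labelled firstrow and every row after it.
trimmed = ex1.loc[firstrow:]
print(firstrow, len(trimmed))
```

Like the original slicing, this keeps the trailing rows whose `ind` is 0 again, and the result still carries labels starting at 5, so it is best combined with passing NumPy arrays to `ax.bar`.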

William Miller answered Nov 18 '25 at 23:11
