I am plotting a histogram with pandas and pyplot. For additional information, I added vertical lines at certain percentiles of the distribution. I already found out that you can make an axvline span only a certain fraction of the chart height with ymax:
cycle_df = pd.DataFrame(results)
plot = cycle_df.plot.hist(bins=30, label='Cycle time')
plot.axvline(np.percentile(cycle_df,5), label='5%', color='red', linestyle='dashed', linewidth=2, ymax=0.25)
plot.axvline(np.percentile(cycle_df,95), label='95%', color='blue', linestyle='dashed', linewidth=2, ymax=0.25)
Is it possible to make the red and blue lines end exactly at the top of the histogram bar they fall in, so the plot looks smooth?
That's definitely possible, but I'm not sure it's easy to do with pandas.DataFrame.hist, because that doesn't return the histogram data. You would have to make another call to matplotlib.pyplot.hist (or numpy.histogram) to get the actual bins and heights.
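For the pandas case, a minimal sketch of that extra step might look like the following. It assumes a single-column DataFrame (the column name cycle_time and the random stand-in data are made up for illustration) and the default range, in which case numpy.histogram with the same bins value should reproduce the bars drawn by plot.hist:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data (assumption); in the question this is pd.DataFrame(results)
rng = np.random.default_rng(0)
cycle_df = pd.DataFrame({"cycle_time": rng.normal(550, 20, 10_000)})
values = cycle_df["cycle_time"].to_numpy()

# Recompute the histogram with the same number of bins as the plot
counts, edges = np.histogram(values, bins=30)

ax = cycle_df.plot.hist(bins=30)
ps = np.percentile(values, [5, 95])

# Index of the bin containing each percentile, clipped to the valid range
idx = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, len(counts) - 1)
heights = counts[idx]

# axvline's ymax is a fraction of the axes height (lower y-bound assumed to be 0)
_, ybound = ax.get_ybound()
ax.axvline(ps[0], label="5%", color="red", linestyle="dashed", linewidth=2, ymax=heights[0] / ybound)
ax.axvline(ps[1], label="95%", color="blue", linestyle="dashed", linewidth=2, ymax=heights[1] / ybound)
ax.legend()
```

If the DataFrame has several columns or a custom range is passed, the binning would have to be matched accordingly.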
However, if you use matplotlib directly, this would work:
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import numpy as np
data = np.random.normal(550, 20, 100000)
fig, ax = plt.subplots(1, 1)
plot = ax.hist(data, bins=30, label='Cycle time', color='darkgrey')
ps = np.percentile(data, [5, 95])
_, ymax = ax.get_ybound()
# Search for the heights of the bins in which the percentiles are
heights = plot[0][np.searchsorted(plot[1], ps, side='left')-1]
# axvline's ymax is a fraction of the axes height: the bin height divided by the upper y-bound (valid if y_min is zero)
ax.axvline(ps[0], label='5%', color='red', linestyle='dashed', linewidth=2, ymax=heights[0] / ymax)
ax.axvline(ps[1], label='95%', color='blue', linestyle='dashed', linewidth=2, ymax=heights[1] / ymax)
plt.legend()
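The division by the upper y-bound only works when the lower bound is zero. If the y-limits are different, a small conversion helper could be used instead. This is a sketch under assumptions: axes_fraction is a hypothetical helper name, the y-axis is assumed to be linear, and the limits must not change after the fraction is computed:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def axes_fraction(ax, y):
    """Convert a data-space y value to a fraction of the axes height.

    Hypothetical helper; assumes a linear y-axis.
    """
    y0, y1 = ax.get_ylim()
    return (y - y0) / (y1 - y0)


rng = np.random.default_rng(0)
data = rng.normal(550, 20, 10_000)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(data, bins=30, color="darkgrey")

p5 = np.percentile(data, 5)
height = counts[np.searchsorted(edges, p5, side="left") - 1]

# Fix the limits first: the fraction is only valid for the limits it was computed with
ax.set_ylim(-50, counts.max() * 1.1)

line = ax.axvline(
    p5, color="red", linestyle="dashed", linewidth=2,
    ymin=axes_fraction(ax, 0),       # start at y = 0 in data coordinates
    ymax=axes_fraction(ax, height),  # end at the top of the bar
)
```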
In case you don't want to bother with calculating the relative height, you could also use Line2D from matplotlib.lines:
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
plt.style.use('ggplot')
import numpy as np
data = np.random.normal(550, 20, 100000)
fig, ax = plt.subplots(1, 1)
plot = ax.hist(data, bins=30, label='Cycle time', color='darkgrey')
ps = np.percentile(data, [5, 95])
# Search for the heights of the bins in which the percentiles are
heights = plot[0][np.searchsorted(plot[1], ps, side='left')-1]
# Line2D takes data coordinates, so the bin heights can be used directly
l1 = mlines.Line2D([ps[0], ps[0]], [0, heights[0]], label='5%', color='red', linestyle='dashed', linewidth=2)
l2 = mlines.Line2D([ps[1], ps[1]], [0, heights[1]], label='95%', color='blue', linestyle='dashed', linewidth=2)
ax.add_line(l1)
ax.add_line(l2)
plt.legend()
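One more option, not part of the approaches above, is Axes.vlines, which also takes its start and end points in data coordinates. A minimal sketch with the same stand-in data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(550, 20, 10_000)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(data, bins=30, label="Cycle time", color="darkgrey")

ps = np.percentile(data, [5, 95])
# Heights of the bins in which the percentiles fall
heights = counts[np.searchsorted(edges, ps, side="left") - 1]

# vlines(x, ymin, ymax) draws from ymin to ymax in data coordinates
low = ax.vlines(ps[0], 0, heights[0], colors="red", linestyles="dashed", linewidth=2, label="5%")
high = ax.vlines(ps[1], 0, heights[1], colors="blue", linestyles="dashed", linewidth=2, label="95%")
ax.legend()
```

Two separate calls are used here so that each line gets its own legend entry.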