bar and hist
Given some data in a pandas.Series, rv, there is a difference between:
1. Calling hist directly on the data to plot
2. Calculating the histogram results (with numpy.histogram) and then plotting with bar
%matplotlib inline
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
matplotlib.style.use('ggplot')
# Setup size and distribution
size = 50000
distribution = stats.norm()
# Create random data
rv = pd.Series(distribution.rvs(size=size))
# Get sane start and end points of distribution
start = distribution.ppf(0.01)
end = distribution.ppf(0.99)
# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = distribution.pdf(x)
pdf = pd.Series(y, x)
# Get histogram of random data
y, x = np.histogram(rv, bins=50, density=True)
# Correct bin edge placement
x = [(a+x[i+1])/2.0 for i,a in enumerate(x[0:-1])]
hist = pd.Series(y, x)
hist() Plotting
ax = pdf.plot(lw=2, label='PDF', legend=True)
rv.plot(kind='hist', bins=50, density=True, alpha=0.5, label='Random Samples', legend=True, ax=ax)
bar() Plotting
ax = pdf.plot(lw=2, label='PDF', legend=True)
hist.plot(kind='bar', alpha=0.5, label='Random Samples', legend=True, ax=ax)
How can the bar plot be made to look like the hist plot?
The use case for this is needing to save only the histogrammed data to use and plot later (it is typically smaller in size than the original data).
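To make the use case concrete, here is a minimal sketch of storing only the binned data and reading it back later. The helper names save_histogram and load_histogram and the file name hist.npz are hypothetical, and keeping the bin edges alongside the bin heights is an assumption made for illustration; it is not taken from the question.

```python
import os
import tempfile

import numpy as np


def save_histogram(path, counts, edges):
    """Store only the binned data (hypothetical helper, not a library API)."""
    np.savez(path, counts=counts, edges=edges)


def load_histogram(path):
    """Read the bin heights and bin edges back from disk."""
    with np.load(path) as data:
        return data["counts"], data["edges"]


# Stand-in for the original data: 50000 normal samples
rng = np.random.default_rng(0)
samples = rng.normal(size=50000)

# 50 bin heights and 51 bin edges replace the 50000 raw values
counts, edges = np.histogram(samples, bins=50, density=True)

path = os.path.join(tempfile.mkdtemp(), "hist.npz")
save_histogram(path, counts, edges)
loaded_counts, loaded_edges = load_histogram(path)
```

The stored arrays hold 50 and 51 values respectively, regardless of how many raw samples went into the histogram.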
Obtaining a bar plot that looks like the hist plot requires some manipulating of default behavior for bar.
1. Force bar to use actual x data for plotting range by passing both x (hist.index) and y (hist.values). The default bar behavior is to plot the y data against an arbitrary range and put the x data as the label.
2. Set the width parameter to be related to the actual step size of the x data (the default is 0.8).
3. Set the align parameter to 'center'.
These changes need to be made via matplotlib's bar() called on the axis (ax) instead of pandas's bar() called on the data (hist).
%matplotlib inline
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
matplotlib.style.use('ggplot')
# Setup size and distribution
size = 50000
distribution = stats.norm()
# Create random data
rv = pd.Series(distribution.rvs(size=size))
# Get sane start and end points of distribution
start = distribution.ppf(0.01)
end = distribution.ppf(0.99)
# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = distribution.pdf(x)
pdf = pd.Series(y, x)
# Get histogram of random data
y, x = np.histogram(rv, bins=50, density=True)
# Correct bin edge placement
x = [(a+x[i+1])/2.0 for i,a in enumerate(x[0:-1])]
hist = pd.Series(y, x)
# Plot previously histogrammed data
ax = pdf.plot(lw=2, label='PDF', legend=True)
w = hist.index[1] - hist.index[0]
ax.bar(hist.index, hist.values, width=w, alpha=0.5, align='center')
ax.legend(['PDF', 'Random Samples'])
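As a variant of the code above, the sketch below keeps the bin edges instead of converting them to bin centers, and passes a per-bar width computed with np.diff. This is an alternative arrangement rather than the method used in the answer; the Agg backend and the variable names are assumptions chosen so the example runs without a display. Aligning on the left edge also works when the bins are not equally wide.

```python
import matplotlib
matplotlib.use("Agg")  # assumed non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for previously histogrammed data
rng = np.random.default_rng(0)
samples = rng.normal(size=50000)
counts, edges = np.histogram(samples, bins=50, density=True)

fig, ax = plt.subplots()
# Each bar starts at its left bin edge and spans exactly one bin
bars = ax.bar(
    edges[:-1],
    counts,
    width=np.diff(edges),
    align="edge",
    alpha=0.5,
    label="Random Samples",
)
ax.legend()
```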
Another, simpler solution is to create fake samples that reproduce the same histogram and then simply use hist().
I.e., after retrieving bins and counts from stored data, do
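import numpy as np
import matplotlib.pyplot as plt

# bins holds the bin edges, counts the integer number of samples per bin
fake = np.array([])
for i in range(len(counts)):
    a, b = bins[i], bins[i+1]
    # Draw counts[i] uniform samples inside bin i
    sample = a + (b-a)*np.random.rand(counts[i])
    fake = np.append(fake, sample)
plt.hist(fake, bins=bins)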
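A related approach that avoids generating fake samples is to pass one value per bin and use the stored counts as weights. This is a sketch of a different technique from the one in the answer above; the Agg backend and the generated stand-in data are assumptions for illustration. Because it does not draw counts[i] samples, it also works when the stored heights are not integers (for example, density-normalized values).

```python
import matplotlib
matplotlib.use("Agg")  # assumed non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for bins and counts retrieved from stored data
rng = np.random.default_rng(0)
counts, bins = np.histogram(rng.normal(size=50000), bins=50)

fig, ax = plt.subplots()
# One point at each left bin edge, weighted by that bin's stored count
replotted, _, _ = ax.hist(bins[:-1], bins=bins, weights=counts, alpha=0.5)
```

The heights returned by hist() match the stored counts, so the plot is the same as the one produced from the original data with these bins.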