
How to make matplotlib/pandas bar chart look like hist chart?

Plotting Differences between bar and hist

Given some data in a pandas.Series, rv, there is a difference between:

  1. Calling hist directly on the data to plot

  2. Calculating the histogram results (with numpy.histogram) then plotting with bar

Example Data Generation

%matplotlib inline

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
matplotlib.style.use('ggplot')

# Setup size and distribution
size = 50000
distribution = stats.norm()

# Create random data
rv = pd.Series(distribution.rvs(size=size))
# Get sane start and end points of distribution
start = distribution.ppf(0.01)
end = distribution.ppf(0.99)

# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = distribution.pdf(x)
pdf = pd.Series(y, x)

# Get histogram of random data (density=True replaces the removed normed=True)
y, x = np.histogram(rv, bins=50, density=True)
# Convert the 51 bin edges into 50 bin centers
x = (x[:-1] + x[1:]) / 2.0
hist = pd.Series(y, x)

hist() Plotting

ax = pdf.plot(lw=2, label='PDF', legend=True)
rv.plot(kind='hist', bins=50, density=True, alpha=0.5, label='Random Samples', legend=True, ax=ax)

hist plotting

bar() Plotting

ax = pdf.plot(lw=2, label='PDF', legend=True)
hist.plot(kind='bar', alpha=0.5, label='Random Samples', legend=True, ax=ax)

bar plotting

How can the bar plot be made to look like the hist plot?

The use case for this is needing to save only the histogrammed data to use and plot later (it is typically smaller in size than the original data).
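The save-and-reload use case can be sketched as follows. This is a minimal illustration rather than part of the original question: it assumes the histogram is stored as the densities plus bin edges returned by numpy.histogram, and the in-memory buffer stands in for a file on disk. The names samples, density, edges and buffer are illustrative.

```python
import io

import numpy as np

# Stand-in for the large raw data set (seeded so the sketch is repeatable)
rng = np.random.default_rng(0)
samples = rng.normal(size=50000)

# Reduce the samples to 50 densities plus 51 bin edges
density, edges = np.histogram(samples, bins=50, density=True)

# Save only the histogram; an in-memory buffer stands in for a file on disk
buffer = io.BytesIO()
np.savez(buffer, density=density, edges=edges)

# Later: reload the arrays without needing the original samples
buffer.seek(0)
stored = np.load(buffer)
loaded_density, loaded_edges = stored["density"], stored["edges"]
```

Storing the edges rather than the bin centers keeps the exact bin widths available for plotting later.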

tmthydvnprt asked May 31 '16 14:05




2 Answers

Bar plotting differences

Getting a bar plot to look like the hist plot requires overriding some of bar's default behavior:

  1. Force bar to use the actual x data for the plotting range by passing both x (hist.index) and y (hist.values). By default, pandas's bar plots the y data against an arbitrary range and uses the x data only as tick labels.
  2. Set the width parameter to the actual step size of the x data (the default is 0.8).
  3. Set the align parameter to 'center'.
  4. Manually set the legend labels.

These changes need to be made via matplotlib's bar() called on the axis (ax) instead of pandas's bar() called on the data (hist).

Example Plotting

%matplotlib inline

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
matplotlib.style.use('ggplot')

# Setup size and distribution
size = 50000
distribution = stats.norm()

# Create random data
rv = pd.Series(distribution.rvs(size=size))
# Get sane start and end points of distribution
start = distribution.ppf(0.01)
end = distribution.ppf(0.99)

# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = distribution.pdf(x)
pdf = pd.Series(y, x)

# Get histogram of random data (density=True replaces the removed normed=True)
y, x = np.histogram(rv, bins=50, density=True)
# Convert the 51 bin edges into 50 bin centers
x = (x[:-1] + x[1:]) / 2.0
hist = pd.Series(y, x)

# Plot previously histogrammed data
ax = pdf.plot(lw=2, label='PDF', legend=True)
# Bar width is the spacing between adjacent bin centers
w = hist.index[1] - hist.index[0]
ax.bar(hist.index, hist.values, width=w, alpha=0.5, align='center')
ax.legend(['PDF', 'Random Samples'])

histogrammed plot
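If the bin edges are stored instead of the bin centers, the same idea can be sketched without computing a single step size. This is a variant under that assumption, not the code from the answer above: each bar is anchored at its left edge with align='edge' and given its own bin width via numpy.diff, which also covers bins of unequal width. The names samples, density, edges and bars are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for previously stored histogram data (seeded for repeatability)
rng = np.random.default_rng(0)
samples = rng.normal(size=50000)
density, edges = np.histogram(samples, bins=50, density=True)

fig, ax = plt.subplots()
# Anchor each bar at its left bin edge and give it the full width of its bin
bars = ax.bar(edges[:-1], density, width=np.diff(edges),
              align='edge', alpha=0.5, label='Random Samples')
ax.legend()
```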

tmthydvnprt answered Sep 28 '22 00:09


Another, simpler solution is to create fake samples that reproduce the same histogram and then simply use hist().

I.e., after retrieving bins and counts from stored data, do

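import numpy as np
import matplotlib.pyplot as plt

# bins holds the bin edges; counts holds the integer count for each bin
fake = np.array([])
for i in range(len(counts)):
    a, b = bins[i], bins[i+1]
    # Draw counts[i] uniform samples inside the bin [a, b)
    sample = a + (b-a)*np.random.rand(int(counts[i]))
    fake = np.append(fake, sample)

plt.hist(fake, bins=bins)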
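A related option that skips generating fake samples is to pass one point per bin to hist() and weight it by that bin's count. The sketch below is an alternative to the loop above, not the answerer's code; it assumes counts and bins come from numpy.histogram, and the names centers and replotted are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for stored histogram data: integer counts and bin edges
rng = np.random.default_rng(0)
counts, bins = np.histogram(rng.normal(size=50000), bins=50)

# One representative point per bin, placed at the bin center
centers = 0.5 * (bins[:-1] + bins[1:])

fig, ax = plt.subplots()
# Each center lands in its own bin and contributes that bin's full count
replotted, _, _ = ax.hist(centers, bins=bins, weights=counts, alpha=0.5)
```

Because no random numbers are drawn, the replotted bar heights match the stored counts directly.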
Gregor Mitscha-Baude answered Sep 28 '22 01:09