
Controlling Bin Widths in Altair

I have a set of numbers that I'd like to plot as a histogram.

Say:

import numpy as np
import matplotlib.pyplot as plt

my_numbers = np.random.normal(size = 1000)
plt.hist(my_numbers)

If I want to control the size and range of the bins I could do this:

plt.hist(my_numbers, bins=np.arange(-4,4.5,0.5))

The code below plots the same kind of histogram in Altair, but how do I control the size and range of the bins there?

import pandas as pd
import altair as alt

my_numbers_df = pd.DataFrame.from_dict({'Integers': my_numbers})

alt.Chart(my_numbers_df).mark_bar().encode(
    alt.X("Integers", bin = True),
    y = 'count()',
)

I have searched Altair's docs, but all the explanations and sample charts I could find just use bin=True with no further customization.

Appreciate any pointers :)

stephan asked Feb 28 '19 04:02




1 Answer

As demonstrated briefly in the Bin transforms section of the documentation, you can pass an alt.Bin() instance to fine-tune the binning parameters.

The equivalent of your matplotlib histogram would be something like this:

alt.Chart(my_numbers_df).mark_bar().encode(
    alt.X("Integers", bin=alt.Bin(extent=[-4, 4], step=0.5)),
    y='count()',
)
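
To see why extent=[-4, 4] with step=0.5 lines up with bins=np.arange(-4, 4.5, 0.5), it may help to work through the bin arithmetic by hand. The following is only a plain-Python sketch of fixed-width binning, not Altair's or Vega-Lite's actual implementation; the helper name fixed_width_counts is hypothetical, and dropping values outside the extent mirrors what plt.hist does with explicit bin edges (how Altair treats such values is not covered here).

```python
import random


def fixed_width_counts(values, extent, step):
    """Count values per bin for bins of width `step` covering `extent`.

    Hypothetical helper for illustration only; not part of Altair.
    """
    low, high = extent
    n_bins = round((high - low) / step)
    # n_bins bins need n_bins + 1 edges: low, low + step, ..., high
    edges = [low + i * step for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # A value equal to the upper bound goes into the last bin,
            # which is the convention numpy.histogram uses.
            index = min(int((v - low) // step), n_bins - 1)
            counts[index] += 1
    return edges, counts


random.seed(0)
my_numbers = [random.gauss(0, 1) for _ in range(1000)]
edges, counts = fixed_width_counts(my_numbers, extent=(-4, 4), step=0.5)
print(len(edges), len(counts))  # 17 edges, 16 bins
```

The 17 edges are the same values that np.arange(-4, 4.5, 0.5) produces, so both calls describe 16 bins of width 0.5 between -4 and 4.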


jakevdp answered Sep 25 '22 13:09
