Overlaying two histograms with plotly express

I'd like to overlay two histograms that I currently display side by side, as two separate figures, using the following simplistic code. The two dataframes are not the same length, but it still makes sense to overlay their histogram values.

import plotly.express as px

fig1 = px.histogram(test_lengths, x='len', histnorm='probability', nbins=10)
fig2 = px.histogram(train_lengths, x='len', histnorm='probability', nbins=10)
fig1.show()
fig2.show()

With pure plotly, this is the way to do it, copied from the documentation:

import plotly.graph_objects as go

import numpy as np

x0 = np.random.randn(500)
# Add 1 to shift the mean of the Gaussian distribution
x1 = np.random.randn(500) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(x=x0))
fig.add_trace(go.Histogram(x=x1))

# Overlay both histograms
fig.update_layout(barmode='overlay')
# Reduce opacity to see both histograms
fig.update_traces(opacity=0.75)
fig.show()

I just wonder whether there's a particularly idiomatic way to do this with plotly express. Hopefully this also serves to exemplify the completeness of, and the different levels of abstraction between, plotly and plotly express.

asked Sep 18 '19 08:09 by matanster


2 Answers

The trick is to make a single Plotly Express figure by combining the data into a tidy dataframe, rather than to make two figures and try to combine them (which is currently impossible):

import numpy as np
import pandas as pd
import plotly.express as px

x0 = np.random.randn(250)
# Add 1 to shift the mean of the Gaussian distribution
x1 = np.random.randn(500) + 1

# Build one tidy dataframe: a label column plus a value column
df = pd.DataFrame(dict(
    series=np.concatenate((["a"] * len(x0), ["b"] * len(x1))),
    data=np.concatenate((x0, x1)),
))

fig = px.histogram(df, x="data", color="series", barmode="overlay")
fig.show()

This yields one figure with the two histograms overlaid, colored by series.

answered Nov 03 '22 20:11 by nicolaskruchten


You can reach into the figure that px builds and use its traces to create a new figure. I wanted a stacked histogram that uses the 'color' option, which exists in express but is hard to re-create in pure plotly.

Given a dataframe (df) with utctimestamp as a time index, and severity and category as the things to count in the histogram, I used this to get a stacked histogram:

import plotly.express as px
import plotly.graph_objects as go

# Collect the histogram traces px builds for each color grouping
figure_data = []
figure_data.extend(px.histogram(df, x="utctimestamp", color="severity", histfunc="count").to_dict()['data'])
figure_data.extend(px.histogram(df, x="utctimestamp", color="category", histfunc="count").to_dict()['data'])

# Render all collected traces in one stacked figure
fig = go.Figure(figure_data)
fig.update_layout(barmode='stack')
fig.update_traces(overwrite=True, marker={"opacity": 0.7})
fig.show()

tl;dr: px.histogram returns a figure whose data is a list of histogram traces, which you can grab and render via go.Figure.

I can't post images inline, but here are the stacked histograms from px: https://imgur.com/a/l7BblZo

answered Nov 03 '22 19:11 by Jeff Bryner