Plotting distributions with uneven lengths

Tags: python, plotly

Following the plotly directions, I would like to plot something similar to the following code:

import plotly.plotly as py  # plotly < 4; in plotly >= 4 this module lives in chart_studio.plotly
import plotly.figure_factory as ff

import numpy as np

# Add histogram data
x1 = np.random.randn(200) - 2  
x2 = np.random.randn(200)  
x3 = np.random.randn(200) + 2  
x4 = np.random.randn(200) + 4  


# Group data together
hist_data = [x1, x2, x3, x4]

group_labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4']

# Create distplot with custom bin_size
fig = ff.create_distplot(hist_data, group_labels, bin_size = [.1, .25, .5, 1])

# Plot!
py.iplot(fig, filename = 'Distplot with Multiple Bin Sizes')

However, my real-world dataset has uneven sample sizes (i.e. the count in group 1 differs from the count in group 2, etc.). Furthermore, it is in name-value pair (long) format, with one row per observation.

Here is some dummy data to illustrate:

import numpy as np
import pandas as pd

# Add histogram data: three groups of different sizes
x1 = pd.DataFrame(np.random.randn(100))
x1['name'] = 'x1'

x2 = pd.DataFrame(np.random.randn(200) + 1)
x2['name'] = 'x2'

x3 = pd.DataFrame(np.random.randn(300) - 1)
x3['name'] = 'x3'

df = pd.concat([x1, x2, x3])
df = df.reset_index(drop = True)
df.columns = ['value', 'names'] 

df

As you can see, each name (x1, x2, x3) has a different count, and I would like to use the "names" column for the color.

Does anyone know how I can plot this in plotly?

FYI, in R this is very simple: I would call ggplot and set aes(fill = names).

Any help would be appreciated, thank you!

Trexion Kameha asked Dec 03 '25 07:12


2 Answers

You could try slicing your dataframe by name and passing the resulting list of series to Plotly, with one bin_size entry per group:

fig = ff.create_distplot([df[df.names == a].value for a in df.names.unique()], df.names.unique(), bin_size=[.1, .25, .5])

[Image: resulting distplot]

import numpy as np
import pandas as pd
import plotly.offline
import plotly.figure_factory as ff

plotly.offline.init_notebook_mode()  # render inside a Jupyter notebook

# Dummy data: three groups of different sizes
x1 = pd.DataFrame(np.random.randn(100))
x1['name'] = 'x1'

x2 = pd.DataFrame(np.random.randn(200) + 1)
x2['name'] = 'x2'

x3 = pd.DataFrame(np.random.randn(300) - 1)
x3['name'] = 'x3'

df = pd.concat([x1, x2, x3])
df = df.reset_index(drop=True)
df.columns = ['value', 'names']

# One series of values per name, and one bin size per group
fig = ff.create_distplot([df[df.names == a].value for a in df.names.unique()], df.names.unique(), bin_size=[.1, .25, .5])
plotly.offline.iplot(fig, filename='Distplot with Multiple Bin Sizes')
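
As a variation on the slicing above, the same list of per-group arrays can be built with a pandas groupby. This is only a sketch: the helper names split_by_group and make_distplot are made up for illustration, and the column names "value" and "names" are assumed to match the dummy data in the question. The plotly import is deferred into make_distplot so that the grouping step can be checked on its own.

```python
import numpy as np
import pandas as pd


def split_by_group(df, value_col="value", group_col="names"):
    """Return (hist_data, group_labels) from a long-format DataFrame.

    Hypothetical helper, not part of plotly or pandas.
    """
    hist_data, group_labels = [], []
    # sort=False keeps groups in order of first appearance, like Series.unique()
    for label, values in df.groupby(group_col, sort=False)[value_col]:
        hist_data.append(values.to_numpy())
        group_labels.append(str(label))
    return hist_data, group_labels


def make_distplot(df, bin_size=0.25):
    """Build the distplot figure; needs plotly (and scipy for the KDE curves)."""
    # Imported here so split_by_group can be used without plotly installed
    import plotly.figure_factory as ff

    hist_data, group_labels = split_by_group(df)
    return ff.create_distplot(hist_data, group_labels, bin_size=bin_size)


# Dummy data shaped like the question's: three groups of different sizes
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": np.concatenate([
        rng.normal(0, 1, 100),
        rng.normal(1, 1, 200),
        rng.normal(-1, 1, 300),
    ]),
    "names": ["x1"] * 100 + ["x2"] * 200 + ["x3"] * 300,
})

hist_data, group_labels = split_by_group(df)
print(group_labels, [len(g) for g in hist_data])
# prints: ['x1', 'x2', 'x3'] [100, 200, 300]
```

Calling make_distplot(df) should then give a figure that can be passed to plotly.offline.iplot or plotly.offline.plot, as in the code above.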
Maximilian Peters answered Dec 05 '25 22:12


The example in plotly's documentation works out of the box for uneven sample sizes too:

#!/usr/bin/env python 

import numpy as np
import plotly.offline
import plotly.figure_factory as ff

# data with different sizes
x1 = np.random.randn(300)-2  
x2 = np.random.randn(200)  
x3 = np.random.randn(4000)+2  
x4 = np.random.randn(50)+4  

# Group data together
hist_data = [x1, x2, x3, x4]

# use custom names
group_labels = ['x1', 'x2', 'x3', 'x4']

# Create distplot with custom bin_size
fig = ff.create_distplot(hist_data, group_labels, bin_size=.2)

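# Writes a standalone HTML file and opens it in the browser; change this if you don't want to plot offline
# (in a Jupyter notebook, call plotly.offline.init_notebook_mode() and use plotly.offline.iplot(fig) instead)
plotly.offline.plot(fig, filename='distplot_multiple_datasets.html')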

The above script produces the following result:

[Image: resulting distplot]
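
One way to see why the group sizes do not need to match: create_distplot defaults to histnorm='probability density', so each group's histogram is normalised on its own. The sketch below illustrates that idea with plain NumPy. It is not plotly's internal code, and the bin edges are an assumption chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Same uneven group sizes as in the script above
groups = {
    "x1": rng.normal(-2, 1, 300),
    "x2": rng.normal(0, 1, 200),
    "x3": rng.normal(2, 1, 4000),
    "x4": rng.normal(4, 1, 50),
}

bin_size = 0.2
areas = {}
for name, values in groups.items():
    # Fixed-width bin edges that cover this group's range
    edges = np.arange(values.min(), values.max() + bin_size, bin_size)
    # density=True rescales the counts so the bars integrate to 1
    density, edges = np.histogram(values, bins=edges, density=True)
    areas[name] = float(np.sum(density * np.diff(edges)))
    print(f"{name}: n={values.size:5d}, area under histogram={areas[name]:.3f}")
```

Each group ends up with unit area regardless of how many samples it has, which is why the group with 4000 points does not dwarf the one with 50.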

coder answered Dec 05 '25 21:12