 

Seaborn: Setting a distplot bin range?

I have a data set showing the GDP of countries in billions (so a GDP of 1 trillion = 1000).

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('2014_World_GDP')
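# DataFrame.sort was removed from pandas; sort_values is its replacement
df.sort_values('GDP (BILLIONS)', ascending=False, inplace=True)
gdp = df['GDP (BILLIONS)']  # renamed from `sorted`, which shadows the built-in

fig, ax = plt.subplots(figsize=(12, 8))
sns.distplot(gdp, bins=8, kde=False, ax=ax)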

The above code gives me the following figure: [image]

What I want to do, however, is set the bin edges so they look more like [250,500,750,1000,2000,5000,10000,20000].

Is there a way to do that in seaborn?

asked May 13 '17 07:05 by F.Ketbi


People also ask

How do you set bins in Seaborn?

The bins parameter controls the bins of the histogram (i.e., the number of bars). The most common way to use it is to pass an integer giving the number of bins. For example, bins=30 creates a histogram with 30 bars (i.e., bins).

Is Distplot deprecated?

Starting with version 0.11.0 of the Python Seaborn data visualization library, distplot is a deprecated function and will be removed in a future version.


2 Answers

You could use logarithmic bins, which would work well with data that is distributed as yours is. Here is an example:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame()
df['GDP (BILLIONS)'] = 2000*1./(np.random.random(250))
df.sort_values(by='GDP (BILLIONS)',ascending=False, inplace=True)

fig, ax = plt.subplots(1,2,figsize=(8, 3))

sns.distplot(df['GDP (BILLIONS)'].values,bins=8,kde=False,ax=ax[0])
ax[0].set_title('Linear Bins')

LogMin, LogMax = np.log10(df['GDP (BILLIONS)'].min()),np.log10(df['GDP (BILLIONS)'].max())
newBins = np.logspace(LogMin, LogMax,8)
sns.distplot(df['GDP (BILLIONS)'].values,bins=newBins,kde=False,ax=ax[1])
ax[1].set_xscale('log')
ax[1].set_title('Log Bins')

plt.show()

[image: two histograms side by side, titled 'Linear Bins' and 'Log Bins']
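One thing worth checking with log-spaced edges is how many bins you actually get: np.logspace(LogMin, LogMax, 8) returns 8 edges, which makes 7 bars. A small sketch with NumPy only, using stand-in data generated the same way as above (the seed and the n_bins name are my own additions):

```python
import numpy as np

# Stand-in data, generated the same way as in the answer above (an assumption,
# not the asker's real GDP file). The seed only makes the run repeatable.
rng = np.random.default_rng(0)
gdp = 2000.0 / rng.random(250)

# 8 edges would define 7 bins; ask for n_bins + 1 edges to get n_bins bars.
n_bins = 8
edges = np.logspace(np.log10(gdp.min()), np.log10(gdp.max()), n_bins + 1)

# Neighbouring edges differ by a constant factor, so the bars have equal
# width once the x-axis is log-scaled.
ratios = edges[1:] / edges[:-1]

counts, _ = np.histogram(gdp, bins=edges)
print(len(edges) - 1, "bins, edge ratio ~", ratios[0])
print(counts)
```

The resulting edges array can be passed as bins= in the plotting call in place of newBins.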

answered Oct 01 '22 23:10 by Robbie


You could just pass your bin edges as a sequence. In your case that would be:

sns.distplot(df['GDP (BILLIONS)'].values,
             bins=[250,500,750,1000,2000,5000,10000,20000],
             kde=False,ax=ax[0])

However, this alone won't change the x-axis scale; for that you also need the set_xscale('log') call from Robbie's answer.
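Something to keep in mind with explicit edges: values that fall outside them are not counted at all. As far as I know, distplot hands bins to Matplotlib's hist, which bins the same way np.histogram does, so the effect can be sketched with NumPy alone. The GDP values below are made up for illustration.

```python
import numpy as np

edges = [250, 500, 750, 1000, 2000, 5000, 10000, 20000]

# Hypothetical GDP values in billions, made up for illustration.
gdp = np.array([40, 120, 300, 480, 800, 1500, 3000, 17000, 25000])

# One count per bin; every bin is half-open [lo, hi) except the last,
# which also includes its right edge.
counts, _ = np.histogram(gdp, bins=edges)
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:>6} to {hi:<6} {n}")

# Values below the first edge or above the last one are silently dropped.
outside = int(((gdp < edges[0]) | (gdp > edges[-1])).sum())
print("values outside the bin range:", outside)
```

Here 40, 120 and 25000 land in no bin, so if many countries have a GDP under 250 billion you may want to add a lower edge such as 0.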

answered Oct 01 '22 22:10 by Kc3