I think they all look the same but there must be some difference. They all take a single column as input, and the y-axis has the count for all plots.

What is the major difference between histogram, countplot and distplot in the Seaborn library?

1 Answer

The plotting functions pyplot.hist, seaborn.countplot and seaborn.distplot are all helper tools for plotting the frequency of a single variable. Which one is most suitable depends on the nature of that variable.

Continuous variable

A continuous variable x may be histogrammed to show its frequency distribution.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(100)*100
hist, edges = np.histogram(x, bins=np.arange(0,101,10))
plt.bar(edges[:-1], hist, align="edge", ec="k", width=np.diff(edges))

plt.show()

The same can be achieved using pyplot.hist or seaborn.distplot,

plt.hist(x, bins=np.arange(0,101,10), ec="k")

or

import seaborn as sns

sns.distplot(x, bins=np.arange(0,101,10), kde=False, hist_kws=dict(ec="k"))

distplot wraps pyplot.hist but adds further features, for example the option to show a kernel density estimate.
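To make the term "kernel density estimate" concrete, here is a minimal NumPy-only sketch. The function name gaussian_kde_sketch and the bandwidth rule (a normal-reference rule of thumb) are assumptions chosen for illustration; this is not necessarily how seaborn computes its curve.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption for this sketch,
        # not necessarily the default used by any plotting library.
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, averaged over all samples.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
x = rng.random(100) * 100

# Smooth density curve evaluated on a grid wider than the data range.
grid = np.linspace(-50, 150, 2001)
density = gaussian_kde_sketch(x, grid)

# Histogram normalized to unit area, so it is comparable to the KDE curve.
hist_density, edges = np.histogram(x, bins=np.arange(0, 101, 10), density=True)
```

Both results integrate to approximately one, which is why a KDE curve is usually drawn over a normalized histogram rather than over raw counts.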

Discrete variable

For a discrete variable, a histogram may or may not be suitable. If you use numpy.histogram, the bin edges need to lie exactly halfway between the expected discrete values.

x1 = np.random.randint(1,11,100)

hist, edges = np.histogram(x1, bins=np.arange(1,12)-0.5)
plt.bar(edges[:-1], hist, align="edge", ec="k", width=np.diff(edges))
plt.xticks(np.arange(1,11))
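The reason for shifting the edges by 0.5 can be checked numerically. The sketch below uses a small hypothetical data set (each integer from 1 to 10 exactly once) and compares edges placed on the integers with edges placed between them.

```python
import numpy as np

# Hypothetical data: the integers 1..10, each observed exactly once.
values = np.arange(1, 11)

# Edges on the integers: only 9 bins, and the last bin [9, 10] is closed
# on both sides, so the values 9 and 10 are counted together.
counts_on, edges_on = np.histogram(values, bins=np.arange(1, 11))

# Edges halfway between the integers: 10 bins, one per value.
counts_between, edges_between = np.histogram(values, bins=np.arange(1, 12) - 0.5)

print(counts_on)       # [1 1 1 1 1 1 1 1 2]
print(counts_between)  # [1 1 1 1 1 1 1 1 1 1]
```

With edges on the integers, the last bar silently merges two values. That is the distortion the half-integer edges avoid.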

Alternatively, one could count the unique elements in x1,

u, counts = np.unique(x1, return_counts=True)
plt.bar(u, counts, align="center", ec="k", width=1)
plt.xticks(u)

resulting in the same plot as above. The two approaches differ when not every possible value actually occurs in the data. Say 5 never occurs. The histogram approach still shows an (empty) bar for it, whereas 5 is simply absent from the unique elements.

x2 = np.random.choice([1,2,3,4,6,7,8,9,10], size=100)

plt.subplot(1,2,1)
plt.title("histogram")
hist, edges = np.histogram(x2, bins=np.arange(1,12)-0.5)
plt.bar(edges[:-1], hist, align="edge", ec="k", width=np.diff(edges))
plt.xticks(np.arange(1,11))

plt.subplot(1,2,2)
plt.title("counts")
u, counts = np.unique(x2, return_counts=True)
plt.bar(u.astype(str), counts, align="center", ec="k", width=1)

The latter is what seaborn.countplot does.

sns.countplot(x=x2, color="C0")

It is hence suitable for discrete or categorical variables.
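As a rough numerical check of this point, the sketch below compares the two counting strategies without plotting. The arrays are hypothetical: one numeric sample in which 5 never occurs, and one set of string labels.

```python
import numpy as np

# Hypothetical numeric data in which the value 5 never occurs.
x2 = np.array([1, 2, 2, 3, 4, 4, 6, 7, 8, 9, 10, 10])

# Histogram with one bin per integer from 1 to 10: the empty bin for 5 is kept.
hist, edges = np.histogram(x2, bins=np.arange(1, 12) - 0.5)

# Unique-value counts: 5 does not appear at all.
u, counts = np.unique(x2, return_counts=True)
print(len(hist), len(u))  # 10 9

# Hypothetical categorical labels: counting still works, although there are
# no meaningful bin edges to build a histogram from.
labels = np.array(["red", "blue", "red", "green", "blue", "red"])
categories, label_counts = np.unique(labels, return_counts=True)
print(dict(zip(categories.tolist(), label_counts.tolist())))
# {'blue': 2, 'green': 1, 'red': 3}
```

The histogram keeps a zero-height slot for the missing value. The unique-count approach treats every observed value as its own category, which is why it also works for labels.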

Summary

All three functions, pyplot.hist, seaborn.countplot and seaborn.distplot, act as wrappers around a matplotlib bar plot and can be used when building such a bar plot by hand is too cumbersome.
For continuous variables, pyplot.hist or seaborn.distplot may be used. For discrete variables, seaborn.countplot is more convenient.
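The rule of thumb above could be sketched as a small helper. The name frequency_table is hypothetical, and using the dtype to decide between "continuous" and "discrete" is a simplifying assumption. Integer-coded measurements with many distinct values, for instance, may still be better served by binning.

```python
import numpy as np

def frequency_table(values, bins=10):
    """Return (positions, counts) for plotting as a bar chart.

    Hypothetical helper: float data is treated as continuous and binned,
    everything else is treated as discrete and counted per unique value.
    """
    values = np.asarray(values)
    if np.issubdtype(values.dtype, np.floating):
        counts, edges = np.histogram(values, bins=bins)
        return edges, counts  # note: len(edges) == len(counts) + 1
    uniques, counts = np.unique(values, return_counts=True)
    return uniques, counts

# Continuous data: positions are bin edges.
edges, binned = frequency_table(np.array([0.1, 0.2, 0.9]), bins=[0.0, 0.5, 1.0])

# Discrete data: positions are the observed values themselves.
levels, tallied = frequency_table([1, 1, 3])
```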

152

answered Sep 17 '22 23:09

ImportanceOfBeingErnest
