 

Differences between bar plots in Matplotlib and pandas

I feel like I'm missing something ridiculously basic here.

If I'm trying to create a bar chart with values from a dataframe, what's the difference between calling .plot on the dataframe object and passing the data directly to plt.plot?

e.g.

plt.plot([1, 2, 3, 4], [1, 4, 9, 16])

VERSUS

df.groupby('category').count().plot(kind='bar')

Can someone please walk me through what the difference is and when I should use either? I get that with plt.plot I'm calling the plot method of the plt (Matplotlib) library, whereas when I do df.plot I'm calling plot on the dataframe? What does that mean exactly -- that the dataframe has a plot object?

asked Jun 30 '26 at 16:06 by dbs5


2 Answers

These are different plotting interfaces. Fundamentally, though, both produce matplotlib objects (pandas' .plot draws onto a matplotlib Axes and returns it), which are displayed through one of matplotlib's backends.
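To make the "both produce a matplotlib object" point concrete, here is a minimal sketch using the question's groupby/count pattern on a small made-up DataFrame (the category and value columns are hypothetical). It also touches the question's last point: df.plot is both callable and a namespace for methods such as .bar(), and either way it returns an ordinary matplotlib Axes. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data in the shape the question's groupby/count call expects
df = pd.DataFrame({"category": ["a", "b", "a", "c", "a"],
                   "value": [1, 2, 3, 4, 5]})

counts = df.groupby("category").count()  # one row per category
ax = counts.plot(kind="bar")              # equivalent to counts.plot.bar()

print(isinstance(ax, Axes))  # True: pandas returns a plain matplotlib Axes

# Because it is an ordinary Axes, matplotlib methods apply directly
ax.set_ylabel("count")
```

In other words, "the dataframe has a plot object" is roughly right: df.plot is an attribute whose job is to call matplotlib for you.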

There is, however, an important difference. Pandas bar plots are categorical in nature: the bars are placed at consecutive integer positions (0, 1, 2, ...), and each bar gets a tick labelled with the corresponding value of the dataframe's index. For example:

import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30,20,10,40], index=[1,4,5,9])
s.plot.bar()

plt.show()


Here there are four bars: the first is at position 0 and is labelled with the first value of the series' index, 1; the second is at position 1 with the label 4, and so on.
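You can check those positions yourself by reading them back from the Axes that pandas returns. A minimal sketch, reusing the Series above; it relies on the bars being stored as patches on the Axes, which is how matplotlib's bar drawing works.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])
ax = s.plot.bar()
ax.figure.canvas.draw()  # render once so the tick labels are populated

# x coordinate of each bar's center, and the text of each x tick label
centers = [round(float(p.get_x() + p.get_width() / 2), 6) for p in ax.patches]
labels = [t.get_text() for t in ax.get_xticklabels()]
print(centers)  # [0.0, 1.0, 2.0, 3.0]
print(labels)   # ['1', '4', '5', '9']
```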

In contrast, a matplotlib bar plot is numeric in nature. Compare this to

import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30,20,10,40], index=[1,4,5,9])
plt.bar(s.index, s.values)

plt.show()


Here the bars sit at the numerical positions of the index values: the first bar at 1, the second at 4, and so on. The axis tick labels are chosen by matplotlib independently of where the bars are.
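The same read-back trick shows the numeric placement. A minimal sketch, reusing the Series above; plt.bar returns the bars directly, so there is no need to go through the Axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])
bars = plt.bar(s.index, s.values)  # index values are used as numeric x positions

# Each bar is centered on its index value, leaving gaps at 2, 3, 6, 7 and 8
centers = [round(float(b.get_x() + b.get_width() / 2), 6) for b in bars]
print(centers)  # [1.0, 4.0, 5.0, 9.0]
```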

Note that you can get a categorical bar plot with matplotlib by converting the x values (here, the index) to strings:

plt.bar(s.index.astype(str), s.values)


The result looks similar to the pandas plot, apart from minor differences such as label rotation and bar width (pandas rotates bar-plot tick labels by 90 degrees and uses a default bar width of 0.5, whereas matplotlib uses 0.8). If you want to tweak more sophisticated properties, a matplotlib bar plot is easier to work with, because plt.bar directly returns the BarContainer holding all the bars.
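If you want the matplotlib version to look closer to the pandas one, you can set those two properties explicitly. A minimal sketch, assuming pandas' usual defaults of a 0.5 bar width and 90-degree tick-label rotation (check these against your pandas version):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])
fig, ax = plt.subplots()

# String x values are treated as categories and placed at 0, 1, 2, ...
# width=0.5 is assumed to match pandas' default bar width
bars = ax.bar(s.index.astype(str), s.values, width=0.5)
# pandas rotates bar-plot tick labels by 90 degrees by default (assumed here)
ax.tick_params(axis="x", labelrotation=90)

centers = [round(float(b.get_x() + b.get_width() / 2), 6) for b in bars]
print(centers)  # [0.0, 1.0, 2.0, 3.0]
```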

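bc = plt.bar(s.index.astype(str), s.values)  # returns a BarContainer
for bar in bc:
    bar.set_edgecolor("black")  # each bar is a Rectangle; any of its setters works here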
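For comparison, the bars of a pandas bar plot are reachable too, just one step further away: pandas returns the Axes, and the BarContainer sits in ax.containers. A minimal sketch, reusing the Series from above; it relies on pandas drawing the bars with matplotlib's Axes.bar (which registers the container), and the black edge color is only an example of a per-bar tweak.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.container import BarContainer

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])
ax = s.plot.bar()

# Axes.bar, which pandas calls internally, registers a BarContainer on the Axes
bc = ax.containers[0]
print(isinstance(bc, BarContainer))  # True
for bar in bc:
    bar.set_edgecolor("black")  # same per-bar customisation as with plt.bar
```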
answered Jul 03 '26 at 05:07 by ImportanceOfBeingErnest


The pandas plot function uses matplotlib to do the actual plotting; it is essentially a shortcut.
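One practical consequence of pandas being a shortcut over matplotlib is that you can mix the two: create the figure with matplotlib, let pandas draw into it via the ax argument, and keep customising with matplotlib afterwards. A minimal sketch, reusing the Series from the first answer; the title text is just an example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])

fig, ax = plt.subplots()         # set up the figure with matplotlib ...
returned = s.plot.bar(ax=ax)     # ... let pandas draw into it ...
ax.set_title("Values by index")  # ... and keep customising with matplotlib

print(returned is ax)  # True: pandas hands back the Axes it drew on
```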

I was similarly confused when I started visualising my data, but I eventually decided to learn matplotlib because it gives you more control over the visualisation.

answered Jul 03 '26 at 04:07 by Hyder Al Hassani


