I feel like I'm missing something ridiculously basic here.
If I'm trying to create a bar chart with values from a dataframe, what's the difference between calling .plot on the dataframe object and just entering the data within plt.plot's parentheses?
e.g.
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
VERSUS
df.groupby('category').count().plot(kind='bar')?
Can someone please walk me through what the difference is and when I should use either? I get that with plt.plot I'm calling the plot method of the plt (Matplotlib) library, whereas when I do df.plot I'm calling plot on the dataframe? What does that mean exactly -- that the dataframe has a plot object?
Those are different plotting methods. Fundamentally, they both produce matplotlib objects (a figure with axes and artists), which can be shown via one of the matplotlib backends.
There is, however, an important difference. Pandas bar plots are categorical in nature. This means the bars are positioned at consecutive integers (0, 1, 2, ...), and each bar gets a tick with a label taken from the index of the dataframe. For example:
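To see that both routes end up as Matplotlib objects, here is a minimal sketch. The dataframe contents and the variable names are made up for illustration, and the Agg backend is only chosen so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes
from matplotlib.lines import Line2D

# Made-up data, only for illustration
df = pd.DataFrame({"category": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})

# pandas route: DataFrame.plot draws with Matplotlib and returns the Axes it used
ax_pandas = df.groupby("category").count().plot(kind="bar")

# Matplotlib route: plotting on a fresh Axes returns a list of Line2D artists
fig, ax_mpl = plt.subplots()
lines = ax_mpl.plot([1, 2, 3, 4], [1, 4, 9, 16])

print(isinstance(ax_pandas, Axes))    # the pandas result is a Matplotlib Axes
print(isinstance(lines[0], Line2D))   # the Matplotlib result is a Matplotlib artist
```

Either way you can keep customising the result with ordinary Matplotlib calls.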
import matplotlib.pyplot as plt
import pandas as pd
s = pd.Series([30,20,10,40], index=[1,4,5,9])
s.plot.bar()
plt.show()

Here, there are four bars. The first is at position 0, with the first label of the series' index, 1. The second is at position 1, with the label 4, and so on.
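If you want to check this yourself, the tick positions and labels can be read back from the Axes that pandas returns. This is a small sketch using the same series; the variable names are my own, and the canvas is drawn once so the tick labels are populated on older Matplotlib versions too.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])
ax = s.plot.bar()
ax.figure.canvas.draw()  # make sure tick labels are up to date

# Where the ticks sit, and what text they show
tick_positions = [float(t) for t in ax.get_xticks()]
tick_labels = [t.get_text() for t in ax.get_xticklabels()]

# Centre of each bar rectangle on the x axis
bar_centers = [round(p.get_x() + p.get_width() / 2, 6) for p in ax.patches]

print(tick_positions)  # positions 0..3, not the index values
print(tick_labels)     # the index values, as strings
print(bar_centers)
```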
In contrast, a matplotlib bar plot is numeric in nature. Compare this to
import matplotlib.pyplot as plt
import pandas as pd
s = pd.Series([30,20,10,40], index=[1,4,5,9])
plt.bar(s.index, s.values)
plt.show()

Here the bars are at the numerical positions given by the index: the first bar at 1, the second at 4, and so on. The axis ticks and labels are chosen independently of where the bars are.
Note that you can achieve a categorical bar plot with matplotlib by casting the x values (here, the index) to strings.
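The same kind of check works for the Matplotlib version. A minimal sketch, again with variable names of my own choosing, reading the bar positions back from the rectangles that `bar` returns:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])
fig, ax = plt.subplots()
bars = ax.bar(s.index, s.values)

# Centre of each bar on the x axis; these follow the numeric index values
bar_centers = [round(r.get_x() + r.get_width() / 2, 6) for r in bars]
print(bar_centers)
```

The gaps between the bars (nothing at 2, 3, 6, 7, 8) are what make this plot numeric rather than categorical.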
plt.bar(s.index.astype(str), s.values)
The result looks similar to the pandas plot, apart from minor differences such as rotated labels and bar widths. If you want to tweak properties of individual bars, that is easier with a matplotlib bar plot, because it directly returns the bar container with all the bars.
bc = plt.bar(s.index.astype(str), s.values)  # returns a BarContainer
for bar in bc:
    bar.set_some_property(...)  # placeholder for any setter of the bar's Rectangle
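As a concrete instance of that loop, here is a sketch that highlights the tallest bar. Picking the tallest bar and the colour and hatch values are my own choices for illustration; `set_facecolor` and `set_hatch` are ordinary setters on the Rectangle patches in the container.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import to_rgba

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])
fig, ax = plt.subplots()
bc = ax.bar(s.index.astype(str), s.values)  # BarContainer holding one Rectangle per bar

# Find the tallest bar and restyle only that one
tallest = max(bc, key=lambda rect: rect.get_height())
tallest.set_facecolor("tab:red")
tallest.set_hatch("//")
```

With a pandas plot the rectangles are still reachable, for example through `ax.patches` on the returned Axes, but you have to dig them out yourself.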
The pandas plot function uses Matplotlib's pyplot to do the plotting; it is essentially a shortcut.
I was similarly confused when I started trying to visualise my data, but I eventually decided to learn matplotlib because it gives you more control over the visualisation.
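One way to see the shortcut nature is to hand pandas an Axes you created yourself via the `ax` argument and then carry on with plain Matplotlib calls. A rough sketch, where the series values and the label text are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([30, 20, 10, 40], index=[1, 4, 5, 9])

fig, ax = plt.subplots()
returned = s.plot(kind="bar", ax=ax)  # pandas draws onto the Axes we pass in

# From here on it is ordinary Matplotlib
ax.set_ylabel("count")
ax.set_title("pandas drew the bars, Matplotlib does the rest")

print(returned is ax)
```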