Frequency plot in Python/Pandas DataFrame

I have parsed a very large DataFrame with several columns and values like this:

Name Age Points ...
XYZ  42  32pts  ...
ABC  41  32pts  ...
DEF  32  35pts
GHI  52  35pts
JHK  72  35pts
MNU  43  42pts
LKT  32  32pts
LKI  42  42pts
JHI  42  35pts
JHP  42  42pts
XXX  42  42pts
XYY  42  35pts

I have imported numpy and matplotlib.

I need to plot how many times each value in the 'Points' column occurs. I don't need any bins for the plot; it is more of a plot to see how often the same score occurs across a large dataset.

So essentially the bar plot (or histogram, if you can call it that) should show that 32pts occurs 3 times, 35pts occurs 5 times and 42pts occurs 4 times. If I can plot the values in sorted order, all the better. I have tried df.hist(), but it is not working for me. Any clues? Thanks.

SMU asked Oct 20 '14 23:10


2 Answers

I would plot the result of the column's value_counts method directly:

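import matplotlib.pyplot as plt
import pandas

data = load_my_data()  # placeholder: load your DataFrame however you normally do
fig, ax = plt.subplots()
# count each distinct value in 'Points' and draw one bar per value
data['Points'].value_counts().plot(ax=ax, kind='bar')
plt.show()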
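
As a rough sketch of the same approach on the sample rows from the question, the snippet below adds sort_index() so the bars appear in label order rather than frequency order. The DataFrame literal, the axis labels, the Agg backend line, and the output filename are assumptions made only so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question (only 'Points' matters here)
df = pd.DataFrame({
    "Name": ["XYZ", "ABC", "DEF", "GHI", "JHK", "MNU",
             "LKT", "LKI", "JHI", "JHP", "XXX", "XYY"],
    "Points": ["32pts", "32pts", "35pts", "35pts", "35pts", "42pts",
               "32pts", "42pts", "35pts", "42pts", "42pts", "35pts"],
})

# value_counts() orders by frequency; sort_index() reorders by the label itself
counts = df["Points"].value_counts().sort_index()

fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)
ax.set_xlabel("Points")
ax.set_ylabel("Occurrences")
fig.savefig("points_frequency.png")  # hypothetical output filename
```

Note that the labels are sorted as strings here, so a value such as '100pts' would sort before '32pts'; sorting on integers avoids that.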

If you want to remove the string 'pts' from all of the elements in your column, you can do something like this:

df['points_int'] = df['Points'].str.replace('pts', '').astype(int)

That assumes they all end with 'pts'. If the suffix varies from line to line, you will need regular expressions, as in this question: Split columns using pandas

And the official docs: http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods
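
For the case where the suffix is not consistent, one possible sketch uses Series.str.extract with a regular expression that keeps only the leading digits. The sample values below are made up for illustration and are not from the question's data.

```python
import pandas as pd

# Made-up values with inconsistent suffixes (assumption, for illustration only)
points = pd.Series(["32pts", "35pts", "42 points", "35", "42pts"])

# Capture the first run of digits in each entry, whatever text follows it
points_int = points.str.extract(r"(\d+)", expand=False).astype(int)

# Count each score and order the result numerically
counts = points_int.value_counts().sort_index()
print(counts)
```

Because the result is an integer Series, sort_index() orders the scores numerically, and the counts can be plotted with .plot(kind='bar') as above.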

Paul H answered Oct 26 '22 14:10


The seaborn package has a countplot function that can be used to make a frequency plot:

import seaborn as sns

ax = sns.countplot(x="Points",data=df)
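
If the bars should appear in sorted order, countplot accepts an order argument. The following is only a sketch on the question's sample 'Points' values; the DataFrame literal, the axis label, and the Agg backend line are assumptions added so it runs standalone.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import pandas as pd
import seaborn as sns

# 'Points' values copied from the question's sample rows
df = pd.DataFrame({
    "Points": ["32pts", "32pts", "35pts", "35pts", "35pts", "42pts",
               "32pts", "42pts", "35pts", "42pts", "42pts", "35pts"],
})

# Sort the distinct labels and pass them as the bar order
order = sorted(df["Points"].unique())
ax = sns.countplot(x="Points", data=df, order=order)
ax.set_ylabel("Occurrences")
```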

Yogesh Kumar answered Oct 26 '22 14:10