
Pandas Histogram of Filtered Dataframe

This has been driving me mad for the last hour. I can draw a histogram when I use:

hist(df.GVW, bins=50, range=(0,200))

I use the following when I need to filter the dataframe for a given condition in one of the columns, for example:

df[df.TYPE=='SU4']

So far, everything works. But when I try to plot a histogram of this filtered data, I get a key error: KeyError: 0L. This is the call I use for the filtered data:

hist(df[df.TYPE=='SU4'].GVW, bins=50, range=(0,200))

Is there a syntax error somewhere? Thanks for the help!

marillion asked Mar 18 '14 22:03




2 Answers

Try using the .values attribute, which returns the data as a NumPy array:

hist(df[df.TYPE=='SU4'].GVW.values, bins=50, range=(0,200))

I assume this fails because the matplotlib hist function tries to access the first element of its input with [0]. A Series treats an integer key as an index label, not as a position, so on a filtered Series, whose first element no longer carries the label 0, that lookup raises a KeyError.
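The label-versus-position behaviour can be reproduced without any plotting. This is a minimal sketch using a small made-up dataframe (the values are hypothetical; only the column names TYPE and GVW come from the question):

```python
import pandas as pd

# Hypothetical data standing in for the question's dataframe.
df = pd.DataFrame({
    "TYPE": ["SU2", "SU4", "SU4", "SU3"],
    "GVW": [40.0, 55.5, 61.2, 48.0],
})

filtered = df[df.TYPE == "SU4"].GVW

# The filtered Series keeps the original row labels, so label 0 is gone.
print(list(filtered.index))

try:
    filtered.loc[0]        # label-based lookup: there is no row labelled 0
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Position-based access still works, and .values drops the index entirely.
first_by_position = filtered.iloc[0]
as_array = filtered.values
print(lookup_failed, first_by_position, as_array[0])
```

Passing as_array to hist sidesteps the index, which is why the .values workaround helps.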


And indeed, as @AndyHayden says, you can also use the pandas hist method:

df[df.TYPE=='SU4'].GVW.hist(bins=50)
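To check the binning without opening a plot window, one option is numpy.histogram, which accepts the same bins and range arguments and returns the counts directly. This is a swapped-in check, not what either answer uses, and the dataframe below is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "TYPE": ["SU2", "SU4", "SU4", "SU3"],
    "GVW": [40.0, 55.5, 61.2, 48.0],
})

# Filter first, then hand the plain array to numpy.
values = df[df.TYPE == "SU4"].GVW.values

# 50 bins over (0, 200) gives bins that are 4 units wide.
counts, edges = np.histogram(values, bins=50, range=(0, 200))

print(len(counts), len(edges), counts.sum())
```

Each count corresponds to the height of one bar in the plotted histogram.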
joris answered Sep 21 '22 19:09


I had a similar issue plotting a dataframe that I derived using a query. Calling reset_index() on the derived frame resolved the issue.
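A minimal sketch of that approach, again with made-up data. The drop=True argument is my addition: it discards the old labels instead of keeping them as an extra column, which may or may not be what you want:

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "TYPE": ["SU2", "SU4", "SU4", "SU3"],
    "GVW": [40.0, 55.5, 61.2, 48.0],
})

# Filter, then renumber the rows from 0 so label 0 exists again.
subset = df[df.TYPE == "SU4"].reset_index(drop=True)

print(list(subset.index))

# A lookup by the label 0 now succeeds.
first = subset.GVW[0]
print(first)
```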

user3685329 answered Sep 18 '22 19:09