
pandas - histogram from two columns?

I have this data:

data = pd.DataFrame().from_dict([r for r in response])
print(data)

   _id  total
0  213      1
1  194      3
2  205    156
...

Now, if I call:

data.hist()

I get two separate histograms, one for each column. This is not what I want. I want a single histogram built from both columns, where one column is interpreted as the value and the other as the number of occurrences of that value. How can I generate such a histogram?

I tried:

data.hist(column="_id", by="total")

But this generates even more (empty) histograms, along with an error message.

mnowotka asked Jul 22 '15 19:07


2 Answers

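You can always drop down to the lower-level matplotlib.pyplot.hist and pass the counts as weights:

import numpy as np
import pandas as pd
from matplotlib.pyplot import hist

# Example data: 100 values, each with its own occurrence count.
df = pd.DataFrame({
    '_id': np.random.randn(100),
    'total': 100 * np.random.rand(100)
})
# weights makes each _id contribute its total to the bin instead of 1.
hist(df._id, weights=df.total)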
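To see what the weights argument does with the data from the question, here is a minimal sketch that computes the bin heights with numpy.histogram, which accepts the same weights argument. The three rows are the ones printed in the question; the bin edges are an assumption chosen only to fit this toy data.

```python
import numpy as np
import pandas as pd

# The three rows shown in the question; real data would come from `response`.
data = pd.DataFrame({"_id": [213, 194, 205], "total": [1, 3, 156]})

# Hypothetical bin edges picked for this toy data: [190, 200), [200, 210), [210, 220].
edges = [190, 200, 210, 220]

# Each _id adds its `total` to the bin it falls in, instead of adding 1.
counts, _ = np.histogram(data["_id"], bins=edges, weights=data["total"])
print(counts)
```

Passing the same bins and weights to hist should draw bars with these heights.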

Ami Tavory answered Nov 12 '22 22:11


Since you already have the bin frequencies computed (the total column), just use pandas.DataFrame.plot to draw them as a bar chart:

data.plot(x='_id', y='total', kind='bar')
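If the same _id can appear in more than one row, it may help to sum the totals per value before plotting the precomputed frequencies. The sketch below assumes the three rows from the question; plot_counts is a hypothetical helper name, and calling it requires matplotlib to be installed.

```python
import pandas as pd

# The three rows shown in the question; real data would come from `response`.
data = pd.DataFrame({"_id": [213, 194, 205], "total": [1, 3, 156]})

# Sum totals per distinct value; groupby also returns the values in sorted order.
counts = data.groupby("_id")["total"].sum()


def plot_counts(series):
    """Draw one bar per distinct value, with bar height equal to its count."""
    ax = series.plot(kind="bar")
    ax.set_xlabel("_id")
    ax.set_ylabel("total")
    return ax
```

Calling plot_counts(counts) returns the matplotlib axes, so labels or limits can be adjusted afterwards.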

dermen answered Nov 12 '22 23:11