
pandas plot histogram data frame index

I have the following data frame (df) in pandas:

       NetPrice  Units  Royalty
Price                       
3.65    9.13    171    57.60
3.69    9.23     13     4.54
3.70    9.25    129    43.95
3.80    9.49    122    42.76
3.90    9.74    105    38.30
3.94    9.86    158    57.35
3.98    9.95     37    13.45
4.17   10.42     69    27.32
4.82   12.04    176    77.93
4.84   24.22    132    59.02
5.16   12.91    128    60.81
5.22   13.05    129    62.00

I am trying to create a histogram on the index ("Price") with a y-axis of "Units". I started with the following:

plt.hist(df.index)

This gives me a histogram plotting the price. How can I add the Units to the y-axis? Right now it is just a "scale".

Thank you!

DigitalMusicology asked Nov 26 '14 19:11



1 Answer

Because your data is already partially aggregated, you can't use the hist() methods directly. As @snorthway said in the comments, you can do this with a bar chart; you just need to put your data in buckets first. My favorite way to put data in buckets is the pandas cut() function.

Let's set up some example data, since you didn't provide any that's easy to use:

import numpy as np
import pandas as pd

np.random.seed(1)  # fixed seed so the example is reproducible
n = 1000
df = pd.DataFrame({'Price': np.random.normal(5, 2, size=n),
                   'Units': np.random.randint(100, size=n)})

Let's put the prices into 10 evenly spaced buckets:

df['bucket'] = pd.cut(df.Price, 10)
print(df.head())

      Price  Units           bucket
0  8.248691     98    (7.307, 8.71]
1  3.776487      8  (3.0999, 4.502]
2  3.943656     89  (3.0999, 4.502]
3  2.854063     27  (1.697, 3.0999]
4  6.730815     29   (5.905, 7.307]

So now we have a field that contains the bucket range. If you want to give those buckets other names, you can read about the labels argument of cut() in the excellent pandas documentation. Now we can use the pandas groupby() method and sum() to add up the units:

newdf = df[['bucket', 'Units']].groupby('bucket').sum()
print(newdf)

                  Units
bucket                 
(-1.122, 0.295]     492
(0.295, 1.697]     1663
(1.697, 3.0999]    5003
(3.0999, 4.502]   11084
(4.502, 5.905]    15144
(5.905, 7.307]    11053
(7.307, 8.71]      4424
(8.71, 10.112]     1008
(10.112, 11.515]     77
(11.515, 12.917]    122

That looks like a winner... now let's plot it:

newdf.plot(kind='bar')
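The same three steps (bucket, group, sum) can be tried on the frame from the question, where Price is the index rather than a column. What follows is only a sketch: the values are copied from the question, the choice of 4 buckets is arbitrary, and the names flat and units_per_bucket are made up for this illustration.

```python
import pandas as pd

# Data copied from the question; Price is the index.
df = pd.DataFrame(
    {
        "NetPrice": [9.13, 9.23, 9.25, 9.49, 9.74, 9.86,
                     9.95, 10.42, 12.04, 24.22, 12.91, 13.05],
        "Units": [171, 13, 129, 122, 105, 158, 37, 69, 176, 132, 128, 129],
        "Royalty": [57.60, 4.54, 43.95, 42.76, 38.30, 57.35,
                    13.45, 27.32, 77.93, 59.02, 60.81, 62.00],
    },
    index=pd.Index([3.65, 3.69, 3.70, 3.80, 3.90, 3.94,
                    3.98, 4.17, 4.82, 4.84, 5.16, 5.22], name="Price"),
)

# Move Price out of the index so it can be bucketed like a normal column.
flat = df.reset_index()

# 4 equal-width buckets (an arbitrary choice for 12 rows).
flat["bucket"] = pd.cut(flat["Price"], 4)

# Total units sold in each price bucket; observed=False keeps empty buckets.
units_per_bucket = flat.groupby("bucket", observed=False)["Units"].sum()
print(units_per_bucket)
```

Calling units_per_bucket.plot(kind='bar') afterwards should give the same kind of chart as above, with Units on the y-axis.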

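A different route, not the one used in this answer, is to keep a true histogram and weight each price by its units. np.histogram accepts a weights argument, so each row adds its Units value to its bin instead of a count of 1. The sketch below assumes the prices and units from the question; the array names and the choice of 4 bins are illustrative only.

```python
import numpy as np

# Index values (Price) and the Units column, copied from the question.
prices = np.array([3.65, 3.69, 3.70, 3.80, 3.90, 3.94,
                   3.98, 4.17, 4.82, 4.84, 5.16, 5.22])
units = np.array([171, 13, 129, 122, 105, 158, 37, 69, 176, 132, 128, 129])

# Each price contributes its Units value to its bin rather than 1.
units_per_bin, bin_edges = np.histogram(prices, bins=4, weights=units)

# Print the total units for every bin, labelled by the bin's edges.
for left, right, total in zip(bin_edges[:-1], bin_edges[1:], units_per_bin):
    print(f"{left:.2f} to {right:.2f}: {total}")
```

matplotlib's plt.hist takes the same weights argument, so plt.hist(df.index, bins=4, weights=df['Units']) should draw these totals directly from the original frame.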

JD Long answered Sep 28 '22 17:09