I have the following data frame (df) in pandas:
NetPrice Units Royalty
Price
3.65 9.13 171 57.60
3.69 9.23 13 4.54
3.70 9.25 129 43.95
3.80 9.49 122 42.76
3.90 9.74 105 38.30
3.94 9.86 158 57.35
3.98 9.95 37 13.45
4.17 10.42 69 27.32
4.82 12.04 176 77.93
4.84 24.22 132 59.02
5.16 12.91 128 60.81
5.22 13.05 129 62.00
I am trying to create a histogram on the index ("Price) with an y-axis of "Units" . I started with the following:
plt.hist(df.index)
This gives me a histogram plotting the price. How can I add the Units to the y-axis? Right now it is just a "scale".
Thank you!
In order to plot a histogram using pandas, chain the . hist() function to the dataframe. This will return the histogram for each numeric column in the pandas dataframe.
Indexing is used to access values present in the Dataframe using “loc” and “iloc” functions. In Numpy arrays, we are familiar with the concepts of indexing, slicing, and masking, etc. Similarly, Pandas to supports indexing in their Dataframe.
Use DataFrame. loc[] and DataFrame. iloc[] to select a single column or multiple columns from pandas DataFrame by column names/label or index position respectively. where loc[] is used with column labels/names and iloc[] is used with column index/position.
In order to set index to column in pandas DataFrame use reset_index() method. By using this you can also set single, multiple indexes to a column. If you are not aware by default, pandas adds an index to each row of the pandas DataFrame.
Because your data is already partially aggregated, you can't use the hist()
methods directly. Like @snorthway said in the comments, you can do this with a bar chart. Only you need to put your data in buckets first. My favorite way to put data in buckets is with the pandas cut()
method.
Let's set up some example data since you didn't provide some that's easy to use:
np.random.seed(1)
n = 1000
df = pd.DataFrame({'Price' : np.random.normal(5,2,size=n),
'Units' : np.random.randint(100, size=n)})
Let's put the prices into 10 evenly spaced buckets:
df['bucket'] = pd.cut(df.Price, 10)
print df.head()
Price Units bucket
0 8.248691 98 (7.307, 8.71]
1 3.776487 8 (3.0999, 4.502]
2 3.943656 89 (3.0999, 4.502]
3 2.854063 27 (1.697, 3.0999]
4 6.730815 29 (5.905, 7.307]
So now we have a field that contains the bucket range. If you want to give those buckets other names, you can read about that in the excellent Pandas documentation. Now we can use the Pandas groupby()
method and sum()
to add up the units:
newdf = df[['bucket','Units']].groupby('bucket').sum()
print newdf
Units
bucket
(-1.122, 0.295] 492
(0.295, 1.697] 1663
(1.697, 3.0999] 5003
(3.0999, 4.502] 11084
(4.502, 5.905] 15144
(5.905, 7.307] 11053
(7.307, 8.71] 4424
(8.71, 10.112] 1008
(10.112, 11.515] 77
(11.515, 12.917] 122
That looks like a winner... now let's plot it:
newdf.plot(kind='bar')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With