I have a pandas df, and after pivoting it prints as follows:
country CHINA USA
0 119.02 0.0
1 121.20 0.0
3 112.49 0.0
4 113.94 0.0
5 114.67 0.0
6 111.77 0.0
7 117.57 0.0
......................
......................
6648 0.00 420.0
6649 0.00 420.0
6650 0.00 420.0
6651 0.00 420.0
6652 0.00 420.0
6653 0.00 420.0
6654 0.00 500.0
6655 0.00 500.0
6656 0.00 390.0
6657 0.00 450.0
6658 0.00 420.0
6659 0.00 420.0
6660 0.00 450.0
Here is the method:
import matplotlib.pyplot as plt
import pandas as pd

def visualize_box_plot(df):
    df = df[df.outlier != 1]
    df = pd.pivot_table(df,
                        index=df.index,
                        columns=df['country'],
                        values='value',
                        fill_value=0)
    df.CHINA = df.CHINA.round(2)
    df.USA = df.USA.round(2)
    # this prints the table shown above
    print(df)
    df_usa = df[df['USA'] != 0]
    df_china = df[df['CHINA'] != 0]
    # as_matrix() was removed in pandas 1.0; select the column instead
    usa = df_usa['USA'].to_numpy()
    china = df_china['CHINA'].to_numpy()
    print("USA:", len(usa), " ", "CHINA: ", len(china))
    # unequal length
    # USA: 1673 CHINA: 4384
    x = [china, usa]
    plt.boxplot(x)
    plt.show()
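The zeros in the printed table come from `fill_value=0` in the pivot step. A minimal reproduction on a small hypothetical long-format frame (the column names `country`, `value` and `outlier` are taken from the method; the data values are made up) shows how every cell without a matching observation becomes 0:

```python
import pandas as pd

# Hypothetical long-format data (not the real values): one row per
# observation, with a 'country', a 'value' and an 'outlier' flag.
raw = pd.DataFrame({
    "country": ["CHINA", "CHINA", "USA", "USA", "USA"],
    "value": [119.02, 121.20, 420.0, 500.0, 0.5],
    "outlier": [0, 0, 0, 0, 1],
})

# Drop flagged outliers, as visualize_box_plot does.
clean = raw[raw.outlier != 1]

# Each original row index becomes a pivot row. A row only has a value under
# its own country, so every other cell is missing and fill_value=0 makes it 0.
pivoted = pd.pivot_table(clean, index=clean.index, columns="country",
                         values="value", fill_value=0)
print(pivoted)
```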
The zeros come from the NaNs that pivoting fills in (fill_value=0), and I would like to omit them when making the box plot. So I use this code:
df_usa = df[(df['USA'] != 0)]
df_china = df[(df['CHINA'] != 0)]
This code creates separate DataFrames, which I convert to NumPy arrays, and finally I plot them together with matplotlib. Note that the arrays have different lengths, so I can't just call the boxplot function directly on df.
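The two hand-written filters can be generalized to any number of countries. A minimal sketch, assuming a pivoted frame shaped like the one above (the small `df` here is hypothetical), that drops the zero-filled cells column by column and keeps the resulting unequal-length arrays in a dict keyed by country:

```python
import pandas as pd

# A tiny hypothetical pivoted frame shaped like the one in the question.
df = pd.DataFrame({"CHINA": [119.02, 121.20, 0.0, 0.0, 0.0],
                   "USA":   [0.0, 0.0, 420.0, 500.0, 390.0]})

# Drop the zero-filled cells per column; the arrays may differ in length,
# which is why they cannot share one rectangular frame without padding.
samples = {col: df.loc[df[col] != 0, col].to_numpy() for col in df.columns}

for name, values in samples.items():
    print(name, len(values))
```

Passing `list(samples.values())` to `plt.boxplot` and `list(samples)` as the tick labels keeps the box order and the label order in sync.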
Here is my visualization, where the x-axis labels 1 and 2 need to be replaced with CHINA and USA, respectively:

The visualization is not good, and I get the feeling there might be a better way to get the job done. Any suggestions? Some sample code would help a lot. You may use the df rounded to 2 decimal places. The main goal is to make the code more elegant and to improve the visualization.
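The labels 1 and 2 appear because `plt.boxplot` places the boxes at x positions 1, 2, ... and labels them with those numbers. A minimal sketch, using hypothetical stand-in arrays for `china` and `usa`, that keeps the question's approach and only relabels those positions:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical unequal-length samples standing in for the filtered arrays.
china = np.array([119.02, 121.20, 112.49, 113.94])
usa = np.array([420.0, 500.0, 390.0])

fig, ax = plt.subplots()
ax.boxplot([china, usa])
# boxplot draws the boxes at x = 1, 2, ...; give those positions names.
ax.set_xticks([1, 2])
ax.set_xticklabels(["CHINA", "USA"])
```

Remove the `matplotlib.use("Agg")` line and call `plt.show()` to get an interactive window.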
I think the code can be simpler: just replace 0 with NaN and then call DataFrame.boxplot:
print (df.mask(df == 0))
#alternative solution
#print (df.replace(0,np.nan))
country CHINA USA
0 119.02 NaN
1 121.20 NaN
3 112.49 NaN
4 113.94 NaN
5 114.67 NaN
6 111.77 NaN
7 117.57 NaN
6648 NaN 420.0
6649 NaN 420.0
6650 NaN 420.0
6651 NaN 420.0
6652 NaN 420.0
6653 NaN 420.0
6654 NaN 500.0
6655 NaN 500.0
6656 NaN 390.0
6657 NaN 450.0
6658 NaN 420.0
6659 NaN 420.0
6660 NaN 450.0
df.mask(df == 0).boxplot()
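This works without splitting the frame because DataFrame.boxplot draws one box per column, drops the NaN cells from each column before plotting, and labels each box with its column name. A small check on a hypothetical frame, which also confirms that the `mask` and `replace` variants give the same result:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

# Hypothetical pivoted frame with zero-filled cells.
df = pd.DataFrame({"CHINA": [119.02, 121.20, 0.0, 0.0, 0.0],
                   "USA":   [0.0, 0.0, 420.0, 500.0, 390.0]})

masked = df.mask(df == 0)   # zeros become NaN, other cells unchanged
print(masked.count())       # non-missing values per column

ax = masked.boxplot()       # one box per column, labelled by column name
```

One caveat: masking on `== 0` would also hide any genuine zero measurements, so it relies on 0 never being a real value.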

Another possible solution is to use DataFrame.plot.box:
df.mask(df == 0).plot.box()

See Box Plots in the pandas docs.