I have a pandas DataFrame, and after pivoting it prints as follows:
country   CHINA    USA
0        119.02    0.0
1        121.20    0.0
3        112.49    0.0
4        113.94    0.0
5        114.67    0.0
6        111.77    0.0
7        117.57    0.0
......................
......................
6648       0.00  420.0
6649       0.00  420.0
6650       0.00  420.0
6651       0.00  420.0
6652       0.00  420.0
6653       0.00  420.0
6654       0.00  500.0
6655       0.00  500.0
6656       0.00  390.0
6657       0.00  450.0
6658       0.00  420.0
6659       0.00  420.0
6660       0.00  450.0 
Here is the method:
import pandas as pd
import matplotlib.pyplot as plt

def visualize_box_plot(df):
    # drop rows flagged as outliers
    df = df[df.outlier != 1]
    # one column per country; cells with no value are filled with 0
    df = pd.pivot_table(df,
                        index=df.index,
                        columns='country',
                        values='value',
                        fill_value=0)
    df.CHINA = df.CHINA.round(2)
    df.USA = df.USA.round(2)
    # this prints the output
    # provided earlier
    print(df)
    df_usa = df[df['USA'] != 0]
    df_china = df[df['CHINA'] != 0]
    # keep only each country's own column as a NumPy array
    usa = df_usa.to_numpy()[:, -1]
    china = df_china.to_numpy()[:, 0]
    print("USA:", len(usa), " ", "CHINA: ", len(china))
    # unequal length
    # USA: 1673   CHINA:  4384
    x = [china, usa]
    plt.boxplot(x)
    plt.show()
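For reference, here is a minimal sketch of where the zeros come from. It assumes a long-format input with the `country`, `value` and `outlier` columns the method uses, and the rows below are hypothetical. `pivot_table` writes `fill_value` into every row/country cell that has no data, while omitting `fill_value` leaves those cells as NaN. Keeping NaN from the start would also preserve any genuine 0.0 measurements, which filtering on `!= 0` cannot tell apart from filled cells.

```python
import pandas as pd

# Hypothetical long-format data with the columns the method expects
df = pd.DataFrame({
    "country": ["CHINA", "CHINA", "USA", "USA", "USA"],
    "value": [119.02, 121.20, 420.0, 500.0, 390.0],
    "outlier": [0, 0, 0, 1, 0],
})
df = df[df.outlier != 1]  # drop the flagged row, as in the method

# Same call as in the method: missing row/country cells become 0
with_zeros = pd.pivot_table(df, index=df.index, columns="country",
                            values="value", fill_value=0)

# Without fill_value the missing cells stay NaN
with_nans = pd.pivot_table(df, index=df.index, columns="country",
                           values="value")

print(with_zeros)
print(with_nans)
```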
The zeros come from the NaNs that pivoting fills in (via fill_value=0), and I would like to omit them when making the box plot. So I use this code:
    df_usa = df[(df['USA'] != 0)]
    df_china = df[(df['CHINA'] != 0)]
This code creates separate DataFrames, which I convert to NumPy arrays, and finally I plot them together with matplotlib. Note that the arrays have different lengths, so I can't simply call the boxplot function on the DataFrame directly.
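Here is a sketch of the same idea with less bookkeeping. It assumes a small hypothetical pivoted frame shaped like the printout, where 0 means "no value". `plt.boxplot` accepts a list of 1-D arrays of different lengths, so one list comprehension can build the per-country samples. The default tick positions 1, 2, ... can then be relabeled with the column names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pivoted frame shaped like the printout above (0 = no value)
df = pd.DataFrame({
    "CHINA": [119.02, 121.20, 112.49, 113.94, 0.0, 0.0, 0.0],
    "USA": [0.0, 0.0, 0.0, 0.0, 420.0, 500.0, 390.0],
})

# One 1-D array per column with the zeros filtered out; lengths may differ
samples = [df.loc[df[col] != 0, col].to_numpy() for col in df.columns]

fig, ax = plt.subplots()
result = ax.boxplot(samples)                  # boxes are drawn at x = 1, 2, ...
ax.set_xticks(range(1, len(df.columns) + 1))
ax.set_xticklabels(df.columns)               # CHINA, USA instead of 1, 2
ax.set_ylabel("value")
```

The `Agg` line only exists so the sketch runs headless. Drop it and call `plt.show()` to open a window.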
Here is my visualization, where the tick labels 1 and 2 should be replaced with CHINA and USA respectively:

The visualization is not great, and I have a feeling there might be a better way to get the job done. Any suggestions? Some sample code would help a lot. You may keep the DataFrame rounded to 2 decimal places. The main goals are to make the code more elegant and to improve the visualization.
I think the code can be simpler: replace 0 with NaN and then call DataFrame.boxplot:
print (df.mask(df == 0))
#alternative solution
#print (df.replace(0,np.nan))
          CHINA    USA
country               
0        119.02    NaN
1        121.20    NaN
3        112.49    NaN
4        113.94    NaN
5        114.67    NaN
6        111.77    NaN
7        117.57    NaN
6648        NaN  420.0
6649        NaN  420.0
6650        NaN  420.0
6651        NaN  420.0
6652        NaN  420.0
6653        NaN  420.0
6654        NaN  500.0
6655        NaN  500.0
6656        NaN  390.0
6657        NaN  450.0
6658        NaN  420.0
6659        NaN  420.0
6660        NaN  450.0
df.mask(df == 0).boxplot()
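Here is a self-contained version of this answer, using a small hypothetical frame in the question's shape. `mask` turns every 0 into NaN, and `DataFrame.boxplot` drops NaN column by column. This means the boxes can be built from samples of different lengths without splitting the frame.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove it to get an interactive window
import pandas as pd

# Hypothetical pivoted frame: each row has a value for one country, 0 elsewhere
df = pd.DataFrame({
    "CHINA": [119.02, 121.20, 112.49, 113.94, 0.0, 0.0, 0.0],
    "USA": [0.0, 0.0, 0.0, 0.0, 420.0, 500.0, 390.0],
})

masked = df.mask(df == 0)  # every 0 becomes NaN
# DataFrame.boxplot ignores NaN per column, so CHINA uses 4 values and USA 3
ax = masked.boxplot()
ax.set_ylabel("value")
```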

Another possible solution is to use DataFrame.plot.box:
df.mask(df == 0).plot.box()
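The following is a quick check, again on a hypothetical frame, that the commented-out `replace(0, np.nan)` alternative yields the same frame as `mask(df == 0)`. It is followed by `plot.box`, which also drops NaN per column and returns a matplotlib Axes that can be customized further.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove it to get an interactive window
import numpy as np
import pandas as pd

# Hypothetical pivoted frame (0 marks "no value for this country")
df = pd.DataFrame({
    "CHINA": [119.02, 121.20, 0.0, 0.0],
    "USA": [0.0, 0.0, 420.0, 450.0],
})

via_mask = df.mask(df == 0)
via_replace = df.replace(0, np.nan)
print(via_mask.equals(via_replace))  # True

# plot.box returns an Axes; NaN values are skipped per column
ax = via_mask.plot.box()
ax.set_title("value by country")
```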

See Box Plots in the pandas docs.