
Plotting grouped data in same plot using Pandas

In Pandas, I am doing:

bp = p_df.groupby('class').plot(kind='kde') 

p_df is a DataFrame object.

However, this produces two plots, one for each class. How do I get a single plot with both classes on the same axes?

user4522999 asked Feb 03 '15 06:02




2 Answers

Version 1:

You can create your axis, and then use the ax keyword of DataFrameGroupBy.plot to add everything to these axes:

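import pandas as pd
import matplotlib.pyplot as plt

p_df = pd.DataFrame({"class": [1, 1, 2, 2, 1], "a": [2, 3, 2, 3, 2]})

# Create the axes once, then pass them to the plot call for every group
fig, ax = plt.subplots(figsize=(8, 6))
bp = p_df.groupby('class').plot(kind='kde', ax=ax)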

This is the result:

[image: plot]

Unfortunately, the legend is not very useful here: the entries are labeled by column name rather than by class, so they do not tell the groups apart.
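One possible way around the legend problem while keeping a single plot call is to reshape the data so that each class becomes its own column. The following is only a sketch under a few assumptions: the helper column "row" and the "class " prefix are invented here for illustration, and the Agg backend line is only needed when running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

p_df = pd.DataFrame({"class": [1, 1, 2, 2, 1], "a": [2, 3, 2, 3, 2]})

# Number the rows within each class so the classes can sit side by side.
# "row" is a helper column made up for this sketch.
wide = (
    p_df.assign(row=p_df.groupby("class").cumcount())
        .pivot(index="row", columns="class", values="a")
        .add_prefix("class ")  # column names become the legend labels
)

# One call draws one KDE curve per column on the same axes.
fig, ax = plt.subplots(figsize=(8, 6))
wide.plot(kind="kde", ax=ax)

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Groups of unequal size leave missing values in the shorter columns of the wide frame; the sketch relies on pandas dropping those before estimating each density.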

Version 2:

Another way would be to loop through the groups and plot the curves manually:

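import pandas as pd
import matplotlib.pyplot as plt

classes = ["class 1"] * 5 + ["class 2"] * 5
vals = [1, 3, 5, 1, 3] + [2, 6, 7, 5, 2]
p_df = pd.DataFrame({"class": classes, "vals": vals})

fig, ax = plt.subplots(figsize=(8, 6))
# Draw one KDE curve per group on the shared axes, labeled with the group key
for label, df in p_df.groupby('class'):
    df.vals.plot(kind="kde", ax=ax, label=label)
plt.legend()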

This way you can easily control the legend. This is the result:

[image: plot2]
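If the loop is needed in more than one place, it could be wrapped in a small helper. This is a minimal sketch, not part of pandas: the function name plot_group_kdes and its parameters are hypothetical, and the Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_group_kdes(frame, group_col, value_col, ax=None):
    """Draw one KDE curve per group on a shared Axes and return that Axes.

    Hypothetical helper written for this example.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 6))
    for label, group in frame.groupby(group_col):
        # label= controls the legend entry for this group's curve
        group[value_col].plot(kind="kde", ax=ax, label=str(label))
    ax.legend(title=group_col)
    return ax


p_df = pd.DataFrame({
    "class": ["class 1"] * 5 + ["class 2"] * 5,
    "vals": [1, 3, 5, 1, 3] + [2, 6, 7, 5, 2],
})
ax = plot_group_kdes(p_df, "class", "vals")
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Returning the Axes leaves further styling (titles, axis labels, saving the figure) to the caller.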

cel answered Sep 21 '22 12:09


Another approach is to use the seaborn module. It plots the two density estimates on the same axes without needing a variable to hold the axes (using part of the data frame setup from the other answer):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# In a Jupyter notebook, also run: %matplotlib inline

# data to create an example data frame
classes = ["c1"] * 5 + ["c2"] * 5
vals = [1, 3, 5, 1, 3] + [2, 6, 7, 5, 2]
# the data frame
df = pd.DataFrame({"cls": classes, "vals": vals})

# plot one KDE per class; both calls draw on the current axes
sns.kdeplot(df.vals[df.cls == "c1"], label='c1')
sns.kdeplot(df.vals[df.cls == "c2"], label='c2')

# beautifying the labels
plt.xlabel('value')
plt.ylabel('density')
plt.legend()
plt.show()

This results in the following image.

[image: resulting image from the code given above]

dagcilibili answered Sep 20 '22 12:09