Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Plotting results of Pandas GroupBy

I'm starting to learn Pandas and am trying to find the most Pythonic (or panda-thonic?) ways to do certain tasks.

Suppose we have a DataFrame with columns A, B, and C.

  • Column A contains boolean values: each row's A value is either true or false.
  • Column B has some important values we want to plot.

What we want to discover is the subtle distinctions between B values for rows that have A set to false, vs. B values for rows that have A is true.

In other words, how can I group by the value of column A (either true or false), then plot the values of column B for both groups on the same graph? The two datasets should be colored differently to be able to distinguish the points.


Next, let's add another feature to this program: before graphing, we want to compute another value for each row and store it in column D. This value is the mean of all data stored in B for the entire five minutes before a record - but we only include rows that have the same boolean value stored in A.

In other words, if I have a row where A=True and time=t, I want to compute a value for column D that is the mean of B for all records from time t-5 to t that have the same A=True.

In this case, how can we execute the groupby on values of A, then apply this computation to each individual group, and finally plot the D values for the two groups?

like image 648
Maxim Zaslavsky Avatar asked Mar 17 '13 20:03

Maxim Zaslavsky


People also ask

What is the result of Groupby in pandas?

Group DataFrame using a mapper or by a Series of columns. A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

What does Groupby size () return?

1) Using pandas groupby size() method The most simple method for pandas groupby count is by using the in-built pandas method named size(). It returns a pandas series that possess the total number of row count for each group.

Does Groupby return a DataFrame or series?

So a groupby() operation can downcast to a Series, or if given a Series as input, can upcast to dataframe. For your first dataframe, you run unequal groupings (or unequal index lengths) coercing a series return which in the "combine" processing does not adequately yield a data frame.


1 Answers

I think @herrfz hit all the high points. I'll just flesh out the details:

import pandas as pd import numpy as np import matplotlib.pyplot as plt  sin = np.sin cos = np.cos pi = np.pi N = 100  x = np.linspace(0, pi, N) a = sin(x) b = cos(x)  df = pd.DataFrame({     'A': [True]*N + [False]*N,     'B': np.hstack((a,b))     })  for key, grp in df.groupby(['A']):     plt.plot(grp['B'], label=key)     grp['D'] = pd.rolling_mean(grp['B'], window=5)         plt.plot(grp['D'], label='rolling ({k})'.format(k=key)) plt.legend(loc='best')     plt.show() 

enter image description here

like image 118
unutbu Avatar answered Oct 03 '22 12:10

unutbu