Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating sub data frame based on a value in an column

I have following data frame in pandas. Now I want to generate sub data frame if I see a value in Activity column. So for example, I want to have data frame with all the data with Name A IF Activity column as value 3 or 5.

Name    Date           Activity
A       01-02-2015       1
A       01-03-2015       2
A       01-04-2015       3
A       01-04-2015       1
B       01-02-2015       1
B       01-02-2015       2
B       01-03-2015       1
B       01-04-2015       5
C       01-31-2015       1
C       01-31-2015       2
C       01-31-2015       2

So for the above data, I want to get df_A as

Name    Date           Activity
A       01-02-2015       1
A       01-03-2015       2
A       01-04-2015       3
A       01-04-2015       1

df_B as

B       01-02-2015       1
B       01-02-2015       2
B       01-03-2015       1
B       01-04-2015       5

Since Name C does not have 3 or 5 in the column Activity, I do not want to get this data frame.

Also, the names in the data frame can vary with each input file.

Once I have these data frame separated, I want to plot a time series.

like image 499
1974sb Avatar asked May 24 '26 08:05

1974sb


1 Answers

You can groupby dataframe by column Name, apply custom function f and then select dataframes df_A and df_B:

print df
   Name       Date  Activity
0     A 2015-01-02         1
1     A 2015-01-03         2
2     A 2015-01-04         3
3     A 2015-01-04         1
4     B 2015-01-02         1
5     B 2015-01-02         2
6     B 2015-01-03         1
7     B 2015-01-04         5
8     C 2015-01-31         1
9     C 2015-01-31         2
10    C 2015-01-31         2


def f(df):
    if ((df['Activity'] == 3) | (df['Activity'] == 5)).any():
        return df

g = df.groupby('Name').apply(f).reset_index(drop=True)
df_A = g.loc[g.Name == 'A']
print df_A
  Name       Date  Activity
0    A 2015-01-02         1
1    A 2015-01-03         2
2    A 2015-01-04         3
3    A 2015-01-04         1

df_B = g.loc[g.Name == 'B']
print df_B
  Name       Date  Activity
4    B 2015-01-02         1
5    B 2015-01-02         2
6    B 2015-01-03         1
7    B 2015-01-04         5

df_A.plot()
df_B.plot()

In the end you can use plot - more info

EDIT:

If you want create dataframes dynamically, use can find all unique values of column Name by drop_duplicates:

for name in g.Name.drop_duplicates():
    print g.loc[g.Name == name]

      Name       Date  Activity
0    A 2015-01-02         1
1    A 2015-01-03         2
2    A 2015-01-04         3
3    A 2015-01-04         1
  Name       Date  Activity
4    B 2015-01-02         1
5    B 2015-01-02         2
6    B 2015-01-03         1
7    B 2015-01-04         5
like image 71
jezrael Avatar answered May 26 '26 21:05

jezrael



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!