Aggregation on Pandas data frame for selected rows

I have a pandas data frame, sorted by time, like this:

from datetime import datetime
import pandas as pd

df = pd.DataFrame({
    'ActivityDateTime': [
        datetime(2016,5,13,6,14), datetime(2016,5,13,6,16), datetime(2016,5,13,6,20),
        datetime(2016,5,13,6,27), datetime(2016,5,13,6,31), datetime(2016,5,13,6,32),
        datetime(2016,5,13,17,34), datetime(2016,5,13,17,36), datetime(2016,5,13,17,38),
        datetime(2016,5,13,17,45), datetime(2016,5,13,17,47),
        datetime(2016,5,16,13,3), datetime(2016,5,16,13,6), datetime(2016,5,16,13,10),
        datetime(2016,5,16,13,14), datetime(2016,5,16,13,16)],
    'Value1': [0.0,2.0,3.0,4.0,0.0,0.0,0.0,7.0,8.0,4.0,0.0,0.0,3.0,9.0,1.0,0.0],
    'Value2': [0.0,2.0,3.0,4.0,0.0,0.0,0.0,7.0,8.0,4.0,0.0,0.0,3.0,9.0,1.0,0.0]
})

Which turns out like this:

       ActivityDateTime  Value1  Value2
0   2016-05-13 06:14:00     0.0     0.0
1   2016-05-13 06:16:00     2.0     2.0
2   2016-05-13 06:20:00     3.0     3.0
3   2016-05-13 06:27:00     4.0     4.0
4   2016-05-13 06:31:00     0.0     0.0
5   2016-05-13 06:32:00     0.0     0.0
6   2016-05-13 17:34:00     0.0     0.0
7   2016-05-13 17:36:00     7.0     7.0
8   2016-05-13 17:38:00     8.0     8.0
9   2016-05-13 17:45:00     4.0     4.0
10  2016-05-13 17:47:00     0.0     0.0
11  2016-05-16 13:03:00     0.0     0.0
12  2016-05-16 13:06:00     3.0     3.0
13  2016-05-16 13:10:00     9.0     9.0
14  2016-05-16 13:14:00     1.0     1.0
15  2016-05-16 13:16:00     0.0     0.0

I'd like to aggregate the data (by averaging) without a for loop. However, the grouping I need is not straightforward. Looking at Value1, I want each run of consecutive non-zero values to form one group. For example, indices 1,2,3 would be one group, indices 7,8,9 another, and indices 12,13,14 a third. The rows where Value1 == 0 should be excluded; the zeros only act as separators between groups. Eventually I'd like to get something like this:

         Activity_end       Activity_start  Value1  Value2  num_observations
0 2016-05-13 06:27:00  2016-05-13 06:16:00    3.00    3.00                 3
1 2016-05-13 17:45:00  2016-05-13 17:36:00    6.33    6.33                 3
2 2016-05-16 13:14:00  2016-05-16 13:06:00    4.33    4.33                 3

Currently, I am thinking that I should somehow assign the numbers 1, 2 and 3 to a new column and then aggregate based on that column. I am not sure how to create that column without a for loop, though. Please note that Value1 and Value2 are not necessarily the same.


asked May 13 '16 23:05

ahoosh

1 Answer

One way of doing it involves creating some temporary columns:

# First create a new column that is True whenever the value changes from
# zero to non-zero (which happens at the start of each group).
df['nonzero'] = (df['Value1'] > 0) & (df['Value1'].shift(1) == 0)
# Take a cumulative sum. This means each group will have its own number.
df['group'] = df['nonzero'].cumsum()
# Drop the zero rows, then group by the group column.
gb = df[df['Value1'] > 0].groupby('group')
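
To see why the cumulative sum produces one label per run, here is a minimal sketch on a short made-up series (the names `values`, `starts` and `labels` are illustrative only and do not come from the question):

```python
import pandas as pd

# Made-up example data: two runs of non-zero values separated by zeros
values = pd.Series([0.0, 2.0, 3.0, 0.0, 0.0, 7.0, 8.0, 0.0])

# True where a non-zero run begins: current value non-zero, previous value zero
starts = (values > 0) & (values.shift(1) == 0)

# Running count of run starts; every row up to the next start shares one label
labels = starts.cumsum()

print(pd.DataFrame({"value": values, "start": starts, "label": labels}))

# Labels of the non-zero rows only (the zero rows are filtered out later anyway)
run_labels = labels[values > 0].tolist()
```

The zero rows that follow a run inherit that run's label, which is why they have to be filtered out before grouping. Because `shift(1)` yields NaN for the first row, a series that begins with a non-zero value gets label 0 for its first run, which is still a distinct label.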

You can then aggregate each group using the groupby aggregate functions: http://pandas.pydata.org/pandas-docs/stable/groupby.html

For the specific output you want, have a look at this answer too: Python Pandas: Multiple aggregations of the same column

df2 = gb.agg({
    'ActivityDateTime': ['first', 'last'],
    'Value1': 'mean',
    'Value2': 'mean'})
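
The dictionary form above returns two-level column names and does not include the observation count. As one possible way to get the flat layout asked for in the question, the sketch below uses named aggregation, which assumes pandas 0.25 or later. It groups by a separate label series instead of adding temporary columns, and the names `nonzero`, `group` and `result` are illustrative only.

```python
import pandas as pd

# Same data as in the question, written as timestamp strings for brevity
df = pd.DataFrame({
    "ActivityDateTime": pd.to_datetime([
        "2016-05-13 06:14", "2016-05-13 06:16", "2016-05-13 06:20",
        "2016-05-13 06:27", "2016-05-13 06:31", "2016-05-13 06:32",
        "2016-05-13 17:34", "2016-05-13 17:36", "2016-05-13 17:38",
        "2016-05-13 17:45", "2016-05-13 17:47",
        "2016-05-16 13:03", "2016-05-16 13:06", "2016-05-16 13:10",
        "2016-05-16 13:14", "2016-05-16 13:16",
    ]),
    "Value1": [0.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 7.0,
               8.0, 4.0, 0.0, 0.0, 3.0, 9.0, 1.0, 0.0],
    "Value2": [0.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 7.0,
               8.0, 4.0, 0.0, 0.0, 3.0, 9.0, 1.0, 0.0],
})

# Rows that belong to some group
nonzero = df["Value1"] > 0

# One label per run; fill_value=0 treats the row before the first as zero
starts = nonzero & (df["Value1"].shift(1, fill_value=0) == 0)
group = starts.cumsum().rename("group")

# Keep only the non-zero rows, then aggregate each run into one output row
result = (
    df[nonzero]
    .groupby(group[nonzero])
    .agg(
        Activity_start=("ActivityDateTime", "first"),
        Activity_end=("ActivityDateTime", "last"),
        Value1=("Value1", "mean"),
        Value2=("Value2", "mean"),
        num_observations=("Value1", "count"),
    )
    .reset_index(drop=True)
)

print(result.round(2))
```

Like the original answer, this treats only positive values as non-zero. If negative values can occur, comparing with `!= 0` instead of `> 0` may be closer to the intent.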


answered Oct 06 '22 00:10

Jezzamon
