F_Date B_Date col is_B
01/09/2019 02/08/2019 2200 1
01/09/2019 03/08/2019 672 1
02/09/2019 03/08/2019 1828 1
01/09/2019 04/08/2019 503 0
02/09/2019 04/08/2019 829 1
03/09/2019 04/08/2019 1367 0
02/09/2019 05/08/2019 559 1
03/09/2019 05/08/2019 922 1
04/09/2019 05/08/2019 1519 0
01/09/2019 06/08/2019 376 1
I want to generate a column c_a such that the first entry for each F_Date starts at 25000, and each subsequent entry for that F_Date is the previous entry's c_a minus the previous col if the previous is_B was 1; otherwise the previous c_a is carried over unchanged. For example:
Expected Output :
F_Date B_Date col is_B c_a
01/09/2019 02/08/2019 2200 1 25000
01/09/2019 03/08/2019 672 1 25000 - 2200
02/09/2019 03/08/2019 1828 1 25000
01/09/2019 04/08/2019 503 0 25000 - 2200 - 672
02/09/2019 04/08/2019 829 1 25000 - 1828
03/09/2019 04/08/2019 1367 0 25000
02/09/2019 05/08/2019 559 1 25000 - 1828 - 829
03/09/2019 05/08/2019 922 1 25000 (since last value had is_B as 0)
04/09/2019 05/08/2019 1519 0 25000
01/09/2019 06/08/2019 376 1 25000 - 2200 - 672 (since last appearance had is_B as 0)
Can anyone identify a pandas way to achieve the same?
I think I have found a quite concise solution:
df['c_a'] = df.groupby('F_Date').apply(lambda grp:
    25000 - grp.col.where(grp.is_B.eq(1), 0).shift(fill_value=0).cumsum()
).reset_index(level=0, drop=True)
The result is:
F_Date B_Date col is_B c_a
0 01/09/2019 02/08/2019 2200 1 25000
1 01/09/2019 03/08/2019 672 1 22800
2 02/09/2019 03/08/2019 1828 1 25000
3 01/09/2019 04/08/2019 503 0 22128
4 02/09/2019 04/08/2019 829 1 23172
5 03/09/2019 04/08/2019 1367 0 25000
6 02/09/2019 05/08/2019 559 1 22343
7 03/09/2019 05/08/2019 922 1 25000
8 04/09/2019 05/08/2019 1519 0 25000
9 01/09/2019 06/08/2019 376 1 22128
The idea, with examples based on group F_Date == '01/09/2019':
grp.col.where(grp.is_B.eq(1), 0)
- the value to subtract from
the next row in group:
0 2200
1 672
3 0
9 376
.shift(fill_value=0)
- the value to subtract from the current
row in group:
0 0
1 2200
3 672
9 0
.cumsum()
- cumulated values to subtract:
0 0
1 2200
3 2872
9 2872
25000 - ...
- the target value:
0 25000
1 22800
3 22128
9 22128
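The four intermediate series above can be reproduced in isolation for that group (a minimal sketch; the index values 0, 1, 3, 9 are the original row positions of the '01/09/2019' rows):

```python
import pandas as pd

# Just the F_Date == '01/09/2019' rows of the example data
grp = pd.DataFrame({'col': [2200, 672, 503, 376],
                    'is_B': [1, 1, 0, 1]},
                   index=[0, 1, 3, 9])

step1 = grp.col.where(grp.is_B.eq(1), 0)  # value to subtract from the NEXT row
step2 = step1.shift(fill_value=0)         # value to subtract from the CURRENT row
step3 = step2.cumsum()                    # cumulated values to subtract
c_a = 25000 - step3                       # the target column

print(c_a.tolist())  # [25000, 22800, 22128, 22128]
```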
Nice pandas game :)
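The same idea can also be written with transform instead of apply, which keeps the original index and sidesteps the reset_index step (and the deprecation warnings that groupby-apply on grouping columns raises in recent pandas versions). Like the one-liner above, this assumes the rows within each F_Date group already appear in B_Date order:

```python
import pandas as pd

df = pd.DataFrame({
    'F_Date': ['01/09/2019', '01/09/2019', '02/09/2019', '01/09/2019', '02/09/2019',
               '03/09/2019', '02/09/2019', '03/09/2019', '04/09/2019', '01/09/2019'],
    'col':  [2200, 672, 1828, 503, 829, 1367, 559, 922, 1519, 376],
    'is_B': [1, 1, 1, 0, 1, 0, 1, 1, 0, 1],
})

# zero out 'col' where is_B == 0, then cumulate the *previous* rows per F_Date
masked = df['col'].where(df['is_B'].eq(1), 0)
df['c_a'] = 25000 - masked.groupby(df['F_Date']).transform(
    lambda s: s.shift(fill_value=0).cumsum()
)

print(df['c_a'].tolist())
```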
import pandas as pd
df = pd.DataFrame({'F_Date': [pd.to_datetime(_, format='%d/%m/%Y') for _ in
['01/09/2019', '01/09/2019', '02/09/2019', '01/09/2019', '02/09/2019',
'03/09/2019', '02/09/2019', '03/09/2019', '04/09/2019', '01/09/2019']],
'B_Date': [pd.to_datetime(_, format='%d/%m/%Y') for _ in
['02/08/2019', '03/08/2019', '03/08/2019', '04/08/2019', '04/08/2019',
'04/08/2019', '05/08/2019', '05/08/2019','05/08/2019', '06/08/2019']],
'col': [2200, 672, 1828, 503, 829, 1367, 559, 922, 1519, 376],
'is_B': [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
})
Let's go through it step by step:
# sort in the order that fits the semantics of your calculations
df.sort_values(['F_Date', 'B_Date'], inplace=True)
# initialize 'c_a' to 25000 if a new F_Date starts
df.loc[df['F_Date'].diff(1) != pd.Timedelta(0), 'c_a'] = 25000
# Step downwards from every 25000 and subtract the shifted 'col'
# if the shifted 'is_B' == 1; otherwise replicate the shifted 'c_a' to the next line
while pd.isna(df.c_a).any():
    df.c_a.where(
        pd.notna(df.c_a),                      # keep every non-NaN value, fill NaNs with ...
        df.c_a.shift(1).where(                 # ... the previous / shifted c_a ...
            df.is_B.shift(1) == 0,             # ... if the previous / shifted is_B == 0,
            df.c_a.shift(1) - df.col.shift(1)  # ... otherwise subtract the shifted 'col'
        ), inplace=True
    )
# restore original order
df.sort_index(inplace=True)
This is the result I get:
F_Date B_Date col is_B c_a
0 2019-09-01 2019-08-02 2200 1 25000.0
1 2019-09-01 2019-08-03 672 1 22800.0
2 2019-09-02 2019-08-03 1828 1 25000.0
3 2019-09-01 2019-08-04 503 0 22128.0
4 2019-09-02 2019-08-04 829 1 23172.0
5 2019-09-03 2019-08-04 1367 0 25000.0
6 2019-09-02 2019-08-05 559 1 22343.0
7 2019-09-03 2019-08-05 922 1 25000.0
8 2019-09-04 2019-08-05 1519 0 25000.0
9 2019-09-01 2019-08-06 376 1 22128.0
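For reference, the pieces above can be assembled into one self-contained script. The only change is using plain assignment with Series.where instead of the chained inplace=True call, which recent pandas versions warn about (and which Copy-on-Write may silently ignore):

```python
import pandas as pd

df = pd.DataFrame({
    'F_Date': pd.to_datetime(['01/09/2019', '01/09/2019', '02/09/2019', '01/09/2019',
                              '02/09/2019', '03/09/2019', '02/09/2019', '03/09/2019',
                              '04/09/2019', '01/09/2019'], format='%d/%m/%Y'),
    'B_Date': pd.to_datetime(['02/08/2019', '03/08/2019', '03/08/2019', '04/08/2019',
                              '04/08/2019', '04/08/2019', '05/08/2019', '05/08/2019',
                              '05/08/2019', '06/08/2019'], format='%d/%m/%Y'),
    'col':  [2200, 672, 1828, 503, 829, 1367, 559, 922, 1519, 376],
    'is_B': [1, 1, 1, 0, 1, 0, 1, 1, 0, 1],
})

# sort in the order that fits the semantics of the calculation
df.sort_values(['F_Date', 'B_Date'], inplace=True)

# initialize 'c_a' to 25000 wherever a new F_Date starts
df.loc[df['F_Date'].diff(1) != pd.Timedelta(0), 'c_a'] = 25000

# each pass fills the rows directly below an already-filled row,
# so the loop ends after at most (largest group size - 1) iterations
while df['c_a'].isna().any():
    df['c_a'] = df['c_a'].where(
        df['c_a'].notna(),                        # keep already-filled values
        df['c_a'].shift(1).where(
            df['is_B'].shift(1) == 0,             # previous is_B == 0: carry c_a over
            df['c_a'].shift(1) - df['col'].shift(1)  # otherwise subtract previous col
        ),
    )

# restore the original row order
df.sort_index(inplace=True)
print(df['c_a'].tolist())
```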