pandas dataframe column based on previous rows

Tags: python, pandas, dataframe, if-statement

I have the dataframe below:

         id  action   
         ================
         10   CREATED   
         10   111
         10   222
         10   333
         10   DONE      
         10   222
         10   UPDATED   
         777  CREATED    
         10   333
         10   DONE

I would like to create a new column "check" based on data in previous rows of the dataframe:

Find each cell in the action column equal to "DONE".
Search the previous rows, before that DONE, for the first CREATED or UPDATED with the same id. If it is CREATED, put C; if it is UPDATED, put U.

Output:

         id  action   check
         ================
         10   CREATED   
         10   111
         10   222
         10   333
         10   DONE      C
         10   222
         10   UPDATED   
         777  CREATED    
         10   333
         10   DONE      U

I tried to use multiple if conditions, but it did not work for me. Can you please help?

asked Jun 12 '20 16:06

johnt

1 Answer

Consider a more sophisticated sample dataframe for illustration:

# print(df)
id  action   
10   CREATED   
10   111
10   222
10   333
10   DONE      
10   222
10   UPDATED   
777  CREATED    
10   333
10   DONE
777  DONE
10   CREATED
10   DONE
11   UPDATED
11   DONE
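
To reproduce the sample, the dataframe can be built as sketched below. This assumes that action holds strings (so 111 is stored as '111') and that the index is the default 0 to 14 range; neither is stated in the question.

```python
import pandas as pd

# Assumption: action values are strings and the index is the default RangeIndex.
df = pd.DataFrame({
    'id': [10, 10, 10, 10, 10, 10, 10, 777, 10, 10, 777, 10, 10, 11, 11],
    'action': ['CREATED', '111', '222', '333', 'DONE', '222', 'UPDATED',
               'CREATED', '333', 'DONE', 'DONE', 'CREATED', 'DONE',
               'UPDATED', 'DONE'],
})
print(df)
```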

Use:

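# Within one segment, pick the last CREATED/UPDATED value: the running count
# of matches first reaches its maximum at the last match, so idxmax() finds it.
transformer = lambda s: s[(s.eq('CREATED') | s.eq('UPDATED')).cumsum().idxmax()]

# Within one id, start a new segment after every DONE row, so each segment
# ends at a DONE, then broadcast the transformer's pick over the segment.
grouper = (
    lambda g: g.groupby(
        g['action'].eq('DONE').cumsum().shift().fillna(0))['action']
    .transform(transformer)
)

# Apply per id, drop the id level added by apply, and keep the first letter.
df['check'] = df.groupby('id').apply(grouper).droplevel(0).str[0]
# Leave check empty on every row that is not DONE.
df.loc[df['action'].ne('DONE'), 'check'] = ''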

Explanation:

First we group the dataframe on id and apply the grouper function to each group. Within each group, we group again by the running count of DONE values in the action column (shifted by one row), which splits the group into consecutive segments, each ending at a DONE row. The transformer lambda then replaces every value in a segment with the last CREATED or UPDATED value in that segment, that is, the one closest before DONE. Finally, we keep only the first letter (C or U) and blank out check on every row where action is not DONE.
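
To make the segmentation concrete, here is a small sketch that computes the segment key and the per-segment pick for id 10 only. The names g, seg and picked are illustrative, the dataframe is rebuilt under the assumption that action holds strings, and .loc is used in place of plain indexing for clarity.

```python
import pandas as pd

# Assumption: action values are strings and the index is the default RangeIndex.
df = pd.DataFrame({
    'id': [10, 10, 10, 10, 10, 10, 10, 777, 10, 10, 777, 10, 10, 11, 11],
    'action': ['CREATED', '111', '222', '333', 'DONE', '222', 'UPDATED',
               'CREATED', '333', 'DONE', 'DONE', 'CREATED', 'DONE',
               'UPDATED', 'DONE'],
})

# Rows of a single id, as the outer groupby would hand them to grouper.
g = df[df['id'].eq(10)]

# Running count of DONE rows, shifted down one row so that each DONE
# still belongs to the segment it closes.
seg = g['action'].eq('DONE').cumsum().shift().fillna(0).astype(int)

# Last CREATED/UPDATED inside each segment, broadcast over the segment.
pick_last_marker = lambda s: s.loc[
    (s.eq('CREATED') | s.eq('UPDATED')).cumsum().idxmax()
]
picked = g['action'].groupby(seg).transform(pick_last_marker)

print(pd.DataFrame({'action': g['action'], 'segment': seg, 'picked': picked}))
```

For id 10 this gives three segments: rows 0 to 4, rows 5 to 9 (row 7 belongs to id 777), and rows 11 to 12.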

Result:

# print(df)
     id   action check
0    10  CREATED      
1    10      111      
2    10      222      
3    10      333      
4    10     DONE     C
5    10      222      
6    10  UPDATED      
7   777  CREATED      
8    10      333      
9    10     DONE     U
10  777     DONE     C
11   10  CREATED      
12   10     DONE     C
13   11  UPDATED      
14   11     DONE     U
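
As a cross-check, the same rule can be written as a plain Python loop. This is a different technique from the groupby approach above, not a restatement of it: it remembers the latest CREATED or UPDATED per id and forgets it once a DONE consumes it. The names last_marker and check_loop are illustrative, and leaving check empty when a DONE has no preceding CREATED or UPDATED is an assumption, since the question does not cover that case.

```python
import pandas as pd

# Assumption: action values are strings and the index is the default RangeIndex.
df = pd.DataFrame({
    'id': [10, 10, 10, 10, 10, 10, 10, 777, 10, 10, 777, 10, 10, 11, 11],
    'action': ['CREATED', '111', '222', '333', 'DONE', '222', 'UPDATED',
               'CREATED', '333', 'DONE', 'DONE', 'CREATED', 'DONE',
               'UPDATED', 'DONE'],
})

last_marker = {}  # id -> 'C' or 'U' seen since that id's previous DONE
check = []
for row_id, action in zip(df['id'], df['action']):
    if action in ('CREATED', 'UPDATED'):
        last_marker[row_id] = action[0]
        check.append('')
    elif action == 'DONE':
        # Assumption: empty string when no CREATED/UPDATED precedes this DONE.
        check.append(last_marker.pop(row_id, ''))
    else:
        check.append('')

df['check_loop'] = check
print(df)
```

On the sample data this yields the same check values as the result table above.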

answered Sep 30 '22 20:09

Shubham Sharma
