How to populate rows of a pandas DataFrame column with the previous row's value based on multiple conditions?

Tags: python, pandas, dataframe

Disclaimer: This might be a duplicate, but I cannot find the exact solution. Please feel free to mark this question as a duplicate and link to the original question in the comments.

I am still learning pandas DataFrame operations, and this possibly has a very simple solution that I have not been able to figure out.

I have a pandas DataFrame with a single column. I want to change the value of each row to the value of the previous row if certain conditions are satisfied. I have written a loop to implement this, but I was hoping for a more efficient solution.

Creation of initial data:

import numpy as np
import pandas as pd

data = np.random.randint(5,30,size=20)
df = pd.DataFrame(data, columns=['random_numbers'])

print(df)

    random_numbers
0                6
1               24
2               29
3               18
4               22
5               17
6               12
7                7
8                6
9               27
10              29
11              13
12              23
13               6
14              25
15              24
16              16
17              15
18              25
19              19

Now let's assume the two conditions are 1) the value is less than 10 and 2) the value is more than 20. If either condition holds, set the row's value to the previous row's value. This has been implemented as a loop as follows:

for index, row in df.iterrows():
    # Row 0 has no previous row, so skip it
    if index == 0:
        continue
    # Either condition copies the value from the previous row
    if row.random_numbers < 10 or row.random_numbers > 20:
        df.loc[index, 'random_numbers'] = df.loc[index - 1, 'random_numbers']

    random_numbers
0                6
1                6
2                6
3               18
4               18
5               17
6               12
7               12
8               12
9               12
10              12
11              13
12              13
13              13
14              13
15              13
16              16
17              15
18              15
19              19

Please suggest a more efficient way to implement this logic, as I am working with a large number of rows.

asked May 07 '21 04:05

lonstud

2 Answers

You can replace the values less than 10 and the values more than 20 with NaN, then use pandas.DataFrame.ffill() to fill each NaN with the value from the previous row.

mask = (df['random_numbers'] < 10) | (df['random_numbers'] > 20)

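# Keep row 0 unchanged, since your loop skips it with `if index == 0:`
mask[df.index[0]] = False

# Cast to float first so the column can hold NaN without an implicit dtype change
df['random_numbers'] = df['random_numbers'].astype(float)
df.loc[mask, 'random_numbers'] = np.nan

# Assign the result back instead of calling ffill(inplace=True) on the selected column
df['random_numbers'] = df['random_numbers'].ffill()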

# Original

    random_numbers
0                7
1               28
2                8
3               14
4               12
5               20
6               21
7               11
8               16
9               27
10              19
11              23
12              18
13               5
14               6
15              11
16               6
17               8
18              17
19               8

# After replacement

    random_numbers
0              7.0
1              7.0
2              7.0
3             14.0
4             12.0
5             20.0
6             20.0
7             11.0
8             16.0
9             16.0
10            19.0
11            19.0
12            18.0
13            18.0
14            18.0
15            11.0
16            11.0
17            11.0
18            17.0
19            17.0
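
As a quick sanity check, the sketch below applies the same mask-and-forward-fill idea to the sample data from the question and compares the result with the loop output posted there. The names `values`, `expected` and `result` are only illustrative, and the final `astype(int)` is an optional extra step (not part of the steps above) to get integers back.

```python
import pandas as pd

# Sample data copied from the question
values = [6, 24, 29, 18, 22, 17, 12, 7, 6, 27,
          29, 13, 23, 6, 25, 24, 16, 15, 25, 19]
df = pd.DataFrame({'random_numbers': values})

# Loop output posted in the question
expected = [6, 6, 6, 18, 18, 17, 12, 12, 12, 12,
            12, 13, 13, 13, 13, 13, 16, 15, 15, 19]

s = df['random_numbers']
mask = (s < 10) | (s > 20)
mask.iloc[0] = False  # row 0 has no previous row, so leave it alone

# where() keeps values where ~mask is True and puts NaN elsewhere;
# ffill() then copies the last kept value forward
result = s.where(~mask).ffill().astype(int)

print(result.tolist() == expected)
```

Because consecutive masked rows all receive the last kept value, the forward fill reproduces the cascading behaviour of the loop on this data.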

answered Nov 02 '22 19:11

Ynjxsjmh

A shorter alternative is to use .mask() together with .ffill(), slicing with [1:] so that row 0 is excluded from masking:

s = df['random_numbers']
cond = (s < 10) | (s > 20)

# Mask only rows 1 onward, then put row 0 back in front
masked = pd.concat([s.iloc[:1], s.iloc[1:].mask(cond.iloc[1:])])

df['random_numbers'] = masked.ffill().astype(s.dtype)

.mask() tests the condition and replaces the value with NaN where it is true (NaN is the default when the other= parameter is not supplied). Rows that do not meet the condition keep their original values.

Masking introduces NaN, which would otherwise turn the column into floats. The .astype(s.dtype) call after .ffill() restores the original integer type. This is safe here because row 0 is never masked, so no NaN remains after the fill.

The [1:] slice on the masking line ensures the value in row 0 is left untouched.
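
Another way to keep integers through the masking step, assuming pandas' nullable `Int64` extension dtype is acceptable for your data, is to convert the column before masking so that missing values are stored as `<NA>` rather than float NaN. This is only a sketch on a shortened copy of the sample data; the names `raw`, `cond` and `filled` are illustrative.

```python
import pandas as pd

# First eight values of the question's sample data
df = pd.DataFrame({'random_numbers': [6, 24, 29, 18, 22, 17, 12, 7]})

raw = df['random_numbers']
cond = (raw < 10) | (raw > 20)
cond.iloc[0] = False  # keep row 0 untouched

# Nullable integer dtype: masked entries become <NA>, not float NaN,
# so the column stays integer-typed throughout
filled = raw.astype('Int64').mask(cond).ffill()

print(filled.dtype)
print(filled.tolist())
```

If downstream code expects a plain NumPy integer column, `filled.astype('int64')` should convert it back once no missing values remain.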

# Original data:  (reusing your sample data)

    random_numbers
0                6
1               24
2               29
3               18
4               22
5               17
6               12
7                7
8                6
9               27
10              29
11              13
12              23
13               6
14              25
15              24
16              16
17              15
18              25
19              19

# After transformation:

    random_numbers
0                6
1                6
2                6
3               18
4               18
5               17
6               12
7               12
8               12
9               12
10              12
11              13
12              13
13              13
14              13
15              13
16              16
17              15
18              15
19              19
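
If more than two conditions are needed, one possible generalisation is to build each condition as a boolean mask and OR them together before masking. The helper below, `replace_with_previous`, is a hypothetical name introduced only for this sketch (it is not a pandas function), and it assumes a non-empty Series and a non-empty list of conditions.

```python
from functools import reduce
import operator

import pandas as pd


def replace_with_previous(series, conditions):
    """Replace values matching any condition with the last kept value.

    `conditions` is a non-empty list of functions, each mapping a Series
    to a boolean Series. Hypothetical helper for illustration only.
    """
    # OR all the boolean masks together
    combined = reduce(operator.or_, (cond(series) for cond in conditions))
    combined.iloc[0] = False  # row 0 has no previous row
    # Row 0 is never masked, so ffill leaves no NaN and the cast is safe
    return series.mask(combined).ffill().astype(series.dtype)


s = pd.Series([6, 24, 29, 18, 22, 17, 12, 7])
out = replace_with_previous(s, [lambda x: x < 10, lambda x: x > 20])
print(out.tolist())
```

Adding a third rule is then a matter of appending another function to the list rather than extending a long boolean expression.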

answered Nov 02 '22 21:11

SeaBean
