 

Fill NaN values by adding x to the previous row in pandas

I have a data frame with a column named SAM containing the following data:

SAM
3
5
9
NaN
NaN
24
40
NaN
57

Now I want to insert 12, 15 and 43, respectively, into the NaN positions (because 9+3=12, 12+3=15, and 40+3=43). In other words, fill each NaN row by adding 3 to the previous row's value. The previous row may itself have been NaN, in which case its filled value is used.
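The rule has a closed form, which is what makes a vectorized solution possible. Writing x_i for the value in row i and k(i) for the position of the last non-missing row at or before i (notation introduced here for illustration, assuming a default 0-based row numbering):

```latex
\hat{x}_i = x_{k(i)} + 3\,\bigl(i - k(i)\bigr)
```

For a non-missing row, k(i) = i and the value is unchanged; for row 4, k(4) = 2 and the result is 9 + 3·2 = 15.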

I know this can be done by iterating with a for loop, but can it be done in a vectorized manner? For example, with some modified version of ffill in pandas.fillna() (plain forward filling plus 3 would work here if there were no consecutive NaNs).
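To make the problem concrete, here is a minimal sketch (assuming the data above is in `df['SAM']`) that contrasts plain `ffill() + 3`, which goes wrong on the second NaN of a run, with the for-loop approach mentioned in the question. `fill_by_increment` and its `step` parameter are hypothetical names used only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'SAM': [3, 5, 9, np.nan, np.nan, 24, 40, np.nan, 57]})

# Naive attempt: every NaN gets (last valid value + 3), so consecutive NaNs
# all receive the same number instead of increasing by 3 each time.
naive = df['SAM'].fillna(df['SAM'].ffill() + 3)
print(naive.tolist())
# [3.0, 5.0, 9.0, 12.0, 12.0, 24.0, 40.0, 43.0, 57.0]

def fill_by_increment(s: pd.Series, step: float = 3) -> pd.Series:
    """Hypothetical loop-based helper: each NaN becomes the previous
    (already filled) value plus `step`. A NaN in the first row stays NaN."""
    values = s.to_numpy(dtype=float, copy=True)
    for i in range(1, len(values)):
        if np.isnan(values[i]):
            values[i] = values[i - 1] + step
    return pd.Series(values, index=s.index, name=s.name)

print(fill_by_increment(df['SAM']).tolist())
# [3.0, 5.0, 9.0, 12.0, 15.0, 24.0, 40.0, 43.0, 57.0]
```

The loop is easy to read but runs in Python for every row, which is what a vectorized approach avoids.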

asked Dec 14 '16 at 15:12 by Vijay P R




1 Answer

You can try this vectorized approach:

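import numpy as np
import pandas as pd

df = pd.DataFrame({'SAM': [3, 5, 9, np.nan, np.nan, 24, 40, np.nan, 57]})

nul = df['SAM'].isnull()  # True where SAM is missing
# (nul.diff() == 1).cumsum() gives a new group id each time nul switches
# between False and True, so every run of consecutive NaNs is its own group;
# cumsum within each group numbers the NaNs 1, 2, 3, ... (non-NaN rows get 0)
nul.groupby((nul.diff() == 1).cumsum()).cumsum()*3 + df['SAM'].ffill()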

#0     3.0
#1     5.0
#2     9.0
#3    12.0
#4    15.0
#5    24.0
#6    40.0
#7    43.0
#8    57.0
#Name: SAM, dtype: float64

  1. Split the missing values into chunks of consecutive NaNs and add 3, 6, 9, etc. to the missing positions, according to each NaN's position within its chunk;
  2. Add the forward-filled values from the SAM column to the result.
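The two steps above can be sketched as a reusable function with a configurable increment. This is a minimal sketch rather than the answer's exact code: `fill_with_step` and `step` are hypothetical names, and it swaps `diff() == 1` for the common `ne(shift()).cumsum()` idiom, which starts a new run id whenever the missing/not-missing flag changes:

```python
import numpy as np
import pandas as pd

def fill_with_step(s: pd.Series, step: float = 3) -> pd.Series:
    """Hypothetical helper: vectorized 'previous value + step' fill for NaN runs."""
    nul = s.isna()
    # New run id every time nul flips between True and False
    run_id = nul.ne(nul.shift()).cumsum()
    # Position of each NaN within its run: 1, 2, 3, ... (0 for non-NaN rows)
    offset = nul.groupby(run_id).cumsum()
    # Last valid value plus step times the position inside the NaN run
    return s.ffill() + offset * step

df = pd.DataFrame({'SAM': [3, 5, 9, np.nan, np.nan, 24, 40, np.nan, 57]})
print(fill_with_step(df['SAM']).tolist())
# [3.0, 5.0, 9.0, 12.0, 15.0, 24.0, 40.0, 43.0, 57.0]
```

A NaN in the first row has nothing to forward-fill from, so it stays NaN.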
answered Sep 30 '22 at 00:09 by Psidom