Filling data frame with previous row value

I have a data frame with two columns.

The first column holds random numbers, and the second is a placeholder column that determines what I want a third column to look like:

  random    temp
0.502423373 1
0.687594055 0
0.741883739 0
0.445364032 0
0.50626137  0.5
0.516364981 0
...

I want to fill a third column so that it takes the last non-zero number (1 or 0.5 in this example) and keeps filling the following rows with that value until it hits a row with a different non-zero number. It then repeats the process for the entire column.

random     temp state
0.502423373 1   1
0.687594055 0   1
0.741883739 0   1
0.445364032 0   1
0.50626137  0.5 0.5
0.516364981 0   0.5
0.807804708 0   0.5
0.247948445 0   0.5
0.46573337  0   0.5
0.103705154 0   0.5
0.079625868 1   1
0.938928944 0   1
0.677713019 0   1
0.112231619 0   1
0.165907178 0   1
0.836195267 0   1
0.387712998 1   1
0.147737077 0   1
0.439281543 0.5 0.5
0.089013503 0   0.5
0.84174743  0   0.5
0.931738707 0   0.5
0.807955172 1   1

Thanks for any and all help.

user2813055 asked Dec 06 '13 04:12



2 Answers

Perhaps you can make use of na.locf from the "zoo" package after setting values of "0" to NA. Assuming your data.frame is called "mydf":

mydf$state <- mydf$temp
mydf$state[mydf$state == 0] <- NA

library(zoo)
mydf$state <- na.locf(mydf$state)
#      random temp state
# 1 0.5024234  1.0   1.0
# 2 0.6875941  0.0   1.0
# 3 0.7418837  0.0   1.0
# 4 0.4453640  0.0   1.0
# 5 0.5062614  0.5   0.5
# 6 0.5163650  0.0   0.5

If there were NA values in your original data.frame in the "temp" column, and you wanted to keep them as NA in the newly generated "state" column too, that's easy to take care of. Just add one more line to reintroduce the NA values:

mydf$state[is.na(mydf$temp)] <- NA
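
One detail worth checking in the steps above: as far as I know, na.locf drops leading NAs by default (na.rm = TRUE), so a "temp" column that starts with 0 would give a result shorter than the data frame. A minimal sketch, assuming a made-up toy vector (not the poster's data) and guarding on whether zoo is installed:

```r
# Toy vector (not from the original post): a leading zero and one genuine NA.
temp <- c(0, 1, 0, NA, 0, 0.5, 0)

state <- temp
state[which(state == 0)] <- NA   # zeros become NA; which() skips the existing NA

if (requireNamespace("zoo", quietly = TRUE)) {
  # na.rm = FALSE keeps leading NAs, so the output has the same length as the input
  state <- zoo::na.locf(state, na.rm = FALSE)
  # Restore the NAs that were present in the original column
  state[is.na(temp)] <- NA
  print(state)
}
```

The leading zero stays NA here because there is no earlier value to carry forward; whether that or 0 is the right result depends on your data.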

A5C1D2H2I1M1N2O1R2T1 answered Oct 20 '22 05:10


Inspired by the solution of @Ananda Mahto, this is an adaptation of the internal code of na.locf that works directly with 0s instead of NAs. You then don't need the zoo package, and you don't need the preprocessing step of changing the values to NA. Benchmark tests show that this is about 10 times faster than the original version.

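# Carry the last non-zero value forward over runs of zeros.
# Assumes x[1] is non-zero and x contains no NAs.
locf.0 <- function(x) {
  L <- x != 0                            # TRUE where a new value starts
  idx <- c(0, which(L))[cumsum(L) + 1]   # position of the latest non-zero value
  x[idx]
}
mydf$state <- locf.0(mydf$temp)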
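
To see why the indexing trick works, it may help to step through it on a short vector. The sketch below uses a made-up vector, and `locf0_keep_leading` is a hypothetical variant (not part of the answer above) for the case where the column starts with zeros. In that case the original code produces index 0, which R silently drops, so the result comes out shorter than the input.

```r
# Small made-up vector in the same shape as the "temp" column.
x <- c(1, 0, 0, 0.5, 0, 1, 0)

L      <- x != 0                   # TRUE FALSE FALSE TRUE FALSE TRUE FALSE
starts <- which(L)                 # 1 4 6: positions of the non-zero values
group  <- cumsum(L)                # 1 1 1 2 2 3 3: which run each row belongs to
idx    <- c(0, starts)[group + 1]  # 1 1 1 4 4 6 6: source position for each row
filled <- x[idx]                   # 1 1 1 0.5 0.5 1 1

# Hypothetical variant: leading zeros are kept as 0 instead of being dropped.
locf0_keep_leading <- function(x) {
  L <- x != 0
  # NA marks rows that come before the first non-zero value
  idx <- c(NA_integer_, which(L))[cumsum(L) + 1]
  out <- x[idx]
  out[is.na(idx)] <- 0
  out
}

locf0_keep_leading(c(0, 0, 1, 0, 0.5, 0))  # 0 0 1 1 0.5 0.5
```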

shadow answered Oct 20 '22 04:10