I have a data frame with 2 columns.
Column 1 ("random") holds random numbers; column 2 ("temp") is a placeholder column that I want to use to build column 3:
random temp
0.502423373 1
0.687594055 0
0.741883739 0
0.445364032 0
0.50626137 0.5
0.516364981 0
...
I want to fill column 3 ("state") so that it takes the last non-zero number (1 or 0.5 in this example) and keeps filling the following rows with that value until it hits a row with the next non-zero number. Then it repeats the process for the entire column:
random temp state
0.502423373 1 1
0.687594055 0 1
0.741883739 0 1
0.445364032 0 1
0.50626137 0.5 0.5
0.516364981 0 0.5
0.807804708 0 0.5
0.247948445 0 0.5
0.46573337 0 0.5
0.103705154 0 0.5
0.079625868 1 1
0.938928944 0 1
0.677713019 0 1
0.112231619 0 1
0.165907178 0 1
0.836195267 0 1
0.387712998 1 1
0.147737077 0 1
0.439281543 0.5 0.5
0.089013503 0 0.5
0.84174743 0 0.5
0.931738707 0 0.5
0.807955172 1 1
Thanks for any and all help.
Perhaps you can make use of na.locf from the "zoo" package after setting values of "0" to NA. Assuming your data.frame is called "mydf":
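mydf$state <- mydf$temp
mydf$state[mydf$state == 0] <- NA  # treat zeros as gaps to be filled
library(zoo)
# na.rm = FALSE keeps leading NAs, so the length still matches if "temp" starts with 0
mydf$state <- na.locf(mydf$state, na.rm = FALSE)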
# random temp state
# 1 0.5024234 1.0 1.0
# 2 0.6875941 0.0 1.0
# 3 0.7418837 0.0 1.0
# 4 0.4453640 0.0 1.0
# 5 0.5062614 0.5 0.5
# 6 0.5163650 0.0 0.5
If there were NA values in your original data.frame in the "temp" column, and you wanted to keep them as NA in the newly generated "state" column too, that's easy to take care of. Just add one more line to reintroduce the NA values:
mydf$state[is.na(mydf$temp)] <- NA
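To see the whole sequence on a small case, here is a minimal sketch. The sample vector is made up for illustration (it is not the asker's data) and deliberately starts with a 0 and contains one genuine NA. It assumes the zoo package is installed and passes na.rm = FALSE so that leading gaps are kept rather than dropped.

```r
library(zoo)

# Hypothetical sample data: "temp" starts with 0 and contains one real NA
mydf <- data.frame(temp = c(0, 1, 0, NA, 0, 0.5, 0))

state <- mydf$temp
state[which(state == 0)] <- NA          # zeros become gaps to be filled
state <- na.locf(state, na.rm = FALSE)  # carry the last value forward, keep leading NAs
state[is.na(mydf$temp)] <- NA           # put the original NAs back
mydf$state <- state

print(mydf)
```

The first row stays NA because there is no earlier non-zero value to carry forward, and the fourth row stays NA because it was NA in "temp".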
Inspired by the solution of @Ananda Mahto, this is an adaptation of the internal code of na.locf that works directly with 0s instead of NAs. Then you don't need the zoo package and you don't need to do the preprocessing of changing the values to NA. Benchmark tests show that this is about 10 times faster than the original version.
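locf.0 <- function(x) {
  L <- x != 0                           # TRUE where a new non-zero value starts
  idx <- c(0, which(L))[cumsum(L) + 1]  # index of the last non-zero value for each row
  return(x[idx])                        # note: rows before the first non-zero value are dropped
}
mydf$state <- locf.0(mydf$temp)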
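As a quick check, the sketch below applies the same function to the first ten "temp" values from the question, using base R only. The variant locf.0.keep is a hypothetical name, not part of the original answer: it swaps the placeholder index 0 for NA, so rows before the first non-zero value come back as NA instead of being dropped.

```r
# Same function as in the answer
locf.0 <- function(x) {
  L <- x != 0
  idx <- c(0, which(L))[cumsum(L) + 1]
  x[idx]
}

# First ten "temp" values from the question
temp <- c(1, 0, 0, 0, 0.5, 0, 0, 0, 0, 0)
state <- locf.0(temp)
print(state)  # 1 1 1 1 0.5 0.5 0.5 0.5 0.5 0.5

# Hypothetical variant: an NA placeholder keeps leading zeros as NA
locf.0.keep <- function(x) {
  L <- x != 0
  idx <- c(NA, which(L))[cumsum(L) + 1]
  x[idx]
}

leading <- c(0, 0, 1, 0)
print(length(locf.0(leading)))  # 2: the two leading zeros are dropped
print(locf.0.keep(leading))     # NA NA 1 1
```

If "temp" can start with 0, the variant is the safer choice for assigning back into the data frame, because its result always has the same length as the input.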