Replacing zeroes with NA for values preceding non-zero

Tags:

r

I'm new to R and have been struggling with the following for a while now so I was hoping someone would be able to help me out.

The sample data represents stock price returns (each row is a monthly period). The real data set is much bigger and is structured like the input below:

Input:

stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- cbind(stock1,stock2,stock3,stock4)

     stock1 stock2 stock3 stock4
[1,]   0.01   0.00   0.00   0.00
[2,]  -0.02   0.00   0.00  -0.02
[3,]   0.01   0.02   0.02   0.01
[4,]   0.05   0.04   0.00   0.00
[5,]   0.04  -0.03  -0.01   0.00
[6,]  -0.02   0.02   0.03  -0.02

Any zeroes that precede the first non-zero value for a given stock represent missing data rather than a return of zero for the period. I would like to set these values to NA, so the output I would like to achieve is the following:

Desired Output:

stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(NA, NA, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(NA, NA, 0.02, 0, -0.01, 0.03)
stock4 <- c(NA, -0.02, 0.01, 0, 0, -0.02)
df <- cbind(stock1,stock2,stock3,stock4)

     stock1 stock2 stock3 stock4
[1,]   0.01     NA     NA     NA
[2,]  -0.02     NA     NA  -0.02
[3,]   0.01   0.02   0.02   0.01
[4,]   0.05   0.04   0.00   0.00
[5,]   0.04  -0.03  -0.01   0.00
[6,]  -0.02   0.02   0.03  -0.02

I've tried a few things but they only seem to work for a single vector as opposed to a data set with multiple columns. I've tried using lapply to get around this but haven't had any luck so far. The closest I've gotten is shown below.

My single vector solution:

stock1[1:min(which(stock1!=0))-1] <- NA

My multiple vector solution which does not work:

lapply(df, function(x) x[1:min(which(x!=0))-1] <- NA)

Would greatly appreciate any guidance! Thanks!

asked Aug 14 '18 05:08

bubs7

3 Answers

There are three issues. First, writing:

df <- cbind(stock1,stock2,stock3,stock4)

doesn't create a data frame. It creates a matrix. This is an issue when you try to use lapply, which will operate over the columns of a data frame but over the elements of a matrix. Instead, you should write:

df <- data.frame(stock1,stock2,stock3,stock4)
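To see the difference concretely, here is a small sketch of what lapply iterates over in each case. The names m and d are only illustrative, and I use two of the columns to keep it short:

```r
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)

m <- cbind(stock1, stock2)       # matrix: 6 rows x 2 columns
d <- data.frame(stock1, stock2)  # data frame: a list of 2 columns

is.matrix(m)                     # TRUE
is.data.frame(d)                 # TRUE

# lapply visits every cell of the matrix, but every column of the data frame
n_matrix <- length(lapply(m, identity))  # 12, one result per element
n_frame  <- length(lapply(d, identity))  # 2, one result per column
```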

Second, the function you're using in lapply needs to return the modified vector. Otherwise, the return value will be something unexpected: the value of an assignment is its right-hand side, so the function returns a single NA for each column, and lapply returns a list of single NAs instead of the data you want.
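A minimal sketch of the return-value point, using two made-up helper functions (f_bad and f_good are not from the question):

```r
x <- c(0, 0, 0.02, 0.04)

# The body is only an assignment, so the function's value is the
# right-hand side (NA), not the modified vector.
f_bad <- function(x) x[1:2] <- NA

# Ending the body with x returns the modified vector.
f_good <- function(x) {
    x[1:2] <- NA
    x
}

res_bad  <- f_bad(x)   # NA
res_good <- f_good(x)  # NA NA 0.02 0.04
```

In both cases the x outside the function is left untouched, which is why the result of lapply has to be captured or assigned back.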

Third, you need to take care with 1:n when n can be zero (i.e., when the first stock quote is non-zero) because 1:0 gives the sequence c(1,0) instead of an empty sequence. (This is arguably one of R's stupidest features.)
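For illustration, this is roughly what goes wrong when n is zero and 1:n is used unguarded (x here is a made-up column whose first value is already non-zero):

```r
n <- 0
idx <- 1:n                  # c(1, 0): counts down instead of being empty

x <- c(0.01, -0.02, 0.01)   # no leading zeros, so nothing should change
x[idx] <- NA                # the 0 is ignored, but index 1 is still overwritten
x                           # NA -0.02 0.01
```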

Therefore, the following will give you what you want:

stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4)

as.data.frame(lapply(df, function(x) {
    n <- min(which(x != 0)) - 1
    if (n > 0)
        x[1:n] <- NA
    x
}))

The output is as expected:

  stock1 stock2 stock3 stock4
1   0.01     NA     NA     NA
2  -0.02     NA     NA  -0.02
3   0.01   0.02   0.02   0.01
4   0.05   0.04   0.00   0.00
5   0.04  -0.03  -0.01   0.00
6  -0.02   0.02   0.03  -0.02

Update: As @Daniel_Fischer notes, there's a clever trick to avoid the 1:0 problem. You can instead write:

as.data.frame(lapply(df, function(x) {
    n <- min(which(x != 0)) - 1
    x[0:n] <- NA    # use 0:n instead of 1:n
    x
}))

This takes advantage of the fact that R ignores zeros in this type of indexing operation, so:

x[0:0] <- NA    # same as x[0] <- NA and does nothing
x[0:1] <- NA    # same as x[1] <- NA
x[0:2] <- NA    # same as x[1:2] <- NA, etc.
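Another way to sidestep the 1:0 problem is base R's seq_len(n), which is empty when n is 0. The sketch below is only one possible variant: the helper name leading_zeros_to_na is made up, and treating a column with no non-zero value as entirely missing is my assumption, not something the question specifies. (With the versions above, such a column makes min(which(x != 0)) return Inf with a warning.)

```r
# Hypothetical helper, not from the answers above
leading_zeros_to_na <- function(x) {
    first <- which(x != 0)[1]          # NA if the column has no non-zero value
    # Assumption: a column that is all zeros is treated as entirely missing
    n <- if (is.na(first)) length(x) else first - 1
    x[seq_len(n)] <- NA                # seq_len(0) is empty, so no guard is needed
    x
}

df <- data.frame(
    stock1 = c(0.01, -0.02, 0.01),
    stock2 = c(0, 0, 0.02),
    stock3 = c(0, 0, 0)                # made-up all-zero column
)
res <- as.data.frame(lapply(df, leading_zeros_to_na))
```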

answered Nov 14 '22 23:11

K. A. Buhr

This might not be the most elegant way, but I think it works:

changeValues <- function(x){
   place <- min(which(diff(c(0,cumsum(x==0)))==0))-1;
   x[0:place] <- NA
   x
}

apply(df,2,changeValues)

EDIT: A brief explanation of the function: first I build a running count of the zeros in the column (cumsum(x==0)), which increases by one at every zero. With diff I then find the positions where this count does not increase, i.e. the non-zero entries. Taking the minimum of those gives the first non-zero position, so only the leading zeros before it are replaced and zeros further down the column are left unchanged.
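Stepping through the expression for stock3 from the question may make this easier to follow (the intermediate variable names are only for illustration):

```r
x <- c(0, 0, 0.02, 0, -0.01, 0.03)        # stock3 from the question

zero_count <- cumsum(x == 0)              # 1 2 2 3 3 3
steps <- diff(c(0, zero_count))           # 1 1 0 1 0 0: 1 at zeros, 0 at non-zeros
first_nonzero <- min(which(steps == 0))   # 3
place <- first_nonzero - 1                # 2 leading zeros to replace
```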

answered Nov 14 '22 21:11

Daniel Fischer

stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4) #the following function only works if df is actually a data.frame

df[] <- lapply(df, function(x) {ifelse(cumsum(x) == 0 & x == 0, NA, x)})

df

  stock1 stock2 stock3 stock4
1   0.01     NA     NA     NA
2  -0.02     NA     NA  -0.02
3   0.01   0.02   0.02   0.01
4   0.05   0.04   0.00   0.00
5   0.04  -0.03  -0.01   0.00
6  -0.02   0.02   0.03  -0.02

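Some explanation: for each cell, check whether both the cumulative sum of the column up to that row and the cell itself are equal to 0. If so, return NA, otherwise the original value. The brackets after df make sure the result of lapply is assigned back into df as a data frame. Note that this relies on the cumulative sum staying non-zero after the first non-zero return; if earlier returns cancel exactly (e.g. 0.01 followed by -0.01), a later genuine zero would also be turned into NA.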

Also, if you don't really need df to be a dataframe, this works as well:

df <- cbind(stock1,stock2,stock3,stock4)
apply(df, 2, function(x) {ifelse(cumsum(x) == 0 & x == 0, NA, x)})
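One caveat with testing cumsum(x) == 0: if earlier returns happen to cancel exactly, a later genuine zero is also flagged. A possible variant, sketched here on a made-up column, counts the non-zero values seen so far instead of summing them:

```r
x <- c(0, 0.01, -0.01, 0, 0.02)   # made-up column: returns cancel out by row 3

by_sum   <- ifelse(cumsum(x) == 0 & x == 0, NA, x)   # approach from this answer
by_count <- ifelse(cumsum(x != 0) == 0, NA, x)       # count non-zero values instead

by_sum     # NA 0.01 -0.01   NA 0.02  (row 4 is set to NA as well)
by_count   # NA 0.01 -0.01 0.00 0.02
```

On the sample data in the question both versions give the same result, since no column's returns sum back to exactly zero before a later zero.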

answered Nov 14 '22 21:11

Lennyy
