How to change NA into 0 based on other variable / how many times it was recorded

Tags:

r

I am still new to R and need help. I want to change the NA values in variables x1, x2, x3 to 0 based on the value of count. The variables x1, x2, x3 stand for the visits to the site (replications), and the value in each x variable is the number of species found. However, not all sites were visited 3 times; the variable count tells us how many times the site was actually visited. I want to distinguish a true NA (the site was not visited) from a real 0 (the site was visited but no species were found): an NA should become 0 if the visit actually happened and stay NA if it did not. For example, in the dummy data the 'zhask' site was visited 2 times, so the NA in x1 of zhask needs to be replaced with 0.

This is the dummy data:

     site x1 x2 x3 count
1    miya  1  2  1     3
2   zhask NA  1 NA     2
3 balmond  3 NA  2     3
4   layla NA  1 NA     2
5  angela NA  3 NA     2

So the table needs to be changed into:

     site x1 x2 x3 count
1    miya  1  2  1     3
2   zhask  0  1 NA     2
3 balmond  3  0  2     3
4   layla  0  1 NA     2
5  angela  0  3 NA     2

I've tried many things, including writing my own loop, but it is not working:

for(i in 1:nrow(df))
  {
  if( is.na(df$x1[i]) && (i < df$count[i]))
  {df$x1[i]=0} 
  else 
  {df$x1[i]=df$x1[i]}
}

This is the script for the dummy data frame:

x1= c(1,NA,3, NA, NA)
x2= c(2,1, NA, 1, 3)
x3 = c(1, NA, 2, NA, NA)
count=c(3,2,3,2,2)
site=c("miya", "zhask", "balmond", "layla", "angela")
df=data.frame(site,x1,x2,x3,count)

Any help will be very much appreciated!


asked Aug 04 '21 21:08

Trifosa Iin Simamora


3 Answers

One way would be to apply a function over all of your visit columns (x1, x2, x3). Here's a way to do that.

cols <- c("x1", "x2", "x3")
df[, cols] <- mapply(function(col, idx, count) {
  ifelse(idx <=count & is.na(col), 0, col)
}, df[,cols], seq_along(cols), MoreArgs=list(count=df$count))

#      site x1 x2 x3 count
# 1    miya  1  2  1     3
# 2   zhask  0  1 NA     2
# 3 balmond  3  0  2     3
# 4   layla  0  1 NA     2
# 5  angela  0  3 NA     2

We use mapply to iterate over the columns and the index of each column. We also pass in the count vector each time (since it's the same for all columns, it goes in the MoreArgs= parameter). Because every call returns a vector of the same length, mapply simplifies the result to a matrix with one column per variable, and we can use that to replace the columns with the updated values.
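For comparison with the loop in the question, the same idea can also be written as an explicit loop. This is only a minimal sketch: the key difference is that the loop runs over the columns, so `j` is the column index (the visit number) that gets compared with `count`, rather than the row index. The names `col_j` and `fill` are illustrative.

```r
# Rebuild the dummy data from the question
df <- data.frame(
  site  = c("miya", "zhask", "balmond", "layla", "angela"),
  x1    = c(1, NA, 3, NA, NA),
  x2    = c(2, 1, NA, 1, 3),
  x3    = c(1, NA, 2, NA, NA),
  count = c(3, 2, 3, 2, 2)
)

cols <- c("x1", "x2", "x3")

for (j in seq_along(cols)) {
  col_j <- df[[cols[j]]]
  # TRUE for rows where visit j happened (j <= count) but the value is NA
  fill <- is.na(col_j) & j <= df$count
  df[[cols[j]]][fill] <- 0
}

df
```

On the dummy data this should give the same table as the mapply version.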

If you wanted to use dplyr, that might look more like

library(dplyr)
cols <- c("x1"=1, "x2"=2, "x3"=3)
df %>% 
  mutate(across(starts_with("x"), ~if_else(cols[cur_column()] <= count & is.na(.x), 0, .x)))

I used the cols vector to get the index of the column which doesn't seem to be otherwise available when using across().
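If you would rather not write the index numbers by hand, one possible variation is to look up the position of the current column with `match()`. This is a sketch under the assumption that the visit columns are listed in visit order in a character vector (here called `visit_cols`, a name made up for this example); it uses `<=` for the comparison, as in the mapply version above.

```r
library(dplyr)

# Rebuild the dummy data from the question
df <- data.frame(
  site  = c("miya", "zhask", "balmond", "layla", "angela"),
  x1    = c(1, NA, 3, NA, NA),
  x2    = c(2, 1, NA, 1, 3),
  x3    = c(1, NA, 2, NA, NA),
  count = c(3, 2, 3, 2, 2)
)

# Visit columns, in visit order (assumed)
visit_cols <- c("x1", "x2", "x3")

result <- df %>%
  mutate(across(
    all_of(visit_cols),
    # match() gives the position of the current column within visit_cols
    ~ if_else(match(cur_column(), visit_cols) <= count & is.na(.x), 0, .x)
  ))

result
```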

But a more "tidy" way to tackle this problem would be to pivot your data to a "tidy" (long) format first. Then you can clean the data more easily and pivot back if necessary:

library(dplyr)
library(tidyr)
df %>% 
  pivot_longer(cols=starts_with("x")) %>% 
  mutate(index=readr::parse_number(name)) %>%
  mutate(value=if_else(index <= count & is.na(value), 0, value)) %>% 
  select(-index) %>% 
  pivot_wider(names_from=name, values_from=value)

#   site    count    x1    x2    x3
#   <chr>   <dbl> <dbl> <dbl> <dbl>
# 1 miya        3     1     2     1
# 2 zhask       2     0     1    NA
# 3 balmond     3     3     0     2
# 4 layla       2     0     1    NA
# 5 angela      2     0     3    NA

answered Oct 21 '22 15:10

MrFlick

Via some indexing of the columns:

vars <- c("x1","x2","x3")
df[vars][is.na(df[vars]) & (col(df[vars]) <= df$count)] <- 0

#     site x1 x2 x3 count
#1    miya  1  2  1     3
#2   zhask  0  1 NA     2
#3 balmond  3  0  2     3
#4   layla  0  1 NA     2
#5  angela  0  3 NA     2

Essentially this is:

1. selecting the variables/columns and storing them in vars
2. flagging the NA cells within those variables with is.na(df[vars])
3. col(df[vars]) returns a column number for every cell, which can be checked to see whether it is less than or equal to the df$count in each corresponding row
4. the values meeting both of the above criteria are overwritten (<-) with 0
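The steps above can be sketched one at a time, which makes it easier to inspect the intermediate logical matrices. The names `na_cells` and `visited` are only illustrative; the one-liner above does the same thing without storing them.

```r
# Rebuild the dummy data from the question
df <- data.frame(
  site  = c("miya", "zhask", "balmond", "layla", "angela"),
  x1    = c(1, NA, 3, NA, NA),
  x2    = c(2, 1, NA, 1, 3),
  x3    = c(1, NA, 2, NA, NA),
  count = c(3, 2, 3, 2, 2)
)

vars <- c("x1", "x2", "x3")

# Logical matrix: TRUE where the cell is NA
na_cells <- is.na(df[vars])

# Logical matrix: TRUE where the column number is <= count for that row
# (df$count is recycled down each column, so row i is compared with count[i])
visited <- col(df[vars]) <= df$count

# Overwrite only the cells that are NA and fall within the visited range
df[vars][na_cells & visited] <- 0

df
```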

answered Oct 21 '22 13:10

thelatemail

This could be yet another solution using purrr::pmap:

purrr::pmap does row-wise operations when applied to a data frame. It lets us iterate over multiple arguments at the same time, so here c(...) refers to the corresponding elements of all the selected variables (all except site) in each row.

I think the rest of the solution is pretty clear but please let me know if I need to explain more about this.

library(dplyr)
library(purrr)
library(tidyr)

df %>%
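  mutate(output = pmap(df[-1], ~ {
    x <- head(c(...), -1)                    # x1..x3 for this row (drop count)
    inds <- which(is.na(x))                  # positions of the NA visits
    req <- tail(c(...), 1) - sum(!is.na(x))  # visits that happened but are NA
    x[inds[seq_len(req)]] <- 0               # fill the first `req` NAs with 0
    x
  })) %>%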
  select(site, output, count) %>%
  unnest_wider(output)

# A tibble: 5 x 5
#   site       x1    x2    x3 count
#   <chr>   <dbl> <dbl> <dbl> <dbl>
# 1 miya        1     2     1     3
# 2 zhask       0     1    NA     2
# 3 balmond     3     0     2     3
# 4 layla       0     1    NA     2
# 5 angela      0     3    NA     2

answered Oct 21 '22 15:10

Anoushiravan R
