 

ifelse do nothing in R

I'm a very novice R programmer, and I'm trying to convert old SAS code to R. I need to replace values based on a condition and, if the condition is false, leave them alone. I've googled this and tried many of the posted solutions, but to no avail. The reason I'm doing this is to categorize the first instance of an event (in this case, a physician writing a prescription). If the first month they wrote a prescription was May of last year, their beginning month (newwriter) is 5; if it was June, then 6, and so on. I'm working backwards from June of this year, and I want to update their beginning month (newwriter) if an earlier prescription is found. If no earlier prescription is found, I want to leave the number alone. This is the code I'm using:

newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0,18,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_05_31_2017>0,17,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_04_30_2017>0,16,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_03_31_2017>0,15,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_02_28_2017>0,14,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_01_31_2017>0,13,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_12_31_2016>0,12,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_11_30_2016>0,11,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_10_31_2016>0,10,NULL)

The problem is that it keeps changing higher values to 0 if it doesn't find a prescription in that month. I want it to just leave the values alone. I've tried all of the following as well with no success:

newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0,18,newwriters$newwriter)
newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0,18,newwriters[,16])
newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0,18,)

As I mentioned, I'm new to writing R code. I'm sure there's a better/faster/more efficient way of doing this, but I'm not sure what else to try. Thanks in advance for your help!

Kevin.C asked Jul 17 '17 18:07




2 Answers

If you want to change a column (or vector) conditionally and leave entries untouched where the condition is not satisfied, you can also do it without ifelse.

Consider the following two vectors:

a = c(1,2,3,4,5)
b = c(1,1,1,1,1)

Now, let's say we want to replace values in b with 2 wherever the corresponding value in a is larger than 2. Here are two ways to achieve that:

b[a>2] = 2
b = ifelse(a>2,2,b)

They will both result in b being 1 1 2 2 2. However, now let's replace one of the values in a with NA, say:

a = c(1,2,NA,4,5)

Now, compare the results of the following two snippets:

b = c(1,1,1,1,1)
b[a>2] = 2
# 1 1 1 2 2

and

b = c(1,1,1,1,1)
b = ifelse(a>2,2,b)
# 1  1 NA  2  2

The intuitive reason for this is that a comparison with NA, such as NA > 2, returns neither TRUE nor FALSE but NA, so ifelse cannot choose between its two values and returns NA for that entry. When doing b[a>2] = 2, we only replace values where a>2 is TRUE, and since NA is not TRUE, the third entry is simply left unaltered.
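A small runnable check of the behaviour just described, reusing the vectors a and b from above (the names cond, b_ifelse, b_index and b_guarded are only for illustration). Guarding the test with !is.na() turns the NA into FALSE, so ifelse() then agrees with the indexing version:

```r
a <- c(1, 2, NA, 4, 5)

# Comparing NA with a number gives NA: cond is FALSE FALSE NA TRUE TRUE
cond <- a > 2

# ifelse() cannot pick yes or no for an NA test, so it returns NA there
b_ifelse <- ifelse(cond, 2, c(1, 1, 1, 1, 1))

# Logical-index assignment only touches positions where cond is TRUE;
# with a length-one replacement value, NA positions are left unchanged
b_index <- c(1, 1, 1, 1, 1)
b_index[cond] <- 2

# NA & FALSE is FALSE, so the guarded test has no NA left and ifelse()
# keeps the old value at position 3
b_guarded <- ifelse(cond & !is.na(a), 2, c(1, 1, 1, 1, 1))
```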


So in your specific case,

newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0,18,newwriters$newwriter)

probably does not work as expected because there are NA values in those columns. If you want to use ifelse, you could guard against them like this:

newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0 & !is.na(newwriters$MTRx_06_30_2017),18,newwriters$newwriter)

but you might also consider doing

newwriters$newwriter[newwriters$MTRx_06_30_2017>0] = 18
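To apply this to all of the months in the question at once, a minimal sketch is to start newwriter at NA and loop from the latest month back to the earliest, so that each earlier prescription overwrites a later one. The toy data frame and the month_codes lookup are hypothetical; only the column naming scheme comes from the question, and the real data would list every MTRx_ column with its code (10 through 18):

```r
# Hypothetical toy data using three of the question's column names
newwriters <- data.frame(
  MTRx_10_31_2016 = c(0, 3, NA, 0),
  MTRx_11_30_2016 = c(2, 0, 0, 0),
  MTRx_12_31_2016 = c(1, 1, 4, 0)
)

# Hypothetical lookup: month code for each prescription column
month_codes <- c(MTRx_10_31_2016 = 10, MTRx_11_30_2016 = 11, MTRx_12_31_2016 = 12)

# Start every row at NA (double), meaning "no prescription found yet"
newwriters$newwriter <- NA_real_

# Walk from the most recent month back to the earliest. Only rows with a
# positive, non-missing count are overwritten; all other rows keep their value.
for (col in rev(names(month_codes))) {
  wrote <- newwriters[[col]] > 0 & !is.na(newwriters[[col]])
  newwriters$newwriter[wrote] <- month_codes[[col]]
}
```

Processing the months in reverse is what makes the earliest month win; a row with no prescription in any month stays NA.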

Hope this helps!

Florian answered Oct 24 '22 05:10


A better option is if_else from the dplyr package. It treats NAs explicitly (through its missing argument), which makes it more robust, and it is also slightly faster.

Quick example:

> library(tidyverse)
> iris2 = iris %>% as_tibble()
> 
> #add some NA's
> iris2$Sepal.Length[c(1, 5, 8)] = NA
> 
> #print
> iris2
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1        NA           3.50         1.40       0.200 setosa 
 2         4.90        3.00         1.40       0.200 setosa 
 3         4.70        3.20         1.30       0.200 setosa 
 4         4.60        3.10         1.50       0.200 setosa 
 5        NA           3.60         1.40       0.200 setosa 
 6         5.40        3.90         1.70       0.400 setosa 
 7         4.60        3.40         1.40       0.300 setosa 
 8        NA           3.40         1.50       0.200 setosa 
 9         4.40        2.90         1.40       0.200 setosa 
10         4.90        3.10         1.50       0.100 setosa 
# ... with 140 more rows
> 
> #conditionally change
> iris2$new_var = if_else(iris2$Sepal.Length > 5, true = 100, false = 0, missing = -100)
> 
> iris2$new_var
  [1] -100    0    0    0 -100  100    0 -100    0    0  100    0    0    0  100  100  100  100  100  100  100  100    0  100    0    0    0
 [28]  100  100    0    0  100  100  100    0    0  100    0    0  100    0    0    0    0  100    0  100    0  100    0  100  100  100  100
 [55]  100  100  100    0  100  100    0  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100
 [82]  100  100  100  100  100  100  100  100  100  100  100  100    0  100  100  100  100  100  100  100  100  100  100  100  100    0  100
[109]  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100
[136]  100  100  100  100  100  100  100  100  100  100  100  100  100  100  100

So we made a new variable where values above 5 became 100, values of 5 or below became 0, and NAs became -100.
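Applied to the question, a sketch (assuming newwriter is already a double column, since if_else() is strict about types) passes the existing value as both false and missing, so a 0 or an NA in that month leaves newwriter alone. The three-row data frame is made up for illustration:

```r
library(dplyr)

# Hypothetical toy data in the question's naming scheme
newwriters <- data.frame(MTRx_06_30_2017 = c(5, 0, NA),
                         newwriter = c(NA_real_, 17, 16))

# TRUE -> 18; FALSE or NA -> keep the current newwriter value
newwriters$newwriter <- if_else(newwriters$MTRx_06_30_2017 > 0,
                                true = 18,
                                false = newwriters$newwriter,
                                missing = newwriters$newwriter)
```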

CoderGuy123 answered Oct 24 '22 06:10