I'm a very novice R programmer, and I'm trying to convert old SAS code to R. I need to replace values based on a condition, and if the condition is false, leave them alone. I've googled this and tried many of the solutions posted, but to no avail. The reason I'm doing this is to categorize the first instance of an event (in this case, physicians writing a prescription). If the first month they wrote a prescription was May of last year, their beginning month (newwriter) is 5. If it was June, then 6, etc. I'm working backwards from June of this year, and I want to update their beginning month (newwriter) if an earlier prescription is found. If no earlier prescription is found, I want to leave the number alone. This is the code I'm using:
newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0,18,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_05_31_2017>0,17,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_04_30_2017>0,16,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_03_31_2017>0,15,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_02_28_2017>0,14,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_01_31_2017>0,13,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_12_31_2016>0,12,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_11_30_2016>0,11,NULL)
newwriters$newwriter=ifelse(newwriters$MTRx_10_31_2016>0,10,NULL)
The problem is that it keeps changing higher values to 0 if it doesn't find a prescription in that month. I want it to just leave the values alone. I've tried all of the following as well with no success:
newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0,18,newwriters$newwriter)
newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0,18,newwriters[,16])
newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0,18,)
As I mentioned, I'm new to writing R code. I'm sure there's a better/faster/more efficient way of doing this, but I'm not sure what else to try. Thanks in advance for your help!
If you want to change a column (or vector) conditionally, and leave entries untouched where the condition is not satisfied, you can also do this without ifelse.
Consider the following two vectors:
a = c(1,2,3,4,5)
b = c(1,1,1,1,1)
Now, let's say we want to replace values in b with 2 if the value in a is larger than 3. Here are two ways to achieve what you want:
b[a>3] = 2
b = ifelse(a>3,2,b)
They will both result in b being 1 1 1 2 2. However, now let's replace one of the values in a with NA, let's say:
a = c(1,2,NA,4,5)
Now, compare the results of the following two snippets:
b = c(1,1,1,1,1)
b[a>3] = 2
# 1 1 1 2 2
and
b = c(1,1,1,1,1)
b = ifelse(a>3,2,b)
# 1 1 NA 2 2
The intuitive reason for this is that NA>3 returns neither TRUE nor FALSE, but NA, so ifelse does not know which of the two values to return and returns NA for that entry. When doing b[a>3], we only replace values where a>3 is TRUE, and since NA is not TRUE, the value for the third entry is simply not altered.
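To make the difference concrete, here is a small sketch that runs both approaches on the same condition, plus a third variant using which(), which is not part of the answer above but makes the "only where TRUE" intent explicit. The names cond, b1, b2 and b3 are just for illustration.

```r
a <- c(1, 2, NA, 4, 5)

# The comparison itself yields NA for the missing entry.
cond <- a > 3

# Logical-index assignment with a single replacement value skips NA positions.
b1 <- c(1, 1, 1, 1, 1)
b1[cond] <- 2

# ifelse() propagates the NA from the test into the result.
b2 <- ifelse(cond, 2, c(1, 1, 1, 1, 1))

# which() returns only the positions that are TRUE, dropping NA.
b3 <- c(1, 1, 1, 1, 1)
b3[which(cond)] <- 2

print(b1)
print(b2)
print(b3)
```

Note that the logical-index form only tolerates NA when the replacement is a single value. With a longer replacement vector, R stops with an error about NAs in subscripted assignments, so which() is the safer habit.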
So in your specific case,
newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0,18,newwriters$newwriter)
probably does not work as expected because there are NA values in those columns. If you want to use ifelse, you could do something like:
newwriters$newwriter=ifelse(newwriters$MTRx_06_30_2017>0 & !is.na(newwriters$MTRx_06_30_2017),18,newwriters$newwriter)
but you might also consider doing
newwriters$newwriter[newwriters$MTRx_06_30_2017>0] = 18
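Since the question repeats the same line for every month, the indexing approach could be wrapped in a loop. The following is only a sketch: the data frame is a made-up stand-in with three of the month columns, the lookup vector month_codes is a hypothetical helper, and which() is used so that NA counts are ignored.

```r
# Hypothetical stand-in for the asker's data: one row per physician,
# one prescription-count column per month (only three months shown).
newwriters <- data.frame(
  MTRx_04_30_2017 = c(0, 2, NA, 0),
  MTRx_05_31_2017 = c(0, 1, 3, 0),
  MTRx_06_30_2017 = c(4, 0, 1, 0)
)

# Month codes as used in the question, listed from latest to earliest,
# so that earlier months overwrite later ones.
month_codes <- c(MTRx_06_30_2017 = 18, MTRx_05_31_2017 = 17, MTRx_04_30_2017 = 16)

# Start with "no prescription found" for everyone.
newwriters$newwriter <- NA_real_

for (col in names(month_codes)) {
  # Rows with a count > 0 in this month; rows with NA counts are skipped.
  hit <- which(newwriters[[col]] > 0)
  newwriters$newwriter[hit] <- month_codes[[col]]
}

print(newwriters$newwriter)
```

Rows that never match keep their previous value, which is the "leave it alone" behaviour asked for, and a physician with no prescriptions in any month stays NA.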
Hope this helps!
A better option is to use if_else from the dplyr package. It has explicit treatment for NAs, which makes it more robust, and it is also slightly faster.
Quick example:
> library(tidyverse)
> iris2 = iris %>% as_tibble()
>
> #add some NA's
> iris2$Sepal.Length[c(1, 5, 8)] = NA
>
> #print
> iris2
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 NA 3.50 1.40 0.200 setosa
2 4.90 3.00 1.40 0.200 setosa
3 4.70 3.20 1.30 0.200 setosa
4 4.60 3.10 1.50 0.200 setosa
5 NA 3.60 1.40 0.200 setosa
6 5.40 3.90 1.70 0.400 setosa
7 4.60 3.40 1.40 0.300 setosa
8 NA 3.40 1.50 0.200 setosa
9 4.40 2.90 1.40 0.200 setosa
10 4.90 3.10 1.50 0.100 setosa
# ... with 140 more rows
>
> #conditionally change
> iris2$new_var = if_else(iris2$Sepal.Length > 5, true = 100, false = 0, missing = -100)
>
> iris2$new_var
[1] -100 0 0 0 -100 100 0 -100 0 0 100 0 0 0 100 100 100 100 100 100 100 100 0 100 0 0 0
[28] 100 100 0 0 100 100 100 0 0 100 0 0 100 0 0 0 0 100 0 100 0 100 0 100 100 100 100
[55] 100 100 100 0 100 100 0 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
[82] 100 100 100 100 100 100 100 100 100 100 100 100 0 100 100 100 100 100 100 100 100 100 100 100 100 0 100
[109] 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
[136] 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
So, we made a new variable where values above 5 became 100, values of 5 or below became 0, and NA became -100.
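Applied to the original question, the same function can leave existing values untouched by passing the current column as both false and missing. This is a sketch on a made-up two-column data frame, assuming dplyr is installed and that newwriter is stored as a double so it matches the type of the replacement value.

```r
library(dplyr)

# Hypothetical miniature of the asker's data after the June step has run.
newwriters <- data.frame(
  MTRx_05_31_2017 = c(2, 0, NA),
  newwriter = c(18, 18, NA_real_)
)

newwriters$newwriter <- if_else(
  newwriters$MTRx_05_31_2017 > 0,
  true = 17,
  false = newwriters$newwriter,    # condition FALSE: keep the existing value
  missing = newwriters$newwriter   # condition NA: also keep the existing value
)

print(newwriters$newwriter)
```

Because if_else checks that true, false and missing have compatible types, an integer newwriter column would need either 17L as the replacement or a conversion with as.numeric() first.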