I have a vector of strings:
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")
I want to keep only three possible values in this vector: N, A, and NA.
Therefore, I want to replace any element that is NOT N or A with NA.
How can I achieve this?
I have tried the following:
gsub(ve, pattern = '[^NA]+', replacement = 'NA')
gsub(ve, pattern = '[^N|^A]+', replacement = 'NA')
But these don't work well, because they replace every instance of "A" or "N" in every string with NA. So in some cases I end up with NANANANANANA, instead of simply NA.
Use a negative lookahead assertion.
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")
sub("^(?![NA]$).*", "NA", ve, perl = TRUE)
# [1] "N" "A" "A" "A" "N" "NA" "NA" "NA" "NA" "N" "A" "NA" "NA" "NA" "NA"
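The negative lookahead (?![NA]$), placed right after the start anchor ^, asserts that the string is not a single letter, either N or A, followed immediately by the line end $. When that assertion holds, .* matches all the characters in the string and sub replaces them with NA.
So the regex above matches any string except one that is exactly N or A, and those two are left untouched.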
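The same condition can be cross-checked without a lookahead. The following is a minimal sketch, assuming base R only, that flags the elements matching the anchored pattern ^[NA]$ with grepl and overwrites the rest. It is an alternative way to express the rule, not the approach used above, and the names keep, res_grepl and res_sub are made up for illustration.

```r
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")

# TRUE only where the whole element is exactly one character, N or A
keep <- grepl("^[NA]$", ve)

# Everything that is not exactly N or A becomes the string "NA"
res_grepl <- replace(ve, !keep, "NA")

# The lookahead version from the answer, for comparison
res_sub <- sub("^(?![NA]$).*", "NA", ve, perl = TRUE)

# Both approaches should agree element by element
identical(res_grepl, res_sub)
```

The grepl version may be easier to read if lookaheads are unfamiliar, and it does not need perl = TRUE.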
If we are looking for fixed matches, then use %in% with negation ! and assign 'NA' to the elements that do not match:
ve[!ve %in% c("A", "N", "NA")] <- 'NA'
Note that in R, a missing value is the unquoted NA, not the quoted string "NA". I assume "NA" here is a genuine category rather than a missing value; if so, I would advise renaming that category to avoid confusion when parsing the data later.
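To make the difference between the two concrete, here is a small sketch, assuming a toy vector x that holds both the string "NA" and a real missing value. The replacement label "other" is purely hypothetical; any unambiguous name would do.

```r
# Toy vector: the third element is the string "NA", the fourth is a real missing value
x <- c("N", "A", "NA", NA)

# is.na() flags only the real missing value, not the string
missing_flags <- is.na(x)

# Comparing with == yields NA (not FALSE) for the missing element,
# so which() is used to pick out only the positions that are TRUE
string_na_pos <- which(x == "NA")

# Hypothetical rename of the string category to an unambiguous label
x_renamed <- replace(x, string_na_pos, "other")
```

After the rename, the string category and the missing value can no longer be mistaken for each other when the data is written out and read back.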