
Negation of gsub | Replace everything except strings in a certain vector

I have a vector of strings:

ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")

I want to keep only three possible values in this vector: N, A, and NA.

Therefore, I want to replace any element that is NOT N or A with NA.

How can I achieve this?

I have tried the following:

gsub(ve, pattern = '[^NA]+', replacement = 'NA')
gsub(ve, pattern = '[^N|^A]+', replacement = 'NA')

But these don't work well, because they replace only the runs of characters other than N and A inside each string, not the whole element. So some elements come back unchanged (ANN, NN) or partly rewritten (NFNFNAA becomes NNANNANAA), instead of simply NA.

benett asked Mar 07 '23 16:03


2 Answers

Use a negative lookahead assertion.

ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")
sub("^(?![NA]$).*", "NA", ve, perl = TRUE)
# [1] "N"  "A"  "A"  "A"  "N"  "NA" "NA" "NA" "NA" "N"  "A"  "NA" "NA" "NA" "NA"

^(?![NA]$) is a negative lookahead. It asserts that

-> right after the start ^, the string is NOT a single letter [NA] (either N or A) followed by the line end $.

.* then matches all the characters.

So the regex matches any string except one that is exactly N or A, and sub replaces the whole match with NA.
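
To see which elements the pattern actually selects, a quick check with grepl can help. This is only an illustrative sketch; the names `pattern` and `hits` are made up here and are not part of the answer above.

```r
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")

# Same pattern as above: any element that is not exactly "N" or "A"
pattern <- "^(?![NA]$).*"

# TRUE where the whole element would be replaced by sub()
hits <- grepl(pattern, ve, perl = TRUE)

# Elements that get replaced
ve[hits]

# Elements that are left untouched
ve[!hits]
```

Note that the string "NA" also matches the pattern; it is simply replaced by itself, so it looks unchanged in the result.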

Avinash Raj answered Mar 09 '23 04:03


If we are looking for fixed matches, use %in% with the negation ! and assign 'NA' to the elements that don't match:

ve[!ve %in% c("A", "N", "NA")] <- 'NA'

Note that in R, a missing value is the unquoted NA, not the quoted string 'NA'. I assume the 'NA' here is a separate category; if so, I would advise renaming it to avoid confusion when parsing later.
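
A small sketch of that distinction, assuming a real missing value is what is wanted (the name `ve2` is hypothetical and only used for illustration): assigning NA_character_ instead of the quoted 'NA' lets is.na() detect the replaced elements.

```r
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")

ve2 <- ve
# Replace everything that is not exactly "A" or "N" with a true missing value.
# This also turns the literal string "NA" into a missing value.
ve2[!ve2 %in% c("A", "N")] <- NA_character_

is.na(ve2)    # TRUE for the replaced elements
is.na("NA")   # FALSE: the quoted "NA" is an ordinary string

# Count each category, including the missing values
table(ve2, useNA = "ifany")
```

With the quoted version from the answer, is.na() would report FALSE for every element, which is why the two forms are easy to confuse later on.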

akrun answered Mar 09 '23 04:03