Just to help someone who's just voluntarily removed their question, following a request for code he tried and other comments. Let's assume they tried something like this:
str <- "How do I best try and try and try and find a way to to improve this code?"
d <- unlist(strsplit(str, split=" "))
paste(d[-which(duplicated(d))], collapse = ' ')
and wanted to learn a better way. So what is the best way to remove a duplicate word from the string?
If you are still interested in alternate solutions you can use unique
which slightly simplifies your code.
paste(unique(d), collapse = ' ')
As per the comment by Thomas, you probably do want to remove punctuation. R's gsub
has some nice internal patterns you can use instead of strict regex. Of course you can always specify specific instances if you want to do some more refined regex.
d <- gsub("[[:punct:]]", "", d)
To remove duplicate words except for any special characters. use this function
rem_dup_word <- function(x){
x <- tolower(x)
paste(unique(trimws(unlist(strsplit(x,split=" ",fixed=F,perl=T)))),collapse =
" ")
}
Input data:
duptest <- "Samsung WA80E5LEC samsung Top Loading with Diamond Drum, 6 kg
(Silver)"
rem_dup_word(duptest)
output: samsung wa80e5lec top loading with diamond drum 6 kg (silver)
It will treat "Samsung" and "SAMSUNG" as duplicate
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With