There is a slew of answers out there on how to remove extra whitespace from between words, which is simple. However, I'm finding that removing extra whitespace within words is much harder. As a reproducible example, let's say I have a vector of data that looks like this:
x <- c("L L C", "P O BOX 123456", "NEW YORK")
What I'd like to do is something like this:
y <- gsub("(\\w)(\\s)(\\w)(\\s)", "\\1\\3", x)
But that leaves me with this:
[1] "LLC" "POBOX 123456" "NEW YORK"
Almost perfect, but I'd really like to have that second value say "PO BOX 123456". Is there a better way to do this than what I'm doing?
You can try this:
> x <- c("L L C", "P O BOX 123456", "NEW YORK")
> gsub("(?<=\\b\\w)\\s(?=\\w\\b)", "", x, perl = TRUE)
[1] "LLC" "PO BOX 123456" "NEW YORK"
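It removes only a whitespace character that sits between two single-character words: the lookbehind (?<=\\b\\w) requires a one-character word before the space, and the lookahead (?=\\w\\b) requires one after it. The space between "O" and "BOX" is kept because "B" is not followed by a word boundary.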
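As a rough sketch, the same pattern can be wrapped in a small helper so it is easier to reuse and test. The function name collapse_spaced_letters and the extra "A 1 MOVERS" input are made up for illustration and are not part of the original question or answer. Note that \w also matches digits and underscores, so a lone digit next to a lone letter gets joined as well.

```r
# Hypothetical helper (name is an assumption, not from the answer above).
collapse_spaced_letters <- function(x) {
  # (?<=\\b\\w)  the character before the space is a one-character word
  # \\s          the single whitespace character to delete
  # (?=\\w\\b)   the character after the space is a one-character word
  # perl = TRUE is required because lookarounds need the PCRE engine.
  gsub("(?<=\\b\\w)\\s(?=\\w\\b)", "", x, perl = TRUE)
}

x <- c("L L C", "P O BOX 123456", "NEW YORK", "A 1 MOVERS")
collapse_spaced_letters(x)
# [1] "LLC"           "PO BOX 123456" "NEW YORK"      "A1 MOVERS"
```

If joining digits is not wanted, one option is to swap \\w for a letter class such as [A-Za-z] in both lookarounds; whether that fits depends on the data.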