 

Remove extra white space from between letters in R using gsub()

Tags: regex, r, gsub

There are a slew of answers out there on how to remove extra whitespace between words, which is simple. Removing the whitespace between the letters of a single spelled-out word, however, turns out to be much harder. As a reproducible example, let's say I have a vector of data that looks like this:

x <- c("L L C", "P O BOX 123456", "NEW YORK")

What I'd like to do is something like this:

y <- gsub("(\\w)(\\s)(\\w)(\\s)", "\\1\\3", x)

But that leaves me with this:

[1] "LLC" "POBOX 123456" "NEW YORK"

Almost perfect, but I'd really like to have that second value say "PO BOX 123456". Is there a better way to do this than what I'm doing?
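One way to see why the attempt goes wrong is to list what the pattern actually matches. The fourth group `(\\s)` consumes the space *after* the second letter, and the replacement `"\\1\\3"` never puts it back, so `"P O "` becomes `"PO"` and runs straight into `"BOX"`. This is a small diagnostic sketch using base R's `regmatches()` and `gregexpr()`; it reuses the question's `x` and pattern unchanged.

```r
x <- c("L L C", "P O BOX 123456", "NEW YORK")

# List the substrings the question's pattern consumes in each element.
# Each match includes the trailing whitespace captured by group 4.
matches <- regmatches(x, gregexpr("(\\w)(\\s)(\\w)(\\s)", x))
print(matches)
```

For `"L L C"` the match is `"L L "` (dropping that trailing space is harmless there), for the second element it is `"P O "` (dropping it is the bug), and `"NEW YORK"` has no match.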

asked Feb 10 '23 15:02 by tblznbits


1 Answer

You could try this:

> x <- c("L L C", "P O BOX 123456", "NEW YORK")
> gsub("(?<=\\b\\w)\\s(?=\\w\\b)", "", x, perl = TRUE)
[1] "LLC"           "PO BOX 123456" "NEW YORK" 

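It removes only a whitespace character that sits between two single-character words. The lookbehind and lookahead require a word boundary on the far side of each neighbouring letter, so the space in "P O" is dropped while the space before "BOX" is kept. Lookarounds need `perl = TRUE`.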
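Here is the same idea as a small commented sketch. The function name `collapse_single_letters` is hypothetical, and one deliberate change from the answer is assumed: `\\s` is widened to `\\s+` so that a run of several spaces between single letters (e.g. `"L  L C"`) is also collapsed. Note that `\\w` also matches digits and underscores, so `"1 2 3"` would become `"123"` as well.

```r
# Hypothetical helper; pattern adapted from the answer, with \s widened to \s+.
collapse_single_letters <- function(strings) {
  # (?<=\b\w)  preceded by a word character that has a word boundary before it
  # \s+        the run of whitespace to delete
  # (?=\w\b)   followed by a word character that has a word boundary after it
  # Since the whitespace itself is a boundary, both neighbours are one-character words.
  gsub("(?<=\\b\\w)\\s+(?=\\w\\b)", "", strings, perl = TRUE)
}

x <- c("L L C", "P O BOX 123456", "NEW YORK", "L  L C")
result <- collapse_single_letters(x)
print(result)
# returns c("LLC", "PO BOX 123456", "NEW YORK", "LLC")
```

The lookbehind stays fixed-length (`\\b` is zero-width and `\\w` is one character), which PCRE requires.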

answered Feb 12 '23 03:02 by Avinash Raj