R Regex / gsub : How to collapse spaces in a string

Tags: regex, r, perl, gsub

I have a vector of sentences that were scanned from handwritten documents. The scanning introduced some spacing problems like this:

 The d og is br own.

Is there a generic way to match any '_x_' pattern (space, single character, space) and remove the second space, like this:

The d og is br own.  --> The dog is br own.

I'm only worried about a single character between the spaces ('_x_' NOT '_xx_').

Any suggestions?

screechOwl asked Jul 20 '12 02:07


1 Answer

Maybe

> x<-"The d og is br own."
> gsub(" (.) "," \\1",x)
[1] "The dog is br own."

or

gsub(" ([[:alnum:]]) "," \\1",x)

(.) matches any single character; ([[:alnum:]]) matches alphanumeric characters only.
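
To see how the two patterns differ, here is a small sketch applied to a vector. The second element is a made-up test string, not taken from the question, and the variable names are only illustrative.

```r
# First element is the sentence from the question;
# the second is a hypothetical string with a lone punctuation mark.
x <- c("The d og is br own.", "wait - what")

# "." accepts any single character between the two spaces
any_char <- gsub(" (.) ", " \\1", x)

# "[[:alnum:]]" accepts only a letter or digit between the two spaces
alnum_only <- gsub(" ([[:alnum:]]) ", " \\1", x)

print(any_char)    # "The dog is br own." "wait -what"
print(alnum_only)  # "The dog is br own." "wait - what"
```

gsub() is vectorized, so it works on the whole vector of sentences at once. Note that either pattern will also join a genuine one-letter word such as "a" to the word that follows it, so the result may still need a manual check.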

shhhhimhuntingrabbits answered Sep 25 '22 00:09