I have a vector of sentences that were scanned from handwritten documents. In the process there were some spacing problems like this:
The d og is br own.
I was curious if there is a generic way to take any pattern like '_x_' (space, single character, space) and collapse the second space, like this:
The d og is br own. --> The dog is br own.
I'm only worried about a single character between the spaces ('_x_', NOT '_xx_').
Any suggestions?
The gsub() function replaces every match of a pattern in a string, so it can remove spaces by replacing them with an empty string.
You can easily trim unnecessary whitespace from the start and the end of a string or the lines in a text file by doing a regex search-and-replace. Search for ^[ \t]+ and replace with nothing to delete leading whitespace (spaces and tabs). Search for [ \t]+$ to do the same for trailing whitespace.
strip(): Remove Leading and Trailing Spaces. In Python, the str.strip() method removes the leading and trailing whitespace from a string.
The trimws() function in R removes leading and/or trailing whitespace from character strings. Its which argument selects "both" (the default), "left", or "right". It does not touch whitespace inside the string.
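As a small illustration of the trimming approaches mentioned above, here is a sketch in R. The sample string is made up for the example and is not from the original question.

```r
# Hypothetical input with leading spaces, a tab, and trailing spaces
padded <- "  \tThe dog is brown.  "

# trimws() strips whitespace from both ends by default
both_trimmed <- trimws(padded)

# which = "left" strips only the leading whitespace
left_trimmed <- trimws(padded, which = "left")

# The same leading-whitespace trim written as a regex replacement
regex_trimmed <- sub("^[ \t]+", "", padded)

print(both_trimmed)   # "The dog is brown."
print(left_trimmed)   # "The dog is brown.  "
print(regex_trimmed)  # "The dog is brown.  "
```

Note that none of these affect spaces inside the sentence, so they would not fix the "d og" problem from the question on their own.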
Maybe
> x<-"The d og is br own."
> gsub(" (.) "," \\1",x)
[1] "The dog is br own."
or
gsub(" ([[:alnum:]]) "," \\1",x)
(.) matches any single character; ([[:alnum:]]) matches alphanumeric characters only.
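Since the question mentions a vector of sentences, here is a minimal sketch showing that gsub() is vectorized and can be applied to the whole vector at once. The vector contents below are invented for illustration; only the first sentence comes from the question.

```r
# Hypothetical sample data; only the first sentence appears in the question
sentences <- c("The d og is br own.", "A c at sat.", "No issues here.")

# Match space + one alphanumeric character + space,
# then put back the leading space and the character, dropping the second space
fixed <- gsub(" ([[:alnum:]]) ", " \\1", sentences)

print(fixed)
# [1] "The dog is br own." "A cat sat."         "No issues here."
```

One caveat: the pattern cannot tell a stray letter from a genuine one-letter word, so "This is a test" would become "This is atest". If the text contains words like "a" or "I" in mid-sentence, some extra check (for example a dictionary lookup) would probably be needed.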