How to trim and replace a string

Tags: string, regex, r

string<-c("       this is a string  ")

Is it possible, in R, to trim the white space from both sides of the string (or from just one side, as required) and replace each space with a desired character, as shown below? The number of spaces differs on each side of the string, and that count has to be retained in the replacement.

"~~~~~~~this is a string~~"
jackson asked Sep 03 '13 08:09


2 Answers

This may not be the most efficient way of doing it, but consider gregexpr and regmatches rather than gsub on its own:

x <- "    this is a string  "
pattern <- "^ +?\\b|\\b? +$"
startstop <- gsub(" ", "~", regmatches(x, gregexpr(pattern, x))[[1]])
text <- paste(regmatches(x, gregexpr(pattern, x), invert=TRUE)[[1]], collapse="")
paste0(startstop[1], text, startstop[2])
# [1] "~~~~this is a string~~"

And, for fun, as a function, and a "vectorized" function:

## The function
replaceEnds <- function(string) {
  pattern <- "^ +?\\b|\\b? +$"
  startstop <- gsub(" ", "~", regmatches(string, gregexpr(pattern, string))[[1]])
  text <- paste(regmatches(string, gregexpr(pattern, string), invert = TRUE)[[1]],
                collapse = "")
  paste0(startstop[1], text, startstop[2])
}

## use Vectorize here if you want to apply over a vector
vReplaceEnds <- Vectorize(replaceEnds)

Some sample data:

myStrings <- c("    Four at the start, 2 at the end  ", 
               "   three at the start, one at the end ")

vReplaceEnds(myStrings)
#        Four at the start, 2 at the end        three at the start, one at the end  
#  "~~~~Four at the start, 2 at the end~~" "~~~three at the start, one at the end~"
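
As a rough alternative sketch using the same tools, the replacement form `regmatches<-` can rewrite the matched runs in place for a whole vector, so no Vectorize wrapper is needed. The simplified pattern "^ +| +$" and the name `out` are assumptions made for this illustration, not part of the answer above.

```r
# Sample data reused from the answer above
myStrings <- c("    Four at the start, 2 at the end  ",
               "   three at the start, one at the end ")

# Work on a copy so the input stays untouched
out <- myStrings

# Locate the leading and trailing runs of spaces in every element
m <- gregexpr("^ +| +$", out)

# Turn each space inside the matched runs into "~", then write the runs back
regmatches(out, m) <- lapply(regmatches(out, m),
                             function(run) gsub(" ", "~", run))
out
```

Interior spaces are never touched because only the anchored runs are matched, and the result carries no names, unlike the output of Vectorize.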
A5C1D2H2I1M1N2O1R2T1 answered Sep 22 '22 20:09


Use gsub:

gsub(" ", "~", "    this is a string  ")
[1] "~~~~this~is~a~string~~"

This function uses regular expressions to replace (that is, substitute) every occurrence of a pattern inside a string.

In your case, the pattern has to be anchored to the start and the end of the string:

gsub("(^ *)|( *$)", "~~~", "    this is a string  ")
[1] "~~~this is a string~~~"

The pattern means:

  • (^ *): Find zero or more spaces at the start of the string
  • ( *$): Find zero or more spaces at the end of the string
  • |: The OR operator
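
A quick way to see how the quantifier in such a pattern behaves is to run it on a string with no leading spaces. This is only an illustrative sketch; the variable names `star` and `plus` are made up for the example.

```r
# "*" means zero or more, so the pattern also matches the empty
# position at the start of a string that has no leading spaces
star <- sub("^ *", "~", "abc")
star
# [1] "~abc"

# "+" means one or more, so with no leading space nothing is replaced
plus <- sub("^ +", "~", "abc")
plus
# [1] "abc"
```

If a replacement should only happen when spaces are actually present, " +" is the safer choice in place of " *".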

Now you can use this approach to tackle your problem of replacing each space with a new character:

txt <- "    this is a string  "
foo <- function(x, new="~"){
  lead <- gsub("(^ *).*", "\\1", x)
  last <- gsub(".*?( *$)", "\\1", x)
  mid  <- gsub("(^ *)|( *$)", "", x)
  paste0(
    gsub(" ", new, lead),
    mid,
    gsub(" ", new, last)
  )
}

> foo("    this is a string  ")
[1] "~~~~this is a string~~"

> foo(" And another one        ")
[1] "~And another one~~~~~~~~"
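
The same idea can also be sketched by counting the spaces at each end and rebuilding the string, rather than extracting and rewriting the pieces. The helper name `fill_ends` is hypothetical, and the sketch assumes R 3.3.0 or later, where strrep is available.

```r
# Hypothetical helper, not part of the answer above
fill_ends <- function(x, new = "~") {
  n <- nchar(x)
  # Length of the run of spaces at each end (0 when there is none)
  lead  <- pmax(attr(regexpr("^ *", x), "match.length"), 0L)
  trail <- pmax(attr(regexpr(" *$", x), "match.length"), 0L)
  # An all-space string would otherwise be counted from both ends
  trail <- pmin(trail, n - lead)
  paste0(strrep(new, lead),
         substr(x, lead + 1, n - trail),
         strrep(new, trail))
}

fill_ends(c("    this is a string  ", " And another one        "))
# [1] "~~~~this is a string~~"   "~And another one~~~~~~~~"
```

Every function used here is vectorized, so a character vector can be passed in directly.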

For more, see ?gsub or ?regexp.

Andrie answered Sep 26 '22 20:09