Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove redundant spaces/whitespace from a string in Golang?

Tags:

I was wondering how to remove:

  • All leading/trailing whitespace or new-line characters, null characters, etc.
  • Any redundant spaces within a string (ex. "hello[space][space]world" would be converted to "hello[space]world")

Is this possible with a single Regex, with unicode support for international space characters, etc.?

like image 792
Ben Guild Avatar asked May 18 '16 05:05

Ben Guild


People also ask

How do I remove white spaces from a string in Golang?

The TrimSpace() method from the strings package allows you to remove leading and trailing white-space characters from a specified string. The function removes the following white-space characters as defined by Unicode: \t – tab character.

How do you remove duplicate spaces in a string?

Using String. To remove duplicate whitespaces from a string, you can use the regular expression \s+ which matches with one or more whitespace characters, and replace it with a single space ' ' .

How do you trim strings in Go lang?

To trim spaces around string in Go language, call TrimSpace function of strings package, and pass the string as argument to it. TrimSpace function returns a String.

How do I remove spaces between words in a string?

In Java, we can use regex \\s+ to match whitespace characters, and replaceAll("\\s+", " ") to replace them with a single space.


2 Answers

You can get quite far just using the strings package as strings.Fields does most of the work for you:

package main  import (     "fmt"     "strings" )  func standardizeSpaces(s string) string {     return strings.Join(strings.Fields(s), " ") }  func main() {     tests := []string{" Hello,   World  ! ", "Hello,\tWorld ! ", " \t\n\t Hello,\tWorld\n!\n\t"}     for _, test := range tests {         fmt.Println(standardizeSpaces(test))     } } // "Hello, World !" // "Hello, World !" // "Hello, World !" 
like image 155
ifross Avatar answered Sep 20 '22 13:09

ifross


It seems that you might want to use both \s shorthand character class and \p{Zs} Unicode property to match Unicode spaces. However, both steps cannot be done with 1 regex replacement as you need two different replacements, and the ReplaceAllStringFunc only allows a whole match string as argument (I have no idea how to check which group matched).

Thus, I suggest using two regexps:

  • ^[\s\p{Zs}]+|[\s\p{Zs}]+$ - to match all leading/trailing whitespace
  • [\s\p{Zs}]{2,} - to match 2 or more whitespace symbols inside a string

Sample code:

package main  import (     "fmt"     "regexp" )  func main() {     input := "   Text   More here     "     re_leadclose_whtsp := regexp.MustCompile(`^[\s\p{Zs}]+|[\s\p{Zs}]+$`)     re_inside_whtsp := regexp.MustCompile(`[\s\p{Zs}]{2,}`)     final := re_leadclose_whtsp.ReplaceAllString(input, "")     final = re_inside_whtsp.ReplaceAllString(final, " ")     fmt.Println(final) } 
like image 30
Wiktor Stribiżew Avatar answered Sep 18 '22 13:09

Wiktor Stribiżew