
Golang complex fold grüßen

I'm trying to get case folding to be consistent between three languages (C++, Python and Golang) because I need to be able to check if a string matches the one saved no matter the language.

An example problematic word is the German word "grüßen" which in uppercase is "GRÜSSEN" (Note the 'ß' becomes two characters as 'SS').

  • C++ works well using boost::locale text conversion (docs).
  • Python 3 also works through str.casefold() (casefold docs).
  • However, Golang doesn't seem to have a way to do proper case folding (golang playground example).

Is there some way to do this that I'm missing, or does this bug at the end of unicode's documentation apply to all usages of text conversion in golang? If so, what are my options for case folding other than writing it in cgo?

Shawn Blakesley asked Mar 28 '17 02:03



1 Answer

Advanced (Unicode-enabled) text processing is not part of the Go stdlib;¹ it lives in a set of ("blessed") packages maintained by the Go project outside the stdlib, under the golang.org/x/text/ umbrella.

As Shawn figured out by himself, one can do

package main

import (
	"fmt"
	"golang.org/x/text/cases"
)

func main() {
	c := cases.Fold() // Caser that applies Unicode full case folding
	fmt.Println(c.String("grüßen"))
}

to get "grüssen" back.


¹ That's because whatever is shipped in the stdlib is subject to the Go 1 compatibility promise, and at the time Go 1 was shipped certain functionality wasn't available or was incomplete or its APIs were in flux etc, so such bits were kept out of the core to let them mature.

kostix answered Sep 22 '22 20:09
