
Go: Removing accents from strings

Tags:

go

I'm new to Go and I'm trying to implement a function to convert accented characters into their non-accented equivalent. I'm attempting to follow the example given in this blog (see the heading 'Performing magic').

What I've attempted to gather from this is:

package main

import (
    "fmt"
    "unicode"
    "bytes"
    "code.google.com/p/go.text/transform"
    "code.google.com/p/go.text/unicode/norm"
)


func isMn(r rune) bool {
    return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}

func main() {
    r := bytes.NewBufferString("Your Śtring")
    t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
    r = transform.NewReader(r, t)
    fmt.Println(r)
}

It does not work in the slightest and I quite honestly don't know what it means anyway. Any ideas?

Alasdair asked Jul 05 '14 16:07


2 Answers

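Note that a new runes package with transform operations was proposed in the Go 1.5 (August 2015) / Go 1.6 (Q1 2016) timeframe. It is developed in the golang.org/x/text repository (as golang.org/x/text/runes) rather than in the standard library.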

That includes (in runes/example_test.go) a runes.Remove function, which will help transform résumé into resume:

func ExampleRemove() {
    t := transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn)), norm.NFC)
    s, _, _ := transform.String(t, "résumé")
    fmt.Println(s)

    // Output:
    // resume
}

At the time of writing (April 2015) this was still under review.
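Since the question also asks what the chain means, here is a minimal sketch of its middle step only, using nothing but the standard library. The helper name stripMarks is made up for illustration. It assumes the input is already decomposed, which is the part norm.NFD handles in the real chain; it does not replace the normalization steps.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMarks (hypothetical helper) drops every rune in the Unicode category
// Mn (nonspacing marks). It does not decompose anything itself, so it only
// helps on input that is already in decomposed form.
func stripMarks(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Mn, r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	decomposed := "re\u0301sume\u0301" // 'e' followed by U+0301 COMBINING ACUTE ACCENT
	precomposed := "r\u00e9sum\u00e9"  // a single code point, U+00E9, for each é

	fmt.Println(stripMarks(decomposed))  // resume
	fmt.Println(stripMarks(precomposed)) // unchanged: U+00E9 is a letter, not a mark
}
```

The second call shows why the chain starts with norm.NFD: a precomposed é contains no separate mark to remove until it has been split into a base letter plus a combining accent. The trailing norm.NFC recomposes whatever is left.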

VonC answered Nov 06 '22 22:11


r should be of type io.Reader, and you can't print r like that. First, you need to read its content into a byte slice:

var (
    s = "Your Śtring"
    b = make([]byte, len(s))

    r io.Reader = strings.NewReader(s)
)
t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
r = transform.NewReader(r, t)
r.Read(b)
fmt.Println(string(b))

This works, but for some reason it gives me "Your Stri", two bytes fewer than expected.
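One plausible explanation, which I have not checked against the transform package's internals, is simply the io.Reader contract: a single Read may return fewer bytes than the buffer can hold, so the caller has to keep reading until io.EOF. The sketch below shows that behaviour with the standard library only. iotest.HalfReader is a stand-in for the transforming reader, and the helper names are made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// shortReader (hypothetical helper) wraps s in a reader that deliberately
// returns fewer bytes than requested on each call. It stands in for
// transform.NewReader so the example has no external dependencies.
func shortReader(s string) io.Reader {
	return iotest.HalfReader(strings.NewReader(s))
}

// readOnce mimics the snippet above: one Read into a len(s)-sized buffer.
func readOnce(s string) string {
	b := make([]byte, len(s))
	n, _ := shortReader(s).Read(b) // n can be smaller than len(b)
	return string(b[:n])
}

// readToEOF keeps reading until the reader reports io.EOF.
func readToEOF(s string) (string, error) {
	b, err := io.ReadAll(shortReader(s))
	return string(b), err
}

func main() {
	const s = "Your String"
	fmt.Printf("one Read: %q\n", readOnce(s))

	all, err := readToEOF(s)
	if err != nil {
		panic(err)
	}
	fmt.Printf("ReadAll:  %q\n", all)
}
```

If this is indeed the cause, replacing the single r.Read(b) with io.ReadAll(r) (ioutil.ReadAll before Go 1.16) should return the whole transformed string, and there is no need to size the buffer by hand.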

Here is a version that actually does what you need, though I'm still not sure why the example from the blog behaves so strangely.

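s := "Yoùr Śtring"
b := make([]byte, len(s))

t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
// Transform reports how many bytes it wrote to b. The result is shorter
// than the input, so print only that part to avoid trailing zero bytes.
n, _, e := t.Transform(b, []byte(s), true)
if e != nil { panic(e) }

fmt.Println(string(b[:n]))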

Ainar-G answered Nov 06 '22 23:11