
How to access a capturing group from regexp.ReplaceAllFunc?

Tags:

go

How can I access a capture group from inside ReplaceAllFunc()?

package main

import (
    "fmt"
    "regexp"
)

func main() {
    body := []byte("Visit this page: [PageName]")
    search := regexp.MustCompile("\\[([a-zA-Z]+)\\]")

    body = search.ReplaceAllFunc(body, func(s []byte) []byte {
        // How can I access the capture group here?
    })

    fmt.Println(string(body))
}

The goal is to replace [PageName] with <a href="/view/PageName">PageName</a>.

This is the last task under the "Other tasks" section at the bottom of the Writing Web Applications Go tutorial.

Saulo Silva asked Jan 17 '15 15:01


2 Answers

I agree that having access to the capture group inside your function would be ideal, but I don't think it's possible with regexp.ReplaceAllFunc. The only way that comes to mind for doing this with that function is to slice the brackets off the match yourself:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    body := []byte("Visit this page: [PageName] [OtherPageName]")
    search := regexp.MustCompile("\\[[a-zA-Z]+\\]")
    body = search.ReplaceAllFunc(body, func(s []byte) []byte {
        m := string(s[1 : len(s)-1])
        return []byte("<a href=\"/view/" + m + "\">" + m + "</a>")
    })
    fmt.Println(string(body))
}

EDIT

There is one other way I know of to do what you want. The first thing you need to know is that you can specify a non-capturing group using the syntax (?:re), where re is your regular expression. This is not essential, but it reduces the number of uninteresting submatches.
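To make the effect of (?:re) concrete, here is a small sketch that counts the capture groups of two patterns with Regexp.NumSubexp. The helper name groupCount is made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// groupCount (hypothetical helper) reports how many capture groups
// the given pattern defines.
func groupCount(pattern string) int {
	return regexp.MustCompile(pattern).NumSubexp()
}

func main() {
	// Every parenthesised part captures: three groups.
	fmt.Println(groupCount(`(\[)([a-zA-Z]+)(\])`))
	// The brackets are wrapped in (?:...), so only the name captures: one group.
	fmt.Println(groupCount(`(?:\[)([a-zA-Z]+)(?:\])`))
}
```

With the non-capturing form, the page name is always submatch 1, no matter how many extra parentheses are used for grouping.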

The next thing to know about is regexp.FindAllSubmatchIndex. It returns a slice of slices, where each inner slice holds the index ranges of all submatches for one match of the regexp.
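As a rough illustration of what those index slices look like, the sketch below prints them for a short input. The wrapper submatchIndexes and the sample text are assumptions made for the example; each inner slice has the layout [matchStart, matchEnd, group1Start, group1End, ...].

```go
package main

import (
	"fmt"
	"regexp"
)

// submatchIndexes (hypothetical helper) returns, for every match of pattern
// in b, the byte offsets of the whole match followed by those of each group.
func submatchIndexes(pattern string, b []byte) [][]int {
	return regexp.MustCompile(pattern).FindAllSubmatchIndex(b, -1)
}

func main() {
	body := []byte("See [Foo] and [Bar]")
	for _, loc := range submatchIndexes(`\[([a-zA-Z]+)\]`, body) {
		// loc[0]:loc[1] spans the whole match, loc[2]:loc[3] the first group.
		fmt.Printf("match %q, group %q, offsets %v\n",
			body[loc[0]:loc[1]], body[loc[2]:loc[3]], loc)
	}
	// Prints:
	// match "[Foo]", group "Foo", offsets [4 9 5 8]
	// match "[Bar]", group "Bar", offsets [14 19 15 18]
}
```

Note that a group which did not take part in a match is reported with offsets of -1, so code that slices with these values should check for that when groups are optional.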

With these two things, you can construct a somewhat generic solution:

package main

import (
    "fmt"
    "regexp"
)

func ReplaceAllSubmatchFunc(re *regexp.Regexp, b []byte, f func(s []byte) []byte) []byte {
    idxs := re.FindAllSubmatchIndex(b, -1)
    if len(idxs) == 0 {
        return b
    }
    l := len(idxs)
    ret := append([]byte{}, b[:idxs[0][0]]...)
    for i, pair := range idxs {
        // replace the whole match with f applied to the first capture group
        ret = append(ret, f(b[pair[2]:pair[3]])...)
        if i+1 < l {
            ret = append(ret, b[pair[1]:idxs[i+1][0]]...)
        }
    }
    ret = append(ret, b[idxs[len(idxs)-1][1]:]...)
    return ret
}

func main() {
    body := []byte("Visit this page: [PageName] [OtherPageName][XYZ]     [XY]")
    search := regexp.MustCompile("(?:\\[)([a-zA-Z]+)(?:\\])")

    body = ReplaceAllSubmatchFunc(search, body, func(s []byte) []byte {
        m := string(s)
        return []byte("<a href=\"/view/" + m + "\">" + m + "</a>")
    })

    fmt.Println(string(body))
}

tumdum answered Nov 06 '22 19:11


If you want to get a capture group inside ReplaceAllFunc, you can call ReplaceAllString on the matched text with the template `$1` to extract the subgroup.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    body := []byte("Visit this page: [PageName]")
    search := regexp.MustCompile("\\[([a-zA-Z]+)\\]")

    body = search.ReplaceAllFunc(body, func(s []byte) []byte {
        // Re-apply the regexp to the match; `$1` expands to the first capture group
        group := search.ReplaceAllString(string(s), `$1`)

        fmt.Println(group)

        // handle group as you wish
        newGroup := "<a href='/view/" + group + "'>" + group + "</a>"
        return []byte(newGroup)
    })

    fmt.Println(string(body))
}

When there are several groups, you can extract each one this way (`$1`, `$2`, and so on), then handle each group and return the desired value.
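For the several-groups case, a variation on the same idea is to re-match inside the callback with FindSubmatch, which returns all groups at once. This is only a sketch: the [Page|label] syntax, the linkRe pattern and the renderLinks function are invented for the example and are not part of the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkRe is a hypothetical pattern with two groups: page name and link label.
var linkRe = regexp.MustCompile(`\[([a-zA-Z]+)\|([a-zA-Z ]+)\]`)

// renderLinks (hypothetical) rewrites every [Page|label] as an HTML link.
func renderLinks(body []byte) []byte {
	return linkRe.ReplaceAllFunc(body, func(s []byte) []byte {
		// Re-match the already matched bytes to recover all groups:
		// m[0] is the whole match, m[1] and m[2] are the two groups.
		m := linkRe.FindSubmatch(s)
		if m == nil {
			return s // leave the text untouched if the re-match fails
		}
		return []byte(`<a href="/view/` + string(m[1]) + `">` + string(m[2]) + `</a>`)
	})
}

func main() {
	fmt.Println(string(renderLinks([]byte("Visit [PageName|this page] now"))))
	// Prints: Visit <a href="/view/PageName">this page</a> now
}
```

Like the ReplaceAllString trick, this relies on the pattern matching the same way when applied to the matched text alone, which may not hold for patterns that depend on surrounding context such as \b or ^. If the replacement is a fixed template rather than computed, search.ReplaceAll(body, []byte(`<a href="/view/$1">$1</a>`)) should be enough without any callback.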

iamdavidzeng answered Nov 06 '22 20:11