Golang Regexp Named Groups and Submatches

Tags: regex, go

I am trying to match a regular expression and get the capturing group name for the match. This works when the regular expression only matches the string once, but if it matches the string more than once, SubexpNames doesn't return the duplicated names.

Here's an example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile("(?P<first>[a-zA-Z]+) ")
    fmt.Printf("%q\n", re.SubexpNames())
    fmt.Printf("%q\n", re.FindAllStringSubmatch("Alan Turing ", -1))
}

The output is:

["" "first"]
[["Alan " "Alan"] ["Turing " "Turing"]]

Is it possible to get the capturing group name for each submatch?

Randy Layman asked Jan 22 '16 14:01


2 Answers

Go 1.15 (August 2020) added (*Regexp).SubexpIndex, which returns the index of a named group so you can look it up in each submatch slice.
See "proposal: regexp: add (*Regexp).SubexpIndex #32420". It was first expected in Go 1.14 (Q1 2020) but was included with commit 782fcb4 in Go 1.15.

// SubexpIndex returns the index of the first subexpression with the given name,
// or else -1 if there is no subexpression with that name.
//
// Note that multiple subexpressions can be written using the same name, as in
// (?P<bob>a+)(?P<bob>b+), which declares two subexpressions named "bob".
// In this case SubexpIndex returns the index of the leftmost such subexpression
// in the regular expression.
func (*Regexp) SubexpIndex(name string) int

This is discussed in CL 187919.
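
To illustrate the note about duplicated names in the doc comment above, here is a minimal sketch. The helper name leftmost and the input "aabbb" are my own choices for illustration, not part of the regexp package, and the code assumes Go 1.15 or later.

```go
package main

import (
	"fmt"
	"regexp"
)

// leftmost is a hypothetical helper: it compiles pattern, then reports the
// index SubexpIndex returns for name and the text that group captured in s
// ("" if there is no such group or no match).
func leftmost(pattern, name, s string) (int, string) {
	re := regexp.MustCompile(pattern)
	idx := re.SubexpIndex(name)
	m := re.FindStringSubmatch(s)
	if idx < 0 || m == nil {
		return idx, ""
	}
	return idx, m[idx]
}

func main() {
	const pattern = `(?P<bob>a+)(?P<bob>b+)`
	// SubexpNames lists both groups, in order.
	fmt.Printf("%q\n", regexp.MustCompile(pattern).SubexpNames()) // ["" "bob" "bob"]
	// SubexpIndex picks the leftmost group called "bob".
	fmt.Println(leftmost(pattern, "bob", "aabbb")) // 1 aa
	// An unknown name yields -1 (and here an empty string).
	fmt.Println(leftmost(pattern, "alice", "aabbb"))
}
```

If you need every group that shares a name, walking SubexpNames() yourself is probably the simpler route, since SubexpIndex only ever reports the first one.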

re := regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)
fmt.Println(re.MatchString("Alan Turing"))
matches := re.FindStringSubmatch("Alan Turing")
lastIndex := re.SubexpIndex("last")
fmt.Printf("last => %d\n", lastIndex)
fmt.Println(matches[lastIndex])

// Output:
// true
// last => 2
// Turing
VonC answered Sep 18 '22 17:09


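Group names and positions are fixed for a compiled regexp. SubexpNames() returns one name per group, and every submatch slice returned by FindAllStringSubmatch uses the same indexes, so you can look up the name of each submatch by its position: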

re := regexp.MustCompile("(?P<first>[a-zA-Z]+) ")
groupNames := re.SubexpNames()
for matchNum, match := range re.FindAllStringSubmatch("Alan Turing ", -1) {
    for groupIdx, group := range match {
        name := groupNames[groupIdx]
        if name == "" {
            name = "*"
        }
        fmt.Printf("#%d text: '%s', group: '%s'\n", matchNum, group, name)
    }
}
alex vasi answered Sep 19 '22 17:09