
Golang multiline regex not working

Tags:

regex

go

Why does the following multiline regex not work? I expect it to match the substring inside the tags. Other, simpler multiline matches worked correctly.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    r := regexp.MustCompile(`(?m)<think>(.*)</think>`)
    const s = `That is 
    <think>
    FOOBAR
    </think>`
    fmt.Printf("%#v\n", r.FindStringSubmatch(s))
}

https://play.golang.org/p/8C6u_0ca8w

Eduardo Pereira asked May 09 '16 02:05



2 Answers

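By default, "." does not match a newline; with the "s" flag it does, so use (?s)<think>(.*)</think>. You do not need "m" here: it only makes ^ and $ match at line boundaries, and your pattern uses neither.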

Note that if your string contains multiple <think>...</think> pairs, the greedy .* will match everything between the first <think> and the last </think>. Using the non-greedy .*? makes it match only the contents of the first pair.

Andy Schweig answered Sep 20 '22 10:09



Do not use regexp to parse XML; use encoding/xml instead. An example of a corner case that a regexp cannot handle: <think><elem attrib="I'm pondering about </think> tag now"></elem></think>

I'll use START and STOP as markers, just to dissociate the example from any XML. A complete example, with both LF and CRLF line endings just in case:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    r := regexp.MustCompile(`(?s)START(.*?)STOP`)
    const s = "That is \nSTART\nFOOBAR\r\n\r\nSTOP\n"
    fmt.Printf("%#v\n", r.FindStringSubmatch(s))
}

This prints:

[]string{"START\nFOOBAR\r\n\r\nSTOP", "\nFOOBAR\r\n\r\n"}
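For input that really is XML, the encoding/xml suggestion above might look roughly like the following. Treat it as a sketch under assumptions: the note type, the thinkText helper and the <note> root element are invented for this example, because the snippet in the question has no root element of its own.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// note is a hypothetical wrapper type for this sketch. The root element
// name is not checked because the struct has no XMLName field.
type note struct {
	Think string `xml:"think"` // character data of the <think> child
}

// thinkText parses doc and returns the text inside its <think> child element.
func thinkText(doc string) (string, error) {
	var n note
	if err := xml.Unmarshal([]byte(doc), &n); err != nil {
		return "", err
	}
	return n.Think, nil
}

func main() {
	// Assumed input: the question's text wrapped in a made-up <note> root.
	const doc = "<note>That is\n<think>\nFOOBAR\n</think></note>"

	text, err := thinkText(doc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%q\n", text)
}
```

The parser tracks element nesting itself, so line endings and the position of the closing tag no longer have to be encoded in a pattern.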
kubanczyk answered Sep 21 '22 10:09
