
Go idiom for writing long regular expressions with embedded comments?

Tags: regex, go

Some languages have facilities for embedding newlines, whitespace, and comments in long regular expressions to make them more readable:

( yogi | booboo )   # match something
\s
( the \s)?          # optional article
bear                # bears are not Mr. Ranger

AFAICT Go does not have that option. Is that right?

Lacking that, is a composed regex the only option for clarity, or is there another idiom? I'm not finding any examples of long regexes in Go right now.

Kevin G. asked Jul 05 '14 16:07



1 Answer

Most of the time people just add a comment describing what the regexp matches. While skimming through the Go source code, however, I found this interesting example, which builds the pattern from concatenated raw-string fragments and comments each one:

// removeRE is the list of patterns to skip over at the beginning of a
// message when looking for message text.
var removeRE = regexp.MustCompile(`(?m-s)\A(` +
    // Skip leading "Hello so-and-so," generated by codereview plugin.
    `(Hello(.|\n)*?\n\n)` +

    // Skip quoted text.
    `|((On.*|.* writes|.* wrote):\n)` +
    `|((>.*\n)+)` +

    // Skip lines with no letters.
    `|(([^A-Za-z]*\n)+)` +

    // Skip links to comments and file info.
    `|(http://codereview.*\n([^ ]+:[0-9]+:.*\n)?)` +
    `|(File .*:\n)` +

    `)`,
)
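
The same idiom can be applied to the bear pattern from the question. The following is only a minimal sketch: the name bearRE and the test strings are made up for illustration. Go's regexp syntax has no free-spacing flag, so the padding spaces from the original pattern are dropped rather than kept inside the raw strings, where they would be matched literally.

```go
package main

import (
	"fmt"
	"regexp"
)

// bearRE (hypothetical name) is the question's pattern, split into
// raw-string fragments so that each piece can carry its own Go comment.
var bearRE = regexp.MustCompile(`` +
	// Match either name.
	`(yogi|booboo)` +
	// Exactly one whitespace character.
	`\s` +
	// Optional article followed by one whitespace character.
	`(the\s)?` +
	// The literal word "bear".
	`bear`,
)

func main() {
	// Illustrative inputs, not taken from the original post.
	for _, s := range []string{"yogi the bear", "booboo bear", "ranger bear"} {
		fmt.Printf("%q matches: %v\n", s, bearRE.MatchString(s))
	}
}
```

Because the fragments are string constants joined with +, the concatenation happens at compile time and the resulting pattern is identical to writing it on one line.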
Ainar-G answered Nov 15 '22 07:11