Various Regexp options

Ruby's regex literal can take the options i, m, and x, which are documented. Besides these, it accepts a much wider variety of options. Here is an inventory of the options that seem to be allowed:

//e # => //
//i # => //i  ignore case
//m # => //m  multiline
//n # => //n
//o # => //
//s # => //
//u # => //
//x # => //x  extended
  • What do they do? Are some of them related to encoding? What about the others?
  • If they indicate an encoding, what happens when more than one encoding is specified?
  • Other options raise an unknown regex options error, but the ones listed here do not. If the answer to the previous question is that they do nothing, why are these particular options allowed?
  • Why is n reflected in the inspection while the others are not? Do the ones whose inspection shows no difference actually differ?

If there is documentation, a link to it would be appreciated.

sawa, asked Feb 14 '23 03:02

1 Answer

Regular-expression modifiers:

Regular expression literals may include optional modifiers that control various aspects of matching. Modifiers are specified after the second slash character, as shown previously, and may be any of these characters:

Modifier    Description
i           Ignore case when matching text.
o           Perform #{} interpolations only once, the first time the regexp literal is evaluated.
x           Ignore whitespace and allow comments in the regular expression.
m           Multiline mode: . also matches newlines, treating them as normal characters.
u,e,s,n     Interpret the regexp as Unicode (UTF-8), EUC, SJIS, or ASCII, respectively.
            If none of these modifiers is specified, the regular expression is 
            assumed to use the source encoding.
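
The encoding row of the table can be checked directly in irb. The following is a minimal sketch, assuming Ruby 1.9 or later; the variable names are illustrative only. It also suggests why only n shows up in the inspection output from the question: n sets the Regexp::NOENCODING option flag, while u, e, and s set Regexp::FIXEDENCODING and change the object's encoding.

```ruby
# Each encoding modifier yields a Regexp with a specific encoding.
utf8 = /a/u
euc  = /a/e
sjis = /a/s
none = /a/n

puts utf8.encoding          # UTF-8
puts euc.encoding           # EUC-JP
puts sjis.encoding          # Windows-31J
puts none.encoding          # result may depend on the pattern's bytes

# u/e/s pin the encoding; a plain ASCII-only literal does not.
puts utf8.fixed_encoding?   # true
puts(/a/.fixed_encoding?)   # false

# n is recorded as an option flag instead, which is why inspect prints it.
puts (none.options & Regexp::NOENCODING) != 0      # true
puts (utf8.options & Regexp::FIXEDENCODING) != 0   # true
```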

source

Note: the description above comes with a proviso. See sawa's answer for details.
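
The i, m, x, and o rows of the table can be illustrated with a short sketch. It assumes Ruby 2.4 or later (for String#match?), and the names date and match_once are hypothetical, chosen only for this example.

```ruby
# i: case-insensitive matching
puts "RUBY".match?(/ruby/i)   # true

# m: . also matches a newline
puts "a\nb".match?(/a.b/)     # false
puts "a\nb".match?(/a.b/m)    # true

# x: whitespace and comments inside the pattern are ignored
date = /
  (\d{4}) - # year
  (\d{2})   # month
/x
puts "2023-02"[date, 1]       # 2023

# o: #{} is interpolated only the first time the literal is evaluated
def match_once(word, text)
  text.match?(/#{word}/o)
end

first  = match_once("cat", "cat")   # literal is compiled as /cat/
second = match_once("dog", "dog")   # still /cat/, so "dog" does not match
puts first                    # true
puts second                   # false
```

Because of this caching, o is only safe when the interpolated value never changes for the lifetime of the program.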

guido, answered Feb 24 '23 01:02