
Premature end of char class in interpolated regex

Tags: regex, ruby

I can't seem to solve this issue, hope someone can help:

Nilfacs is an array of strings pulled from a hash.

For this line:

looping_finaltext = finaltext.reject {|sentence| nilfacs.any? {|fac| sentence =~ /#{(fac)}/i}}

I get the following warning and error:

warning: character class has ']' without escape: /[[]]/
block (2 levels) in <main>': premature end of char-class: /[[]]/i (RegexpError)

All of the strings are just normal words (like "condition") and do not contain characters that should need to be escaped.

Is this an indication that something unanticipated is being fed into the array as a string? Or is there something wrong with my syntax in this line?

Thomas asked Dec 15 '22 23:12


2 Answers

"Is this an indication that something unanticipated is being fed into the array as a string?"

Yes, that is it exactly. I expect that you have nested arrays, and somewhere in there is an array containing an empty array, [[]], whose to_s representation ("[[]]") is what gets interpolated into the regex and produces the error you found.

When you interpolate into a regex literal, the interpolated characters are treated as regex syntax, not as literal text. Just as /b[/ is not a valid regular expression, neither is the result of foo = "b["; bar = /#{foo}/.

nilfacs = [ "a[]", "b[", "c]", [[]] ]

nilfacs.each do |fac|
  begin
    p /#{fac}/
  rescue RegexpError=>e
    puts e
  end
end

#=> empty char-class: /a[]/
#=> premature end of char-class: /b[/
#=> /c]/
#=> warning: regular expression has ']' without escape: /[[]]/
#=> premature end of char-class: /[[]]/

If you want your strings to be matched as literal characters, use Regexp.escape. Note that it expects a string, so the nested array still has to be removed from nilfacs first:

nilfacs.each do |fac|
  p /#{Regexp.escape fac}/
end
#=> /a\[\]/
#=> /b\[/
#=> /c\]/
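
Applied to the line from the question, that might look like the sketch below. The sample finaltext and nilfacs values are made up for illustration, and reject_matching is a hypothetical helper name, not something from the original code.

```ruby
# Hypothetical helper: drop every sentence that contains any of the given
# words, treating each word as literal text rather than as regex syntax.
def reject_matching(sentences, words)
  sentences.reject do |sentence|
    words.any? { |word| sentence =~ /#{Regexp.escape(word)}/i }
  end
end

# Made-up sample data for illustration only.
finaltext = ["The condition improved.", "Nothing to report.", "See note [1] below."]
nilfacs   = ["condition", "[1]"]

p reject_matching(finaltext, nilfacs)
#=> ["Nothing to report."]
```

The "[1]" entry would otherwise be read as a character class matching the single character "1"; escaped, it only matches the literal text "[1]".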

Alternatively, you may want to use Regexp.union to create a single regexp from your array that matches all the literal strings therein:

rejects = %w[dog cat]
re = Regexp.new(Regexp.union(rejects).source, Regexp::IGNORECASE) #=> /dog|cat/i
looping_finaltext = finaltext.reject{ |sentence| sentence=~re }
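
Regexp.union escapes plain strings for you, so no separate Regexp.escape call is needed with this approach. A minimal sketch, assuming made-up input; literal_union is a hypothetical helper name used only for this example:

```ruby
# Hypothetical helper: build one case-insensitive regex that matches any of
# the given strings literally.
def literal_union(words)
  Regexp.new(Regexp.union(words).source, Regexp::IGNORECASE)
end

# Made-up sample data: two entries contain regex metacharacters.
re = literal_union(["a[]", "b[", "dog"])

p re                     # the brackets appear escaped in the pattern
p "B[racket" =~ re       #=> 0
p "no match here" =~ re  #=> nil
```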
Phrogz answered Dec 21 '22 23:12


"Nilfacs is an array of strings pulled from a hash."

Probably not; nilfacs almost certainly has an array containing an empty array, [[]], as a member. Try this in irb and you'll see:

>> a = [[]]
>> /#{a}/
(irb):4: warning: character class has ']' without escape: /[[]]/
RegexpError: premature end of char-class: /[[]]/

Either that or you have the string '[[]]' in nilfacs:

>> a = '[[]]'
>> /#{a}/
(irb):6: warning: character class has ']' without escape: /[[]]/
RegexpError: premature end of char-class: /[[]]/
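
To find out which case applies, one option is to list the members of nilfacs that are not strings. This is only a sketch: non_string_members is a hypothetical helper name and the sample array is made up.

```ruby
# Hypothetical helper: return every element that is not a String, paired with
# its index, so the unexpected values are easy to locate.
def non_string_members(items)
  items.each_with_index.reject { |item, _index| item.is_a?(String) }
end

# Made-up sample data: two strings and one stray nested array.
nilfacs = ["condition", [[]], "status"]

non_string_members(nilfacs).each do |item, index|
  puts "index #{index}: #{item.inspect} (#{item.class})"
end
# prints: index 1: [[]] (Array)
```

A literal string such as '[[]]' passes this check because it is a String. Printing nilfacs.inspect shows strings in quotes, which makes the two cases easy to tell apart.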

Once you fix nilfacs to be the array of strings that you want it to be, you can clean up your code by using a single regex instead of any?:

re = Regexp.new(Regexp.union(nilfacs).source, Regexp::IGNORECASE)
looping_finaltext = finaltext.reject { |sentence| sentence =~ re }

The regex engine can then check all the patterns in a single match, which avoids the overhead of invoking String#=~ repeatedly inside the any? block.
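
A rough way to compare the two approaches is a quick timing run with Ruby's standard Benchmark module. The data and both method names below are made up for this sketch, and the timings will vary with the machine, the Ruby version, and the input, so treat it as an informal check rather than a rigorous measurement.

```ruby
require "benchmark"

# Made-up sample data for illustration only.
words     = %w[condition status error]
sentences = Array.new(10_000) { |i| "Sentence #{i} mentions the Status field." }

# One regex test per word per sentence, as in the original line.
def reject_with_any(sentences, words)
  sentences.reject do |sentence|
    words.any? { |word| sentence =~ /#{Regexp.escape(word)}/i }
  end
end

# One combined regex, built once and tested once per sentence.
def reject_with_union(sentences, words)
  re = Regexp.new(Regexp.union(words).source, Regexp::IGNORECASE)
  sentences.reject { |sentence| sentence =~ re }
end

t_any   = Benchmark.realtime { reject_with_any(sentences, words) }
t_union = Benchmark.realtime { reject_with_union(sentences, words) }
puts format("any?: %.4fs, union: %.4fs", t_any, t_union)
```

Both methods should return the same result for the same input; only the amount of work per sentence differs.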

mu is too short answered Dec 22 '22 00:12