Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sanitizing User Regexp

I want to write a function that allows users to match data based on a regexp, but I am concerned about sanitation of the user strings. I know with SQL queries you can use bind variables to avoid SQL injection attacks, but I am not sure if there's such a mechanism for regexps. I see that there's Regexp.escape, but I want to allow valid regexps.

Here is is the sample function:

  def tagged?(text)
    tags.each do |tag|
      return true if text =~ /#{tag.name}/i
    end
    return false
  end

Since I am just matching directly on tag.name is there a chance that someone could insert a Proc call or something to break out of the regexp and cause havoc?

Any advice on best practice would be appreciated.

like image 806
Dan McNevin Avatar asked Dec 31 '09 15:12

Dan McNevin


2 Answers

Interpolated strings in a Regexp are not executed, but do generate annoying warnings:

/#{exit -3}/.match('test')
# => exits

foo = '#{exit -3}'
/#{foo}/.match('test')
# => warning: regexp has invalid interval
# => warning: regexp has `}' without escape

The two warnings seem to pertain to the opening #{ and the closing } respectively, and are independent.

As a strategy that's more efficient, you might want to sanitize the list of tags into a combined regexp you can run once. It is generally far less efficient to construct and test against N regular expressions than 1 with N parts.

Perhaps something along the lines of this:

class Taggable
  def tags
    @tags
  end

  def tags=(value)
    @tags = value

    @tag_regexp = Regexp.new(
      [
        '^(?:',
        @tags.collect do |tag|
          '(?:' + tag.sub(/\#\{/, '\\#\\{').sub(/([^\\])\}/, '\1\\}') + ')'
        end.join('|'),
        ')$'
      ].to_s,
      Regexp::IGNORECASE
    )
  end

  def tagged?(text)
    !!text.match(@tag_regexp)
  end
end

This can be used like this:

e = Taggable.new
e.tags = %w[ #{exit-3} .*\.gif .*\.png .*\.jpe?g ]

puts e.tagged?('foo.gif').inspect

If the exit call was executed, the program would halt there, but it just interprets that as a literal string. To avoid warnings it is escaped with backslashes.

like image 92
tadman Avatar answered Sep 28 '22 22:09

tadman


You should probably create an instance of the Regexp class instead.

def tagged?(text)
  return tags.any? { |tag| text =~ Regexp.new(tag.name, Regexp::IGNORECASE) }
end
like image 37
Ciarán Walsh Avatar answered Sep 28 '22 22:09

Ciarán Walsh