
Porting POSIX regex to Lua pattern - unexpected results

I'm having a hard time porting a POSIX regex to Lua string patterns.

I'm dealing with an HTML response from which I would like to extract the checkboxes that are checked. In particular, I'm interested in the value and name attributes of each checked checkbox.

Here are examples of checkboxes I'm interested in:

<input class="rid-2 form-checkbox" id="edit-2-access-comments" name="2[access comments]" value="access comments" checked="checked" type="checkbox">

<input class="rid-3 form-checkbox real-checkbox" id="edit-3-administer-comments" name="3[administer comments]" value="administer comments" checked="checked" type="checkbox">

By contrast, I'm not interested in this one (an unchecked checkbox):

<input class="rid-2 form-checkbox" id="edit-2-access-printer-friendly-version" name="2[access printer-friendly version]" value="access printer-friendly version" type="checkbox">

With a POSIX-style regex, I used the following pattern in Python: pattern=r'name="(.*)" value="(.*)" checked="checked"' and it just worked.

My first approach in Lua was simply to use this: pattern ='name="(.-)" value="(.-)" checked="checked"' but it gave strange results (the first capture was as expected, but the second one returned lots of unneeded HTML).

I've also tried the following pattern: pattern = 'name="(%d?%[.-%])" value="(.-)"%s?(c?).-="?c.-"%s?type="checkbox"'

This time the second capture returned the content of value, but all checkboxes were matched (not only those with the checked="checked" attribute).

For completeness, here's the Lua code (snippet from my Nmap NSE script) that attempts to do this pattern matching:

  pattern = 'name="(.-)" value="(.-)" checked="checked"' 
  data = {}
  for name, value in string.gmatch(res.body, pattern) do
    stdnse.debug(1, string.format("%s %s", name, value))
  end
Asked by mzet on Dec 11 '25 at 12:12


1 Answer

From the question: I've used following pattern in Python: pattern=r'name="(.*)" value="(.*)" checked="checked"' and it just worked.

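Python's re module is not POSIX-compliant. There, . matches any character except a newline (unless re.DOTALL is set), whereas in POSIX and in Lua patterns . matches any character, including a newline. If each tag sits on its own line, the Python pattern cannot run past the end of that line, while the Lua pattern can continue into the following tags.

To match the three attributes above appearing one after another, use something like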
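To see the effect concretely, here is a minimal sketch using a made-up two-tag body (the tag contents are illustrative, not taken from the real response). A lazy .- still has to reach the literal text that follows it, and because . matches a newline in Lua, the second capture runs from the unchecked tag into the checked one:

```lua
-- Hypothetical two-line body: first checkbox unchecked, second checked.
local body = '<input name="a" value="1" type="checkbox">\n'
          .. '<input name="b" value="2" checked="checked" type="checkbox">'

-- The pattern from the question.
local pattern = 'name="(.-)" value="(.-)" checked="checked"'
local name, value = string.match(body, pattern)

print(name)   -- a
print(value)  -- starts inside the first tag and ends inside the second
```

The match starts at the first name=" it finds, so the name belongs to the unchecked checkbox while the value capture swallows the markup in between.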

local pattern = 'name="([^"]*)"%s+value="([^"]*)"%s+checked="checked"'

Why not [^\r\n]-? Because when several tags share one line, the pattern can still match across them: for example, the first tag supplies name (and possibly value) and a later tag supplies the remaining attributes up to checked="checked". [^\r\n] matches < and >, so it can "overfire" across tag boundaries.

Note that [^"]*, a negated bracket expression (a complemented set in Lua terms), matches zero or more characters other than ", which keeps each capture inside one attribute value and so restricts the match to a single tag.
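The difference can be sketched on a single line holding two tags. The input below is hypothetical and the names loose and strict are only labels for this comparison:

```lua
-- Hypothetical single-line input: first tag unchecked, second checked.
local line = '<input name="a" value="1"><input name="b" value="2" checked="checked">'

-- Lazy "anything but a line break" versus "anything but a double quote".
local loose  = 'name="([^\r\n]-)"%s+value="([^\r\n]-)"%s+checked="checked"'
local strict = 'name="([^"]*)"%s+value="([^"]*)"%s+checked="checked"'

local n1, v1 = string.match(line, loose)
local n2, v2 = string.match(line, strict)

print(n1, v1)  -- name comes from the first tag, value spans both tags
print(n2, v2)  -- both captures come from the second (checked) tag
```

With loose, the first capture is taken from the unchecked tag and the second one crosses the > and < between the tags. With strict, the attempt at the first tag fails and only the checked tag matches.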

See Lua demo:

local rx = 'name="([^"]*)"%s+value="([^"]*)"%s+checked="checked"'
local s = '<li name="n1"\nvalue="v1"><li name="n2"\nvalue="v1" checked="checked"><li name="n3"\nvalue="v3"   checked="checked">'
for name, value in string.gmatch(s, rx) do
  print(name, value)
end

Output:

n2  v1
n3  v3
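As a further check, the same pattern can be tried on the three input tags quoted in the question. This is only a sketch: the tags are joined with newlines and the unchecked one is deliberately placed first, both of which are assumptions about the real response, and body stands in for res.body from the NSE script.

```lua
local rx = 'name="([^"]*)"%s+value="([^"]*)"%s+checked="checked"'

-- The three tags from the question; order and line breaks are assumed.
local body = table.concat({
  '<input class="rid-2 form-checkbox" id="edit-2-access-printer-friendly-version" name="2[access printer-friendly version]" value="access printer-friendly version" type="checkbox">',
  '<input class="rid-2 form-checkbox" id="edit-2-access-comments" name="2[access comments]" value="access comments" checked="checked" type="checkbox">',
  '<input class="rid-3 form-checkbox real-checkbox" id="edit-3-administer-comments" name="3[administer comments]" value="administer comments" checked="checked" type="checkbox">',
}, "\n")

-- Collect the name/value pair of every checked checkbox.
local checked = {}
for name, value in string.gmatch(body, rx) do
  checked[#checked + 1] = { name = name, value = value }
end

for _, box in ipairs(checked) do
  print(box.name, box.value)
end
```

Only the two checked checkboxes end up in the checked table; the unchecked one is skipped because type= follows its value attribute instead of checked="checked".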
Answered by Wiktor Stribiżew on Dec 14 '25 at 11:12