I'm having a hard time porting a POSIX regex to Lua string patterns.
I'm dealing with an HTML response from which I would like to filter the
checkboxes that are checked. In particular, I'm interested in the value and
name attributes of each checked checkbox.
Here are examples of checkboxes I'm interested in:
<input class="rid-2 form-checkbox" id="edit-2-access-comments" name="2[access comments]" value="access comments" checked="checked" type="checkbox">
<input class="rid-3 form-checkbox real-checkbox" id="edit-3-administer-comments" name="3[administer comments]" value="administer comments" checked="checked" type="checkbox">
By contrast, I'm not interested in this one (an unchecked checkbox):
<input class="rid-2 form-checkbox" id="edit-2-access-printer-friendly-version" name="2[access printer-friendly version]" value="access printer-friendly version" type="checkbox">
Using POSIX regex, I've used the following pattern in Python: pattern=r'name="(.*)" value="(.*)" checked="checked"' and it just worked.
My first approach in Lua was simply to use this:
pattern = 'name="(.-)" value="(.-)" checked="checked"'
but it gave strange results (the first capture was as expected, but the
second one returned lots of unneeded HTML).
I've also tried the following pattern:
pattern = 'name="(%d?%[.-%])" value="(.-)"%s?(c?).-="?c.-"%s?type="checkbox"'
This time, the second capture returned the content of value, but all
checkboxes were matched (not only those with the checked="checked" attribute).
For completeness, here's the Lua code (a snippet from my Nmap NSE script) that attempts to do this pattern matching:
pattern = 'name="(.-)" value="(.-)" checked="checked"'
data = {}
for name, value in string.gmatch(res.body, pattern) do
  stdnse.debug(1, string.format("%s %s", name, value))
end
I've used the following pattern in Python:
pattern=r'name="(.*)" value="(.*)" checked="checked"' and it just worked.
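Python's re module is not POSIX compliant, and by default . matches any char except a newline there (in POSIX and Lua, . matches any char including a newline). That is why .* in Python stopped at the end of each line, while (.-) in Lua can keep expanding past the end of one tag and into the following lines until it finds checked="checked".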
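To make the effect concrete, here is a minimal sketch of the original Lua pattern running over two checkboxes on separate lines. The html string is a shortened, made-up stand-in for the real response (only the name, value, checked and type attributes are kept), so treat it as an illustration rather than the actual page:

```lua
-- Hypothetical, shortened input: the first checkbox is unchecked,
-- the second one (on the next line) is checked.
local html = '<input name="A" value="a" type="checkbox">\n'
          .. '<input name="B" value="b" checked="checked" type="checkbox">'

-- The pattern from the question: "." also matches "\n" in Lua,
-- so the lazy (.-) is free to grow across the line break.
local lazy = 'name="(.-)" value="(.-)" checked="checked"'
local name, value = string.match(html, lazy)

print(name)   -- the name of the UNCHECKED checkbox
print(value)  -- starts inside the first tag and ends inside the second
```

Here the first capture comes from the unchecked checkbox, and the second capture swallows the rest of that tag, the newline, and most of the next tag, which matches the "lots of unneeded HTML" symptom described in the question.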
If you want to match a string that has 3 attributes above one after another, you should use something like
local pattern = 'name="([^"]*)"%s+value="([^"]*)"%s+checked="checked"'
Why not [^\r\n]-? Because [^\r\n] also matches < and >, so it can "overfire" across tags that sit on the same line. If one tag on a line has the first attribute (or the first two) and a later tag on that line supplies the remaining ones, there will still be a match, with the captures spanning both tags.
Note that [^"]*, a negated bracket expression, will only match 0+ chars other than ", so each capture stays inside one quoted attribute value and, together with %s+ between the attributes, the whole match stays within one tag.
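The difference between the two character classes can be sketched as follows. The line string is an invented one-line input with two tags, chosen only to trigger the over-matching; the variable names are illustrative:

```lua
-- Hypothetical input: two tags on ONE line, only the second is checked.
local line = '<input name="n1" value="v1"><input name="n2" value="v2" checked="checked">'

-- [^\r\n]- stays on the line but still matches <, > and ",
-- so it can run from the first tag into the second.
local loose  = 'name="([^\r\n]-)"%s+value="([^\r\n]-)"%s+checked="checked"'
-- [^"]* cannot cross a closing quote, so each capture stays in one attribute.
local strict = 'name="([^"]*)"%s+value="([^"]*)"%s+checked="checked"'

local loose_name, loose_value   = string.match(line, loose)
local strict_name, strict_value = string.match(line, strict)

print(loose_name, loose_value)    -- name from tag 1, value spilling into tag 2
print(strict_name, strict_value)  -- both captures from the checked tag only
```

With the loose pattern the name comes from the unchecked tag and the value runs into the next tag; with the strict pattern only the checked tag is reported.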
See Lua demo:
local rx = 'name="([^"]*)"%s+value="([^"]*)"%s+checked="checked"'
local s = '<li name="n1"\nvalue="v1"><li name="n2"\nvalue="v1" checked="checked"><li name="n3"\nvalue="v3" checked="checked">'
for name, value in string.gmatch(s, rx) do
  print(name, value)
end
Output:
n2 v1
n3 v3
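As a further check, the same pattern can be tried on the three checkboxes quoted in the question. In this sketch, body is a stand-in for res.body in the NSE script and checked is a hypothetical table for collecting the results; neither name comes from the original script:

```lua
-- The three <input> tags from the question, joined with newlines.
-- "body" stands in for res.body (an assumption for this sketch).
local body = table.concat({
  '<input class="rid-2 form-checkbox" id="edit-2-access-comments" name="2[access comments]" value="access comments" checked="checked" type="checkbox">',
  '<input class="rid-2 form-checkbox" id="edit-2-access-printer-friendly-version" name="2[access printer-friendly version]" value="access printer-friendly version" type="checkbox">',
  '<input class="rid-3 form-checkbox real-checkbox" id="edit-3-administer-comments" name="3[administer comments]" value="administer comments" checked="checked" type="checkbox">',
}, "\n")

local pattern = 'name="([^"]*)"%s+value="([^"]*)"%s+checked="checked"'

-- Collect every checked checkbox as a { name = ..., value = ... } record.
local checked = {}
for name, value in string.gmatch(body, pattern) do
  checked[#checked + 1] = { name = name, value = value }
end

for _, box in ipairs(checked) do
  print(box.name, box.value)
end
```

This should list the "access comments" and "administer comments" checkboxes and skip the unchecked "access printer-friendly version" one. Note that the pattern relies on name, value and checked appearing in exactly this order, as they do in the sample markup.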