
Case-insensitive Lua pattern-matching

I'm writing a grep utility in Lua for our mobile devices running Windows CE 6/7, but I've run into some issues implementing case-insensitive match patterns. The obvious solution of converting everything to uppercase (or lowercase) doesn't work, because it also changes the character classes in the pattern (for example, %d would become %D).
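To make the problem concrete: in Lua patterns, the uppercase form of a class letter is the complement of that class, so blindly upper-casing a pattern turns %d (digits) into %D (non-digits). A minimal sketch, where the subject and pattern strings are made up purely for illustration:

```lua
-- Hypothetical sample data, chosen only to illustrate the problem.
local subject = "Error 42"
local pattern = "error %d+"

-- Naive approach: upper-case both the subject and the whole pattern.
-- The literal text is fine, but %d silently becomes %D (non-digits).
local naive_pattern = pattern:upper()
local naive_match = subject:upper():find(naive_pattern)

-- Upper-casing only the literal part (done by hand here) keeps %d intact.
local manual_match = subject:upper():find("ERROR %d+")

print(naive_pattern, naive_match, manual_match)
```

The naive version finds nothing because "%D+" needs at least one non-digit after "ERROR ", while the hand-edited pattern matches at position 1.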

The only other thing I can think of is converting the literals in the pattern itself to uppercase.

Here's what I have so far:

function toUpperPattern(instr)
    -- Check first character
    if string.find(instr, "^%l") then
        instr = string.upper(string.sub(instr, 1, 1)) .. string.sub(instr, 2)
    end
    -- Check the rest of the pattern
    while 1 do
        local a, b, str = string.find(instr, "[^%%](%l+)")
        if not a then break end
        if str then
            instr = string.sub(instr, 1, a) .. string.upper(string.sub(instr, a+1, b)) .. string.sub(instr, b + 1)
        end
    end
    return instr
end

I hate to admit how long it took to get even that far, and I can already see there are going to be problems with things like escaped percent signs ('%%').

I figured this must be a fairly common issue, but I can't seem to find much on the topic. Are there any easier (or at least complete) ways to do this? I'm starting to go crazy here... Hoping you Lua gurus out there can enlighten me!

asked Jul 09 '12 19:07 by Nubbychadnezzar


1 Answer

Try something like this:

function case_insensitive_pattern(pattern)

  -- find an optional '%' (group 1) followed by any character (group 2)
  local p = pattern:gsub("(%%?)(.)", function(percent, letter)

    if percent ~= "" or not letter:match("%a") then
      -- if the '%' matched, or `letter` is not a letter, return "as is"
      return percent .. letter
    else
      -- else, return a case-insensitive character class of the matched letter
      return string.format("[%s%s]", letter:lower(), letter:upper())
    end

  end)

  return p
end

print(case_insensitive_pattern("xyz = %d+ or %% end"))

which prints:

[xX][yY][zZ] = %d+ [oO][rR] %% [eE][nN][dD]
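One case the function above does not cover is a letter inside a bracketed set: "[abc]" would come out as "[[aA][bB][cC]]", which is no longer the same set. It also rewrites the two delimiter characters after %b. The following is only a sketch of one way to handle both, under the assumption that the input is a well-formed Lua pattern; the name case_insensitive_pattern_sets and the sample strings are hypothetical.

```lua
-- Sketch: walks the pattern once, tracking whether we are inside [...].
local function case_insensitive_pattern_sets(pattern)
  local out = {}
  local i, n = 1, #pattern
  local in_set = false     -- true while inside [...]
  local set_start = false  -- true right after '[' or '[^', where ']' is literal

  local function emit(s) out[#out + 1] = s end

  while i <= n do
    local c = pattern:sub(i, i)
    if c == "%" then
      -- '%' plus the next character is a class or an escape: copy unchanged.
      -- Outside a set, %bxy also consumes its two delimiter characters.
      local len = (not in_set and pattern:sub(i + 1, i + 1) == "b") and 4 or 2
      emit(pattern:sub(i, i + len - 1))
      i = i + len
      set_start = false
    elseif in_set then
      local hi = pattern:sub(i + 2, i + 2)
      if c == "]" and not set_start then
        emit(c); in_set = false; i = i + 1
      elseif pattern:sub(i + 1, i + 1) == "-" and hi ~= "" and hi ~= "]" then
        -- A range such as a-f: keep it and add its other-case variants.
        local range = c .. "-" .. hi
        emit(range)
        if c:match("%a") and hi:match("%a") then
          if range:lower() ~= range then emit(range:lower()) end
          if range:upper() ~= range then emit(range:upper()) end
        end
        i = i + 3
      elseif c:match("%a") then
        emit(c:lower() .. c:upper()); i = i + 1  -- single letter in a set
      else
        emit(c); i = i + 1
      end
      set_start = false
    elseif c == "[" then
      in_set, set_start = true, true
      if pattern:sub(i + 1, i + 1) == "^" then
        emit("[^"); i = i + 2
      else
        emit("["); i = i + 1
      end
    elseif c:match("%a") then
      emit("[" .. c:lower() .. c:upper() .. "]"); i = i + 1
    else
      emit(c); i = i + 1
    end
  end
  return table.concat(out)
end

-- Usage with made-up input: the result is an ordinary pattern for string.find.
print(case_insensitive_pattern_sets("0x[a-f]+"))
print(("Error: FILE not Found"):find(case_insensitive_pattern_sets("file")))
```

Because the result is still a plain Lua pattern, it can be passed to string.find, string.match or string.gmatch without touching the text being searched, which suits a grep-style tool that has to print the original lines.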
answered Sep 19 '22 05:09 by Bart Kiers