
How to write this regular expression in Lua?

I'm new to Lua's pattern-matching features (its equivalent of regular expressions). I need to write the following regular expression, which should match numbers with decimals:

\b[0-9]*.\b[0-9]*(?!])

Basically, it matches numbers in decimal format (e.g. 1, 1.1, 0.1, 0.11) that do not end with ']'. I've been trying to write an equivalent in Lua using string.gmatch, but I'm quite inexperienced with Lua's matching expressions.

Thanks!

Goles asked May 31 '11 18:05


2 Answers

Lua does not have regular expressions, mainly because a full regular expression library would be bigger than Lua itself.

What Lua has instead are matching patterns, which are way less powerful (but still sufficient for many use cases):

  • There is no "word boundary" matcher,
  • no alternation (the | of regular expressions),
  • and also no lookahead or similar.
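To illustrate the missing alternation, here is a minimal sketch. The helper name match_any is hypothetical (it is not part of Lua); it only shows one way to emulate alternation by trying several patterns in turn.

```lua
-- In a Lua pattern "|" is an ordinary character, not alternation.
local literal_bar = "%d+|%a+"
local with_bar = string.match("12|ab", literal_bar)  -- matches: a real "|" is present
local without_bar = string.match("12", literal_bar)  -- nil: there is no "|" in the subject

-- Hypothetical helper: emulate alternation by trying the patterns in order
-- and returning the first match found.
local function match_any(s, patterns)
  for _, p in ipairs(patterns) do
    local m = string.match(s, p)
    if m then
      return m
    end
  end
  return nil
end

local first = match_any("abc", { "^%d+", "^%a+" })  -- "abc", via the second pattern
```

The order of the patterns matters here, because the first one that matches wins.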

I don't think there is a single Lua pattern that matches every possible occurrence of such a number and nothing else, so you will have to work around this somehow.

The pattern proposed by Stuart, %d*%.?%d*, matches all decimal numbers (with or without a dot), but it also matches the empty string, which is not very useful. %d+%.?%d* matches all decimal numbers with at least one digit before the dot (or without a dot), and %d*%.?%d+ matches all decimal numbers with at least one digit after the dot (or without a dot). %.%d+ matches decimal numbers without a digit before the dot.
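A quick way to compare these patterns is to run each of them over the same string. This is only a sketch: the helper all_matches and the sample string are my own, chosen for illustration.

```lua
-- Collect every match of a Lua pattern in a string (illustrative helper).
local function all_matches(s, pattern)
  local found = {}
  for m in string.gmatch(s, pattern) do
    found[#found + 1] = m
  end
  return found
end

local sample = "a 1.5 b .25 c 7"

-- At least one digit before the dot: ".25" is only seen as "25".
local at_least_one_before = all_matches(sample, "%d+%.?%d*")  -- 1.5, 25, 7

-- At least one digit after the dot: all three numbers are found whole.
local at_least_one_after = all_matches(sample, "%d*%.?%d+")   -- 1.5, .25, 7

-- Dot first: note that it also finds the ".5" inside "1.5".
local dot_first = all_matches(sample, "%.%d+")                -- .5, .25

-- The all-optional pattern succeeds with an empty match on text without digits.
local empty = string.match("abc", "%d*%.?%d*")                -- ""
```

The ".5" result of the last pattern shows why results from several patterns cannot simply be concatenated: overlapping matches have to be reconciled.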

A simple solution would be to search for more than one of these patterns (for example, both %d+%.?%d* and %.%d+) and combine the results. Then look at the positions where you found them and check whether a ']' follows.
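One possible way to do this, as a rough sketch: scan the string from left to right, take the earliest (and, on a tie, the longest) match among the patterns, and keep it only if the next character is not ']'. The function name and the sample string are assumptions made for this example.

```lua
local patterns = { "%d+%.?%d*", "%.%d+" }

-- Returns a list of the numbers in s that are not directly followed by "]".
local function numbers_not_before_bracket(s)
  local results = {}
  local pos = 1
  while pos <= #s do
    -- Find the earliest match among all patterns; prefer the longer one on a tie.
    local best_start, best_stop
    for _, p in ipairs(patterns) do
      local start, stop = string.find(s, p, pos)
      if start and (not best_start or start < best_start
          or (start == best_start and stop > best_stop)) then
        best_start, best_stop = start, stop
      end
    end
    if not best_start then
      break
    end
    -- s:sub returns "" past the end of the string, so a number at the end is kept.
    if s:sub(best_stop + 1, best_stop + 1) ~= "]" then
      results[#results + 1] = s:sub(best_start, best_stop)
    end
    pos = best_stop + 1
  end
  return results
end

local result = numbers_not_before_bracket("x[1] = 0.5, y = .25, z[12] = 3")
-- result holds "0.5", ".25" and "3"; the indices 1 and 12 are skipped
```

Because each match is greedy and the scan resumes after it, "12]" is rejected as a whole rather than yielding a partial "1".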


I experimented a bit with the frontier pattern.

The pattern %f[%.%d]%d*%.?%d*%f[^%.%d%]] matches all decimal numbers which are preceded by something that is neither a digit nor a dot (or by nothing), and followed by something that is neither ']' nor a digit nor a dot (or by nothing). It also matches a single dot, though.
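Here is a small sketch of how the frontier pattern could be used, filtering out the lone-dot match afterwards. It assumes a Lua version that supports %f (documented since 5.2); the function name and sample strings are made up for illustration.

```lua
local pattern = "%f[%.%d]%d*%.?%d*%f[^%.%d%]]"

-- Collect the matches of the frontier pattern, skipping a lone ".".
local function frontier_numbers(s)
  local found = {}
  for m in string.gmatch(s, pattern) do
    if m ~= "." then
      found[#found + 1] = m
    end
  end
  return found
end

local result = frontier_numbers("a[1] = 0.5, b = 12, c = .75")
-- result holds "0.5", "12" and ".75"; the index 1 is skipped because "]" follows it

-- The lone dot mentioned above: a sentence-ending period is matched too.
local lone_dot = string.match("end.", pattern)  -- "."
```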

Paŭlo Ebermann answered Oct 11 '22 16:10


"%d*%.?%d+" will match all such numbers in decimal format (note that that's going to miss any signed numbers such as -1.1 or +3.14). You'll need to come up with another solution to avoid instances that end with ], such as removing them from the string before looking for the numbers:

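local pattern = "%d*%.?%d+"
-- Remove every number that is immediately followed by "]" (orig is the input string).
local clean = string.gsub(orig, pattern .. "%]", "")
-- Return an iterator over the numbers that remain.
return string.gmatch(clean, pattern)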
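
For a self-contained way to try this out, the same idea can be wrapped in a function that collects the matches into a table. The function name and the sample input below are assumptions for the sake of the example.

```lua
-- Returns a list of the numbers in orig that are not directly followed by "]".
local function numbers_without_bracket(orig)
  local pattern = "%d*%.?%d+"
  -- gsub returns the new string and a replacement count;
  -- the extra parentheses keep only the string.
  local clean = (string.gsub(orig, pattern .. "%]", ""))
  local found = {}
  for n in string.gmatch(clean, pattern) do
    found[#found + 1] = n
  end
  return found
end

local result = numbers_without_bracket("t[1] = 0.5 and t[22] = -1.1")
-- result holds "0.5" and "1.1": the indices are removed, and the sign of -1.1 is not captured
```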
Stuart P. Bentley answered Oct 11 '22 17:10