Split string with specified delimiter in Lua

Tags: regex, lua

I'm trying to write a split() function in Lua that takes a delimiter of the caller's choice and defaults to a space. The default case works fine. The problem starts when I pass a delimiter to the function: for some reason it doesn't return the last substring. The function:

function split(str,sep)
if sep == nil then
    words = {}
    for word in str:gmatch("%w+") do table.insert(words, word) end
    return words
end
return {str:match((str:gsub("[^"..sep.."]*"..sep, "([^"..sep.."]*)"..sep)))} -- BUG!! doesnt return last value
end

I run it like this:

local str = "a,b,c,d,e,f,g"
local sep = ","
t = split(str,sep)
for i,j in ipairs(t) do
    print(i,j)
end

and I get:

1   a
2   b
3   c
4   d
5   e
6   f

Can't figure out where the bug is...

DrorNohi asked Oct 20 '16 08:10


3 Answers

When splitting strings, the easiest way to avoid corner cases is to append the delimiter to the string, when you know the string cannot end with the delimiter:

str = "a,b,c,d,e,f,g"
str = str .. ','
for w in str:gmatch("(.-),") do print(w) end

Alternatively, you can use a pattern with an optional delimiter:

str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+),?") do print(w) end

Actually, we don't need the optional delimiter since we're capturing non-delimiters:

str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+)") do print(w) end
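
One difference between these variants only shows up when the input contains empty fields. A minimal sketch, assuming an input such as "a,,b" (not taken from the question): appending the delimiter and matching "(.-)," keeps the empty field, while "([^,]+)" skips it. The helper name collect is made up for illustration.

```lua
-- Hypothetical helper: gather every capture produced by gmatch into a table.
local function collect(str, pattern)
  local out = {}
  for w in str:gmatch(pattern) do
    out[#out + 1] = w
  end
  return out
end

local input = "a,,b"  -- assumed example with an empty middle field

-- Variant 1: append the delimiter, then capture everything up to each delimiter.
local with_empty = collect(input .. ",", "(.-),")
-- Variant 2: capture runs of non-delimiter characters only.
local without_empty = collect(input, "([^,]+)")

print(#with_empty)     --> 3  ("a", "", "b")
print(#without_empty)  --> 2  ("a", "b")
```

Which behaviour is wanted depends on whether empty fields carry meaning in the data being split.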
lhf answered Oct 19 '22 05:10


Here's my go-to split() function:

-- split("a,b,c", ",") => {"a", "b", "c"}
function split(s, sep)
    local fields = {}
    
    local sep = sep or " "
    local pattern = string.format("([^%s]+)", sep)
    string.gsub(s, pattern, function(c) fields[#fields + 1] = c end)
    
    return fields
end
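
A short usage sketch of the function above, repeated here so the snippet runs on its own. Because sep ends up inside a character class, a multi-character separator such as ", " seems to behave as a set of single-character delimiters and not as one delimiter string. The sample inputs are assumptions.

```lua
local function split(s, sep)
    local fields = {}
    sep = sep or " "
    -- sep is interpolated into a character class: [^<sep>]+
    local pattern = string.format("([^%s]+)", sep)
    string.gsub(s, pattern, function(c) fields[#fields + 1] = c end)
    return fields
end

local words = split("hello big world")   -- default separator: a space
local parts = split("a, b,c", ", ")      -- both "," and " " act as delimiters

print(table.concat(words, "|"))  --> hello|big|world
print(table.concat(parts, "|"))  --> a|b|c
```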
nicolas.leblanc answered Oct 19 '22 05:10

The pattern "[^"..sep.."]*"..sep is what causes the problem. It matches a run of characters that are not the separator, followed by the separator. However, the last substring you want to match (g) is not followed by the separator character.

The quickest way to fix this is to make sure the last field is also followed by a separator, for example by appending sep to the string before matching (str .. sep). Adding \0 to the character set does not help: Lua patterns have no character that stands for the beginning or end of the string (only the ^ and $ anchors do), so g would still need a trailing separator to be matched.
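
To see why the last field is lost, it can help to print the pattern that the question's gsub call builds. This is a sketch using the question's own input; the variable names generated, fields and fixed are made up for illustration.

```lua
local str, sep = "a,b,c,d,e,f,g", ","

-- The inner gsub from the question: every "field + separator" becomes a capture group.
-- Assigning to a single local keeps only the first return value (the new string).
local generated = str:gsub("[^" .. sep .. "]*" .. sep, "([^" .. sep .. "]*)" .. sep)
print(generated)  --> ([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),g

-- The trailing "g" has no separator after it, so it stays a literal, uncaptured character.
local fields = { str:match(generated) }
print(#fields)  --> 6

-- Appending the separator first, as the first answer suggests, yields all seven fields.
local padded = str .. sep
local fixed = { padded:match((padded:gsub("[^" .. sep .. "]*" .. sep, "([^" .. sep .. "]*)" .. sep))) }
print(#fixed)  --> 7
```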

I'd say your approach is overly complicated in general. First, you can just match the individual substrings that do not contain the separator; second, you can do this in a for loop using the gmatch function:

local result = {}
-- gmatch (not gsub) returns an iterator over each run of non-separator characters
for field in your_string:gmatch(("[^%s]+"):format(your_separator)) do
  table.insert(result, field)
end
return result

EDIT: The above code, made a bit simpler:

-- the leading % escapes a punctuation separator; drop it for alphanumeric separators
local pattern = "[^%" .. your_separator .. "]+"
for field in string.gmatch(your_string, pattern) do
-- ...and so on (The rest should be easy enough to understand)

EDIT2: Keep in mind that you should also escape your separators. A separator like % could cause problems if you don't escape it as %%:

function escape(str)
  -- the extra parentheses drop gsub's second return value (the match count)
  return (str:gsub("([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1"))
end
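
Putting the pieces of this answer together, a split that escapes its separator might look like the sketch below. It assumes a single-character separator and, like the snippets above, skips empty fields; the sample input is made up.

```lua
-- Escape every character that is special in a Lua pattern.
local function escape(str)
  return (str:gsub("([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1"))
end

-- Split str on a single-character separator (a space if none is given).
local function split(str, sep)
  local pattern = "[^" .. escape(sep or " ") .. "]+"
  local result = {}
  for field in str:gmatch(pattern) do
    result[#result + 1] = field
  end
  return result
end

local t = split("50%25%100", "%")  -- "%" would break an unescaped pattern
print(table.concat(t, " | "))      --> 50 | 25 | 100
```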
DarkWiiPlayer answered Oct 19 '22 04:10
