I'm trying to create a split() function in lua with delimiter by choice, when the default is space. the default is working fine. The problem starts when I give a delimiter to the function. For some reason it doesn't return the last sub string. The function:
function split(str,sep)
if sep == nil then
words = {}
for word in str:gmatch("%w+") do table.insert(words, word) end
return words
end
return {str:match((str:gsub("[^"..sep.."]*"..sep, "([^"..sep.."]*)"..sep)))} -- BUG!! doesnt return last value
end
I try to run this:
local str = "a,b,c,d,e,f,g"
local sep = ","
t = split(str,sep)
for i,j in ipairs(t) do
print(i,j)
end
and I get:
1 a
2 b
3 c
4 d
5 e
6 f
Can't figure out where the bug is...
When splitting strings, the easiest way to avoid corner cases is to append the delimiter to the string, when you know the string cannot end with the delimiter:
str = "a,b,c,d,e,f,g"
str = str .. ','
for w in str:gmatch("(.-),") do print(w) end
Alternatively, you can use a pattern with an optional delimiter:
str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+),?") do print(w) end
Actually, we don't need the optional delimiter since we're capturing non-delimiters:
str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+)") do print(w) end
Here's my go-to split() function:
-- split("a,b,c", ",") => {"a", "b", "c"}
function split(s, sep)
local fields = {}
local sep = sep or " "
local pattern = string.format("([^%s]+)", sep)
string.gsub(s, pattern, function(c) fields[#fields + 1] = c end)
return fields
end
"[^"..sep.."]*"..sep
This is what causes the problem. You are matching a string of characters which are not the separator followed by the separator. However, the last substring you want to match (g
) is not followed by the separator character.
The quickest way to fix this is to also consider \0
a separator ("[^"..sep.."\0]*"..sep
), as it represents the beginning and/or the end of the string. This way, g
, which is not followed by a separator but by the end of the string would still be considered a match.
I'd say your approach is overly complicated in general; first of all you can just match individual substrings that do not contain the separator; secondly you can do this in a for
-loop using the gmatch
function
local result = {}
for field in your_string:gsub(("[^%s]+"):format(your_separator)) do
table.insert(result, field)
end
return result
EDIT: The above code made a bit more simple:
local pattern = "[^%" .. your_separator .. "]+"
for field in string.gsub(your_string, pattern) do
-- ...and so on (The rest should be easy enough to understand)
EDIT2: Keep in mind that you should also escape your separators. A separator like %
could cause problems if you don't escape it as %%
function escape(str)
return str:gsub("([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1")
end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With