I asked a question about Lua performance, and one of the responses asked:
Have you studied general tips for keeping Lua performance high? i.e. know table creation and rather reuse a table than create a new one, use of 'local print=print' and such to avoid global accesses.
This is a slightly different question from Lua Patterns, Tips and Tricks because I'd like answers that specifically impact performance and (if possible) an explanation of why performance is impacted.
One tip per answer would be ideal.
In response to some of the other answers and comments:
It is true that as a programmer you should generally avoid premature optimization. But. This is not so true for scripting languages where the compiler does not optimize much -- or at all.
So, whenever you write something in Lua, and that is executed very often, is run in a time-critical environment or could run for a while, it is a good thing to know things to avoid (and avoid them).
This is a collection of what I found out over time. Some of it I found out over the net, but being of a suspicious nature when the interwebs are concerned I tested all of it myself. Also, I have read the Lua performance paper at Lua.org.
Some reference:
Avoid globals
This is one of the most common hints, but stating it once more can't hurt.
Globals are stored in a hashtable by their name. Accessing them means you have to access a table index. While Lua has a pretty good hashtable implementation, it's still a lot slower than accessing a local variable. If you have to use globals, assign their value to a local variable, this is faster at the 2nd variable access.
    do
        x = gFoo + gFoo;
    end
    do -- this actually performs better.
        local lFoo = gFoo;
        x = lFoo + lFoo;
    end
(Note that simple testing may yield different results, e.g. local x; for i=1, 1000 do x=i; end. Here the for loop header actually takes more time than the loop body, thus profiling results could be distorted.)
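One way to check the claim on your own machine is a small timing sketch like the one below. The names gCounter, sum_global, sum_local and time_it are made up for this illustration, and the numbers depend heavily on the interpreter and version, so treat it as a starting point rather than a benchmark.

```lua
-- gCounter is a hypothetical global used only for this comparison.
gCounter = 1

local function sum_global(n)
    local total = 0
    for i = 1, n do
        total = total + gCounter + gCounter   -- two global lookups per iteration
    end
    return total
end

local function sum_local(n)
    local lCounter = gCounter                 -- one global lookup in total
    local total = 0
    for i = 1, n do
        total = total + lCounter + lCounter   -- local variable accesses only
    end
    return total
end

-- Returns the elapsed CPU seconds and the function's result.
local function time_it(f, n)
    local start = os.clock()
    local result = f(n)
    return os.clock() - start, result
end

local N = 1000000
local t_global, r_global = time_it(sum_global, N)
local t_local, r_local = time_it(sum_local, N)
print(string.format("global: %.3fs  local: %.3fs", t_global, t_local))
```

The loop body does real work on purpose, so that the loop header does not dominate the measurement.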
Avoid string creation
Lua hashes all strings on creation; this makes comparison and using them in tables very fast and reduces memory use, since all strings are stored internally only once. But it makes string creation more expensive.
A popular option to avoid excessive string creation is using tables. For example, if you have to assemble a long string, create a table, put the individual strings in there and then use table.concat to join it once.
    -- do NOT do something like this
    local ret = "";
    for i=1, C do
        ret = ret..foo();
    end
If foo() would return only the character A, this loop would create a series of strings like "", "A", "AA", "AAA", etc. Each string would be hashed and reside in memory until the garbage collector reclaims it. See the problem here?
    -- this is a lot faster
    local ret = {};
    for i=1, C do
        ret[#ret+1] = foo();
    end
    ret = table.concat(ret);
This method does not create strings at all during the loop: the string is created in the function foo and only references are copied into the table. Afterwards, concat creates a second string "AAAAAA..." (depending on how large C is). Note that you could use i instead of #ret+1, but often you don't have such a useful loop and you won't have an iterator variable you can use.
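A self-contained version of the table.concat approach might look like this. foo is a stand-in that always returns "A", and build and build_with_separator are hypothetical helper names; the second one only shows that table.concat also accepts a separator.

```lua
-- foo is a hypothetical stand-in for whatever produces each piece.
local function foo()
    return "A"
end

-- Collect the pieces in a table and join them once at the end.
local function build(count)
    local parts = {}
    for i = 1, count do
        parts[i] = foo()              -- loop index is available, so #parts is not needed
    end
    return table.concat(parts)
end

-- Same idea when no loop index is at hand, with an optional separator.
local function build_with_separator(count, sep)
    local parts = {}
    for _ = 1, count do
        parts[#parts + 1] = foo()
    end
    return table.concat(parts, sep)
end

print(build(5))
print(build_with_separator(3, ","))
```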
Another trick I found somewhere on lua-users.org is to use gsub if you have to parse a string:
    some_string:gsub(".", function(m) return "A"; end);
This looks odd at first; the benefit is that gsub creates a string "at once" in C, which is only hashed after it is passed back to Lua when gsub returns. This avoids table creation, but possibly has more function overhead (not if you call foo() anyway, but if foo() is actually an expression).
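As a small runnable illustration of the gsub pattern, the sketch below rewrites every vowel through a callback. upper_vowels is a made-up example function; note that gsub returns two values, the new string and the number of substitutions.

```lua
-- Build the result in a single gsub call instead of concatenating in a loop.
local function upper_vowels(s)
    local result, count = s:gsub("[aeiou]", function(m)
        return m:upper()              -- replacement for each matched character
    end)
    return result, count
end

print(upper_vowels("lua performance"))
```

Whether this beats the table.concat approach depends on how expensive the callback is, so it is worth measuring for your case.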
Use language constructs instead of functions where possible
ipairs
When iterating a table, the function overhead from ipairs does not justify its use. To iterate a table, instead use
    for k=1, #tbl do
        local v = tbl[k];
        -- ... use v here
    end
It does exactly the same without the function call overhead (ipairs actually returns another function which is then called for every element in the table, while #tbl is only evaluated once). It's a lot faster, even if you need the value. And if you don't...
Note for Lua 5.2: In 5.2 you can actually define a __ipairs field in the metatable, which does make ipairs useful in some cases. However, Lua 5.2 also makes the __len field work for tables, so you might still prefer the above code to ipairs, as then the __len metamethod is only called once, while for ipairs you would get an additional function call per iteration.
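To compare the two loop styles side by side, a sketch like the following can be used. sum_numeric and sum_ipairs are hypothetical names, and the table is a plain sequence without holes, which is the case where both loops visit the same elements.

```lua
-- Numeric for: #tbl is evaluated once, no iterator call per element.
local function sum_numeric(tbl)
    local total = 0
    for k = 1, #tbl do
        total = total + tbl[k]
    end
    return total
end

-- ipairs: the iterator function is called once per element.
local function sum_ipairs(tbl)
    local total = 0
    for _, v in ipairs(tbl) do
        total = total + v
    end
    return total
end

local data = {}
for i = 1, 100 do
    data[i] = i
end

print(sum_numeric(data), sum_ipairs(data))
```

Wrapping each call in os.clock() measurements over a larger table would show whether the difference matters in your interpreter.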
table.insert, table.remove
Simple uses of table.insert and table.remove can be replaced by using the # operator instead. Basically this is for simple push and pop operations. Here are some examples:
    table.insert(foo, bar);
    -- does the same as
    foo[#foo+1] = bar;

    local x = table.remove(foo);
    -- does the same as
    local x = foo[#foo];
    foo[#foo] = nil;
For shifts (e.g. table.remove(foo, 1)), and if ending up with a sparse table is not desirable, it is of course still better to use the table functions.
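Put together, the push and pop replacements give a tiny stack. This is only a sketch; push and pop are hypothetical helper names, and wrapping the operations in functions adds back some call overhead, so in a hot loop you would inline the two assignments.

```lua
local stack = {}

-- Append at the end without calling table.insert.
local function push(s, value)
    s[#s + 1] = value
end

-- Remove and return the last element without calling table.remove.
local function pop(s)
    local n = #s                      -- evaluate the length only once
    if n == 0 then
        return nil                    -- nothing to pop
    end
    local value = s[n]
    s[n] = nil
    return value
end

push(stack, "a")
push(stack, "b")
push(stack, "c")
print(pop(stack), #stack)
```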
Use table lookups instead of chained comparisons
You might - or might not - have decisions in your code like the following
    if a == "C" or a == "D" or a == "E" or a == "F" then
        ...
    end
Now this is a perfectly valid case; however (from my own testing), starting with 4 comparisons and excluding table generation, this is actually faster:
    local compares = { C = true, D = true, E = true, F = true };
    if compares[a] then
        ...
    end
And since hash tables have constant look up time, the performance gain increases with every additional comparison. On the other hand if "most of the time" one or two comparisons match, you might be better off with the Boolean way or a combination.
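A small helper can build such a lookup table from a list, as sketched below. make_set, is_hex_letter and check are names invented for this example. The important part is that the table is built once, outside the hot path; building it on every call would cost more than the comparisons it replaces.

```lua
-- Turn a list of values into a set-like table: value -> true.
local function make_set(list)
    local set = {}
    for i = 1, #list do
        set[list[i]] = true
    end
    return set
end

-- Built once, outside any hot loop.
local is_hex_letter = make_set({ "C", "D", "E", "F" })

local function check(a)
    return is_hex_letter[a] == true   -- one hash lookup instead of four comparisons
end

print(check("C"), check("Z"))
```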
Avoid frequent table creation
This is discussed thoroughly in Lua Performance Tips. Basically, the problem is that Lua allocates your table on demand, and creating a new table each time actually takes more time than clearing its contents and filling it again.
However, this is a bit of a problem, since Lua itself does not provide a method for removing all elements from a table, and pairs() is not a performance beast itself. I have not done any performance testing on this problem myself yet.
If you can, define a C function that clears a table, this should be a good solution for table reuse.
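If a C helper is not an option, a pure-Lua fallback can look like the sketch below. It relies on the documented rule that assigning nil to existing fields during a pairs traversal is allowed. clear, buffer and squares are hypothetical names, and whether reuse actually wins over fresh allocation should be measured for your table sizes.

```lua
-- Remove every key from t so it can be reused; returns the same table.
local function clear(t)
    for k in pairs(t) do
        t[k] = nil                    -- clearing existing fields during traversal is allowed
    end
    return t
end

-- One scratch table reused across calls instead of a new table per call.
local buffer = {}

local function squares(n)
    clear(buffer)
    for i = 1, n do
        buffer[i] = i * i
    end
    return buffer                     -- callers must not keep it across calls
end

local first = squares(5)
print(#first, first[5])
```

The trade-off is that every caller shares the same table, so results must be consumed before the next call.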
Avoid doing the same over and over
This is the biggest problem, I think. While a compiler in a non-interpreted language can easily optimize away a lot of redundancies, Lua will not.
Memoize
Using tables, this can be done quite easily in Lua. For single-argument functions you can even replace them with a table and an __index metamethod. Even though this destroys transparency, performance is better on cached values due to one less function call.
Here is an implementation of memoization for a single argument using a metatable. (Important: This variant does not support a nil value argument, but is pretty damn fast for existing values.)
    function tmemoize(func)
        return setmetatable({}, {
            __index = function(self, k)
                local v = func(k);
                self[k] = v
                return v;
            end
        });
    end
    -- usage (does not support nil values!)
    local mf = tmemoize(myfunc);
    local v = mf[x];
You could actually modify this pattern for multiple input values.
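One possible way to extend the idea to two arguments is to nest one cache table per first argument, as in this sketch. memoize2, slow_add and fast_add are made-up names, and this is only one of several designs. Like the single-argument variant it does not support nil arguments, and a function that returns nil is simply called again each time.

```lua
-- Memoize a two-argument function using nested tables: cache[a][b] -> result.
local function memoize2(func)
    local cache = {}
    return function(a, b)
        local row = cache[a]
        if row == nil then
            row = {}
            cache[a] = row
        end
        local v = row[b]
        if v == nil then
            v = func(a, b)            -- computed only on a cache miss
            row[b] = v
        end
        return v
    end
end

-- Hypothetical slow function with a call counter to show the effect.
local calls = 0
local function slow_add(a, b)
    calls = calls + 1
    return a + b
end

local fast_add = memoize2(slow_add)
print(fast_add(1, 2), fast_add(1, 2), calls)
```

This variant keeps the function-call interface, so it gives up the saved call that the __index version gets on cached values.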
Partial application
The idea is similar to memoization, which is to "cache" results. But here, instead of caching the results of the function, you would cache intermediate values by putting their calculation in a constructor function that defines the calculation function in its block. In reality I would just call it clever use of closures.
    -- Normal function
    function foo(a, b, x)
        return cheaper_expression(expensive_expression(a,b), x);
    end
    -- foo(a,b,x1);
    -- foo(a,b,x2);
    -- ...

    -- Partial application
    function foo(a, b)
        local C = expensive_expression(a,b);
        return function(x)
            return cheaper_expression(C, x);
        end
    end
    -- local f = foo(a,b);
    -- f(x1);
    -- f(x2);
    -- ...
This way it is possible to easily create flexible functions that cache some of their work without too much impact on program flow.
An extreme variant of this would be Currying, but that is actually more a way to mimic functional programming than anything else.
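To see the saving in numbers, here is a runnable toy version of the pattern with a call counter. The bodies of expensive_expression and cheaper_expression are placeholders invented for the sketch, and make_foo is a hypothetical name for the constructor function.

```lua
-- Placeholder "expensive" step that counts how often it runs.
local expensive_calls = 0
local function expensive_expression(a, b)
    expensive_calls = expensive_calls + 1
    return a * b
end

-- Placeholder cheap step.
local function cheaper_expression(c, x)
    return c + x
end

-- Constructor: does the expensive part once and returns a closure over it.
local function make_foo(a, b)
    local C = expensive_expression(a, b)
    return function(x)
        return cheaper_expression(C, x)
    end
end

local f = make_foo(3, 4)
print(f(1), f(2), f(3))
print("expensive calls:", expensive_calls)
```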
Here is a more extensive ("real world") example with some code omissions, otherwise it would easily take up the whole page here (namely, get_color_values actually does a lot of value checking and accepts mixed values).
    function LinearColorBlender(col_from, col_to)
        local cfr, cfg, cfb, cfa = get_color_values(col_from);
        local ctr, ctg, ctb, cta = get_color_values(col_to);
        -- validate before the arithmetic, otherwise a nil component raises first
        if not cfr or not ctr then
            error("One of given arguments is not a color.");
        end
        local cdr, cdg, cdb, cda = ctr-cfr, ctg-cfg, ctb-cfb, cta-cfa;

        return function(pos)
            if type(pos) ~= "number" then
                error("arg1 (pos) must be in range 0..1");
            end
            if pos < 0 then pos = 0; end;
            if pos > 1 then pos = 1; end;
            return cfr + cdr*pos, cfg + cdg*pos, cfb + cdb*pos, cfa + cda*pos;
        end
    end

    -- Call
    local blender = LinearColorBlender({1,1,1,1},{0,0,0,1});
    object:SetColor(blender(0.1));
    object:SetColor(blender(0.3));
    object:SetColor(blender(0.7));
You can see that once the blender was created, the function only has to sanity-check a single value instead of up to eight. I even extracted the difference calculation; though it probably does not improve a lot, I hope it shows what this pattern tries to achieve.
If your Lua program is really too slow, use the Lua profiler and clean up expensive stuff or migrate to C. But if you're not sitting there waiting, your time is wasted.
The first law of optimization: Don't.
I'd love to see a problem where you have a choice between ipairs and pairs and can measure the effect of the difference.
The one easy piece of low-hanging fruit is to remember to use local variables within each module. It's generally not worth doing stuff like
    local strfind = string.find
unless you can find a measurement telling you otherwise.