I'm looking for a good guide on how to optimize Lua code for LuaJIT 2. It should focus on LJ2 specifics, like how to detect which traces are being compiled and which are not, etc.
Any pointers? Collection of links to Lua ML posts would do as an answer (bonus points for summarizing these links here.)
Update: I've changed the title text from 'profiling' to 'optimization' guide, as this makes more sense.
Mike has recently created and released a wonderful lightweight profiler for LuaJIT; you can find it here.
The wiki has gained a few more pages in this area, especially this one, which details some extra stuff not mentioned in the original answer, and is based on a mailing list post by Mike.
LuaJIT very recently launched its own wiki and mailing list, and with them come many, many more gems about speeding up code for LuaJIT.
Right now the wiki is pretty thin (but is always looking for people to add to it); however, one great page that was added recently is a list of NYI functions. NYI functions cause the JIT to bail out and fall back to the interpreter, so quite obviously one should avoid NYI functions as much as possible on the hot path, especially in loops.
Some topics of interest from the mailing list:
And just to repeat what's said further down (because it's just that helpful): -jv
is the best tool for performance tuning, and it should also be your first stop when troubleshooting.
I doubt you'll find much on this actually, mainly because LJ2 is still in beta, and thus most profiling is done naively, as there are no debug hooks for LJ2-specific things like the trace recorder.
On the plus side, the new FFI module does allow direct calls to high-resolution timers (or profiling APIs like VTune/CodeAnalyst), so you can profile that way, but anything more requires extensions to the LJ2 JIT core, which should not be too hard, as the code is clear and commented.
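For example, here is a minimal sketch of calling a high-resolution POSIX timer through the FFI; the clock_gettime declaration and the CLOCK_MONOTONIC value are Linux-specific assumptions, so adjust them for your platform (on older glibc you may also need ffi.load("rt") instead of ffi.C):

local ffi = require("ffi")

ffi.cdef[[
typedef struct { long tv_sec; long tv_nsec; } timespec;
int clock_gettime(int clk_id, timespec *tp);
]]

local CLOCK_MONOTONIC = 1            -- value on Linux; check <time.h> elsewhere
local ts = ffi.new("timespec")

-- Returns a monotonic timestamp in nanoseconds.
local function now_ns()
  ffi.C.clock_gettime(CLOCK_MONOTONIC, ts)
  return tonumber(ts.tv_sec) * 1e9 + tonumber(ts.tv_nsec)
end

-- Crude timing of a hot section:
local t0 = now_ns()
-- ... code under test ...
print(("elapsed: %.3f ms"):format((now_ns() - t0) / 1e6))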
On the trace recorder command-line parameters (taken from here):
The -jv and -jdump commands are extension modules written in Lua. They are mainly used for debugging the JIT compiler itself. For a description of their options and output format, please read the comment block at the start of their source. They can be found in the lib directory of the source distribution or installed under the jit directory. By default this is /usr/local/share/luajit-2.0.0-beta8/jit on POSIX systems.
which means you can use the module code from those commands to form the basis of a profiling module for LuaJIT 2.
With the update to the question, this becomes a little easier to answer. So let's start from the source, LuaJIT.org:
Before manually optimizing code, it's always a good idea to check the JIT's optimization tuning options:
Compilation
From the Running page we can see all of the options for setting the JIT's parameters. For optimization we focus on the -O
option. Mike tells us right away that enabling all optimizations has minimal performance impact, so make sure to run with -O3
(which is now the default); the only options of real value to us here are the JIT and trace thresholds.
These options are very specific to the code you are writing, so there are no generic 'optimal settings' apart from the defaults. Needless to say, if your code has many loops, experiment with the loop unrolling parameters and time the execution (but flush the cache between runs if you are looking for cold-start performance).
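If you'd rather tweak these knobs from inside a script than on the command line, the jit.opt module exposes the same parameters; a minimal sketch (the threshold values are illustrative, not recommendations):

local jit = require("jit")
jit.opt.start(3)                 -- full optimization level (the current default)
jit.opt.start("hotloop=30")      -- lower the loop hotness threshold (default is 56 iterations)
jit.opt.start("loopunroll=30")   -- allow more aggressive loop unrolling
-- These parameters interact with the shape of your code, so re-run your
-- timing harness after each change; only measurement will tell you what helps.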
-jv
is also useful in helping avoid the known issues/'fallbacks' that will cause the JIT to bail out.
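You can also switch the verbose mode on programmatically; this is the same module the -jv flag loads, here directed to a file (the file name is just an example):

local verbose = require("jit.v")
verbose.on("jit.log")    -- log trace starts, stops and aborts to jit.log; omit the argument for stderr

-- ... run the hot code you want to inspect ...

verbose.off()            -- stop logging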
The site itself doesn't offer much on how to write better or more optimized code, except for some small tidbits in the FFI tutorial:
Function Caching
Caching of functions is a good performance booster in Lua, but it is less important to focus on in LuaJIT, as the JIT does most of these optimizations itself. It is important to note, however, that caching FFI C functions is bad; instead, cache the namespace they reside in.
An example from the page:
bad:
local funca, funcb = ffi.C.funca, ffi.C.funcb -- Not helpful!
local function foo(x, n)
for i=1,n do funcb(funca(x, i), 1) end
end
good:
local C = ffi.C -- Instead use this!
local function foo(x, n)
for i=1,n do C.funcb(C.funca(x, i), 1) end
end
FFI Performance Issues
The Status section details various constructs and operations that degrade the performance of code (mainly because they aren't compiled, but use the VM fallback instead).
Now we move on to the source of all LuaJIT gems, the Lua mailing list:
Avoiding C calls and NYI Lua calls in loops: if you want the LJ2 tracer to kick in and give useful feedback, you need to avoid NYI (not yet implemented) functions and C calls where the trace compiler cannot go. So if you have any small C calls that can be ported to Lua and are used in loops, port them; at worst they might be '6% slower' than the C compiler implementation, at best they're faster (see the sketch after this list).
Use linear arrays over ipairs: according to Mike, pairs/next will always be slower compared to other methods (there is also a small tidbit in there about symbol caching for the tracer).
Avoid nested loops: each nesting level takes an extra pass to trace and will be slightly less optimized; specifically, avoid inner loops with low iteration counts.
You can use 0-base arrays: Mike says here that LuaJIT has no performance penalty for 0 based arrays, unlike standard Lua.
Declare locals in the innermost scope possible: there is no real explanation why, but IIRC this has to do with SSA liveness analysis. The thread also contains some interesting info on how to avoid too many locals (which breaks liveness analysis).
Avoid lots of tiny loops: this messes up the unrolling heuristics and will slow down the code.
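A small sketch pulling a few of these points together: a tiny helper ported to pure Lua instead of being called through the classic Lua/C API, a plain numeric for loop over a linear array instead of pairs/next, and locals declared in the innermost scope. The names (clamp, samples, normalize) are made up for illustration:

-- Hypothetical helper: if clamp() were a small C function registered through
-- the classic Lua/C API, calling it here would abort the trace; as plain Lua
-- the whole loop can be compiled.
local function clamp(x, lo, hi)
  if x < lo then return lo
  elseif x > hi then return hi
  else return x end
end

local samples = {}
for i = 1, 100000 do samples[i] = (i % 17) * 0.25 end   -- linear array, no holes

local function normalize(values)
  for i = 1, #values do                 -- numeric for: traces well, unlike pairs/next
    local v = values[i]                 -- local declared in the innermost scope
    values[i] = clamp(v, 0, 1)
  end
end

normalize(samples)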
Smaller Tidbits:
Profiling tools are available for normal Lua; however, there is one newer project that is officially compatible with LuaJIT (I doubt it takes any of the LuaJIT-specific features into account, though): luatrace. The Lua wiki also has a page of optimization tips for normal Lua; these would need to be tested for their effectiveness under LuaJIT (most of them are probably performed internally already). However, LuaJIT still uses the default GC, which leaves it as one area where manual optimization gains can still be great (until Mike adds the custom GC he's mentioned here and there).
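A hedged sketch of the GC knobs available from plain Lua; the numbers are only examples and the effect is workload-dependent, so measure before and after:

collectgarbage("setpause", 150)     -- start the next GC cycle earlier than the default of 200
collectgarbage("setstepmul", 300)   -- make each incremental step do more work (default 200)

-- For benchmarks it can also help to force a full collection at a quiet point
-- so it doesn't land in the middle of the section you are timing:
collectgarbage("collect")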
LuaJIT's source contains a few settings for fiddling with the internals of the JIT; however, these would require extensive testing to tune for one's specific code. In fact, it might be better to avoid them entirely, especially for those who aren't familiar with the internals of the JIT.
Not quite what you're looking for, but I've managed some reverse-engineering of the jit.* tracing facilities. What follows is a bit rough, inaccurate, subject to change and very incomplete. I'll start making use of it in luatrace soon. These functions are used in several of the -j library files. dump.lua
is probably a good place to start.
jit.attach
You can attach callbacks to a number of compiler events with jit.attach. The callback can be called:
when a function has been compiled to bytecode ("bc"),
when trace recording starts, stops or aborts ("trace"),
when a function or trace is being recorded ("record"),
or when a trace exits ("texit").
Set a callback with jit.attach(callback, "event") and clear the same callback with jit.attach(callback).
The arguments passed to the callback depend on the event being reported:
For the "bc" event: callback(func). func is the function that's just been recorded.
For the "trace" event: callback(what, tr, func, pc, otr, oex).
what is a description of the trace event: "flush", "start", "stop", "abort". Available for all events.
tr is the trace number. Not available for flush.
func is the function being traced. Available for start and abort.
pc is the program counter - the bytecode number of the function being recorded (if this is a Lua function). Available for start and abort.
otr: on start, the parent trace number if this is a side trace; on abort, the abort code (integer)?
oex: on start, the exit number for the parent trace; on abort, the abort reason (string).
For the "record" event: callback(tr, func, pc, depth). The first arguments are the same as for trace start. depth is the depth of the inlining of the current bytecode.
For the "texit" event: callback(tr, ex, ngpr, nfpr).
tr is the trace number as before.
ex is the exit number.
ngpr and nfpr are the number of general-purpose and floating-point registers that are active at the exit.
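Based on the description above, a minimal sketch of hooking the "trace" event to log compiled and aborted traces (the log format is my own):

local function trace_event(what, tr, func, pc, otr, oex)
  if what == "stop" then
    io.stderr:write(("trace %d compiled\n"):format(tr))
  elseif what == "abort" then
    io.stderr:write(("trace %d aborted: code=%s reason=%s\n")
      :format(tr, tostring(otr), tostring(oex)))
  end
end

jit.attach(trace_event, "trace")
-- ... run the code you want to observe ...
jit.attach(trace_event)     -- passing only the callback detaches it again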
jit.util.funcinfo(func, pc)
When passed func and pc from a jit.attach callback, jit.util.funcinfo returns a table of information about the function, much like debug.getinfo.
The fields of the table are:
linedefined: as for debug.getinfo
lastlinedefined: as for debug.getinfo
params: the number of parameters the function takes
stackslots: the number of stack slots the function's local variables use
upvalues: the number of upvalues the function uses
bytecodes: the number of bytecodes in the compiled function
gcconsts: ??
nconsts: ??
currentline: as for debug.getinfo
isvararg: whether the function is a vararg function
source: as for debug.getinfo
loc: a string describing the source and currentline, like "<source>:<line>"
ffid: the fast function id of the function (if it is one). In this case only upvalues above and addr below are valid
addr: the address of the function (if it is not a Lua function). If it's a C function rather than a fast function, only upvalues above is valid
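As a rough sketch of how the pieces fit together, a "record" callback can use jit.util.funcinfo to print where the recorder currently is (loc is the handiest field for that):

local jutil = require("jit.util")

local function record_event(tr, func, pc, depth)
  local info = jutil.funcinfo(func, pc)
  io.stderr:write(("trace %d recording at %s (depth %d)\n")
    :format(tr, info.loc or "?", depth))
end

jit.attach(record_event, "record")
-- ... run the code of interest ...
jit.attach(record_event)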
I've used ProFi in the past and found it rather useful!