
LuaJIT 2 optimization guide

Tags:

profiling

lua

I'm looking for a good guide on how to optimize Lua code for LuaJIT 2. It should focus on LJ2 specifics, like how to detect which traces are being compiled and which are not, etc.

Any pointers? A collection of links to Lua mailing list posts would do as an answer (bonus points for summarizing them here).

Update: I've changed the title text from 'profiling' to 'optimization' guide, as this makes more sense.

Alexander Gladysh asked Aug 23 '11 21:08


3 Answers

Update

Mike Pall has recently created and released a wonderful lightweight profiler for LuaJIT; you can find it here.

Update

The wiki has gained a few more pages in this area, especially this one, which covers some things not mentioned in the original answer and is based on a mailing list post by Mike.


LuaJIT recently launched its own wiki and mailing list, and with them come many more gems about speeding up code for LuaJIT.

Right now the wiki is pretty thin (and is always looking for contributors); however, one great page added recently is a list of NYI (not yet implemented) functions. NYI functions cause the JIT to bail out and fall back to the interpreter, so avoid them as much as possible on the hot path, especially in loops.

Some topics of interest from the mailing list:

  • FFI Array Performance
  • An interesting discussion on GC gotchas
  • Some more small gotchas

And to repeat what's said further down (because it's that helpful): -jv is the best tool for performance tuning, and it should be your first stop when troubleshooting.


Original Answer

I doubt you'll find much on this, mainly because LJ2 is still in beta; as a result, most profiling is done naively, since there are no debug hooks for LJ2-specific things like the trace recorder.

On the plus side, the new FFI module allows direct calls to high-resolution timers (or to profiling APIs like VTune/CodeAnalyst), so you can profile that way. Anything more requires extensions to the LJ2 JIT core, which should not be too hard, as the code is clear and well commented.
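As a rough illustration of the manual approach, here is a minimal timing harness sketch. It assumes the portable os.clock (CPU time, coarse resolution) as the clock source; under LuaJIT, the `now` function is where an FFI call to a higher-resolution platform timer could be plugged in (not shown here). The names `now`, `bench` and `work` are hypothetical.

```lua
-- Clock source: os.clock returns CPU time in seconds. Replace with a
-- higher-resolution timer (e.g. one reached through the FFI) if available.
local now = os.clock

-- Run f once to warm up (giving hot loops a chance to be traced under
-- LuaJIT), then time `runs` further calls and return the fastest one.
local function bench(f, runs)
  runs = runs or 5
  f()
  local best = math.huge
  for _ = 1, runs do
    local t0 = now()
    f()
    local dt = now() - t0
    if dt < best then best = dt end
  end
  return best
end

-- Usage: time a simple numeric loop.
local function work()
  local s = 0
  for i = 1, 1000000 do s = s + i % 3 end
  return s
end

print(string.format("best of 5: %.4f s", bench(work)))
```

Reporting the fastest run rather than the mean is a common way to reduce noise from other processes; the warm-up call matters under LuaJIT because the first run includes trace recording and compilation.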


On the trace recorder command-line parameters (taken from here):

The -jv and -jdump commands are extension modules written in Lua. They are mainly used for debugging the JIT compiler itself. For a description of their options and output format, please read the comment block at the start of their source. They can be found in the lib directory of the source distribution or installed under the jit directory. By default this is /usr/local/share/luajit-2.0.0-beta8/jit on POSIX systems.

This means you can use the code of these modules as the basis of a profiling module for LuaJIT 2.
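For example, the module behind -jv can also be switched on from inside a script. This sketch assumes the jit.v module exposes `on` and `off` functions, as the LuaJIT 2.0 sources do; under plain Lua the require fails and the script simply runs without trace output. The function `hot` is a hypothetical workload.

```lua
-- Try to load LuaJIT's -jv module; this fails (harmlessly) under plain Lua.
local has_v, v = pcall(require, "jit.v")

if has_v then
  -- With no file name, output goes to stderr (or to the file named by the
  -- LUAJIT_VERBOSEFILE environment variable, if set).
  v.on()
end

-- A hot loop that LuaJIT should compile into a trace.
local function hot(n)
  local s = 0
  for i = 1, n do s = s + i % 3 end
  return s
end
local result = hot(100000)

if has_v then v.off() end   -- stop reporting traces
print(result)
```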


Update

With the update to the question, this becomes a little easier to answer. So let's start from the source, LuaJIT.org:

Before manually optimizing code, it's always a good idea to check the JIT's optimization tuning resources:

Compilation

The Running page lists all of the options for setting the JIT's parameters; for optimization, we focus on the -O option. Mike tells us right away that enabling all optimizations has minimal performance impact, so make sure to run with -O3 (which is now the default). That leaves the JIT and trace thresholds as the only options here of real value to us.

These options are very specific to the code you are writing, so there are no generic 'optimal settings' apart from the defaults. That said, if your code has many loops, experiment with the loop unrolling parameters and time the execution (but flush the cache between runs if you are looking at cold start performance).
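A minimal sketch of such an experiment, assuming LuaJIT's `jit.opt.start` and `jit.flush` (the runtime counterparts of the -O option and of flushing the trace cache). The workload and the alternative values tried are made up for illustration; hotloop=56 and loopunroll=15 are the defaults in the LuaJIT 2.x sources. Under plain Lua the JIT calls are skipped, so every setting times the same interpreter.

```lua
local unpack = unpack or table.unpack

-- Hypothetical workload: a numeric loop whose speed may depend on how early
-- it is traced (hotloop) and how far it is unrolled (loopunroll).
local function workload(n)
  local s = 0
  for i = 1, n do s = s + (i % 7) * 0.5 end
  return s
end

-- Each entry sets both parameters so no value leaks from a previous run.
local settings = {
  { "hotloop=56", "loopunroll=15" },  -- LuaJIT defaults
  { "hotloop=10", "loopunroll=15" },  -- start tracing earlier
  { "hotloop=56", "loopunroll=30" },  -- allow more unrolling
}

local results = {}
for _, params in ipairs(settings) do
  if jit and jit.opt then
    jit.flush()                    -- discard traces compiled under earlier settings
    jit.opt.start(unpack(params))  -- same parameters as -Ohotloop=10 etc.
  end
  local t0 = os.clock()
  local value = workload(2000000)
  results[#results + 1] = {
    params = table.concat(params, " "),
    seconds = os.clock() - t0,
    value = value,
  }
end

for _, r in ipairs(results) do
  print(string.format("%-28s %.4f s", r.params, r.seconds))
end
```

Note that the last setting stays active for the rest of the process; running each configuration in a separate `luajit -O...` process is the cleaner way to compare cold starts.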

-jv is also useful for avoiding the known issues/'fallbacks' that cause the JIT to bail out.

The site itself doesn't offer much on how to write better or more optimized code, except for some small tidbits in the FFI tutorial:

Function Caching

Caching functions in locals is a good performance booster in standard Lua, but matters less in LuaJIT, as the JIT does most of these optimizations itself. Note that caching FFI C functions is actually bad; it is preferable to cache the namespace they reside in.

An example from the page:

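bad:

```lua
local ffi = require("ffi")
-- funca/funcb are placeholder C functions, assumed declared via ffi.cdef
local funca, funcb = ffi.C.funca, ffi.C.funcb -- Not helpful!
local function foo(x, n)
  for i=1,n do funcb(funca(x, i), 1) end
end
```

good:

```lua
local ffi = require("ffi")
local C = ffi.C          -- Instead use this!
local function foo(x, n)
  for i=1,n do C.funcb(C.funca(x, i), 1) end
end
```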

FFI Performance Issues

The Status section details various constructs and operations that degrade code performance (mainly because they aren't compiled and use the VM fallback instead).

Now we move on to the source of all LuaJIT gems, the Lua mailing list:

  • Avoid C calls and NYI Lua calls in loops: if you want the LJ2 tracer to kick in and give useful feedback, you need to avoid NYI (not yet implemented) functions and C calls, where the trace compiler cannot go. So if you have small C functions that are used in loops and can be rewritten in Lua, rewrite them: at worst they might be '6% slower' than the C compiler's version, at best they are faster.

  • Use linear arrays over ipairs: according to Mike, pairs/next will always be slower than the other methods (there is also a small tidbit in there about symbol caching for the tracer).

  • Avoid nested loops: each nesting level takes an extra pass to trace and will be slightly less optimized; in particular, avoid inner loops with low iteration counts.

  • You can use 0-based arrays: Mike says here that LuaJIT has no performance penalty for 0-based arrays, unlike standard Lua.

  • Declare locals in the innermost scope possible: there is no real explanation why, but IIRC this has to do with SSA liveness analysis. The thread also contains some interesting info on how to avoid having too many locals (which breaks liveness analysis).

  • Avoid lots of tiny loops: this messes up the unrolling heuristics and will slow down the code.
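The loop-related tips above can be illustrated with a small plain-Lua sketch (it runs under both Lua and LuaJIT). This is my own illustration rather than code from the linked threads: a numeric for loop instead of pairs() for array-like tables, a nested loop with a short inner loop flattened into a single 0-based loop, and locals declared in the innermost scope. Whether flattening pays off for a particular workload is an assumption to check with -jv and timing. In standard Lua, index 0 lives in the hash part of a table, which is the penalty the 0-based tip refers to. The function names are hypothetical.

```lua
-- Sum an array-like table with a numeric for loop rather than pairs()/next().
local function sum_numeric(t)
  local s = 0
  for i = 1, #t do
    s = s + t[i]
  end
  return s
end

-- Nested version: the inner loop runs only `cols` times per outer iteration,
-- which is the shape the tips warn about when `cols` is small.
local function grid_nested(rows, cols)
  local g = {}
  for row = 0, rows - 1 do
    for col = 0, cols - 1 do
      g[row * cols + col] = row * 10 + col
    end
  end
  return g
end

-- Flattened version: one loop over a 0-based 1-D index. The position is
-- recomputed from k, with row/col declared in the innermost scope.
local function grid_flat(rows, cols)
  local g = {}
  for k = 0, rows * cols - 1 do
    local row = math.floor(k / cols)
    local col = k % cols
    g[k] = row * 10 + col
  end
  return g
end

local a, b = grid_nested(3, 4), grid_flat(3, 4)
for k = 0, 11 do assert(a[k] == b[k]) end
print(sum_numeric({ 1, 2, 3, 4 }), b[11])   -- prints 10 and 23
```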

Smaller Tidbits:

  • FFI Tips
  • As of beta8, LuaJIT supports bytecode, which can be used to improve cold start performance by removing the parsing step.
  • Some general guidelines on FFI types from Mike
  • Reuse metatables as much as possible
  • Speeding up Lua -> C -> Lua looping
  • LJ2 doesn't have Lua's metatable bugs and avoids proxy tables, so it caches properly.
  • Update to the Git HEAD whenever possible; it almost always contains fixes and speedups.
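Two of these tidbits in code, as a plain-Lua sketch: a single metatable created once and shared by every object (rather than a new metatable per object), and precompiling a function to bytecode with string.dump, then loading it back without parsing source. LuaJIT's command-line way of saving bytecode is `luajit -b`; the `Point` type and the `mul` function are hypothetical.

```lua
-- Reuse metatables: Point is created once and shared by all instances.
local Point = {}
Point.__index = Point

function Point.new(x, y)
  return setmetatable({ x = x, y = y }, Point)
end

function Point:len2()
  return self.x * self.x + self.y * self.y
end

local p, q = Point.new(3, 4), Point.new(1, 2)
assert(getmetatable(p) == getmetatable(q))   -- one shared metatable

-- Bytecode: string.dump returns the compiled form of a function, which can be
-- saved and loaded later without running the parser again. The bytecode is
-- specific to the VM that produced it (LuaJIT and standard Lua differ).
-- Lua 5.1/LuaJIT provide loadstring; Lua 5.2+ load accepts binary chunks.
local load_chunk = loadstring or load
local bytecode = string.dump(function(a, b) return a * b end)
local mul = assert(load_chunk(bytecode))

print(p:len2(), mul(6, 7))   -- prints 25 and 42
```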

Profiling tools are available for standard Lua; however, there is one newer project that is officially compatible with LuaJIT, luatrace (though I doubt it takes any of the LuaJIT features into account). The Lua wiki also has a page of optimization tips for standard Lua; these would need to be tested for their effectiveness under LuaJIT, since most of these optimizations are probably already performed internally. However, LuaJIT still uses the default GC, which leaves it as one area where manual optimization can still bring big gains (until Mike adds the custom GC he's mentioned here and there).
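Since the GC is singled out as an area where manual work still pays off, here is a small plain-Lua sketch of one common technique (my example, not from the linked posts): avoid creating a fresh table on every call in hot code, either by returning multiple values or by filling a caller-owned table that is reused across calls. The function names are hypothetical.

```lua
-- Allocation-free: return the two results as multiple values.
local function minmax(t)
  local lo, hi = math.huge, -math.huge
  for i = 1, #t do
    local v = t[i]
    if v < lo then lo = v end
    if v > hi then hi = v end
  end
  return lo, hi
end

-- Allocates a new table on every call: garbage in a hot loop.
local function minmax_alloc(t)
  local lo, hi = minmax(t)
  return { lo, hi }
end

-- Alternative: fill a table owned by the caller, reused across many calls.
local function minmax_into(t, out)
  out[1], out[2] = minmax(t)
  return out
end

local data = { 3, 9, -2, 7 }
local scratch = {}
for _ = 1, 1000 do
  minmax_into(data, scratch)   -- no new tables are created inside this loop
end
print(scratch[1], scratch[2])   -- prints -2 and 9
```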

LuaJIT's source contains a few settings for fiddling with the internals of the JIT. However, these would require extensive testing to tune for one's specific code; in fact, it might be better to avoid them entirely, especially if you aren't familiar with the JIT's internals.

Necrolis answered Nov 07 '22 05:11


Not quite what you're looking for, but I've managed some reverse-engineering of the jit.* tracing facilities. What follows is a bit rough, inaccurate, subject to change and very incomplete. I'll start making use of it in luatrace soon. These functions are used in several of the -j library files. dump.lua is probably a good place to start.

jit.attach

You can attach callbacks to a number of compiler events with jit.attach. The callback can be called:

  • when a function has been compiled to bytecode ("bc");
  • when trace recording starts or stops ("trace");
  • as a trace is being recorded ("record");
  • or when a trace exits through a side exit ("texit").

Set a callback with jit.attach(callback, "event") and clear the same callback with jit.attach(callback).

The arguments passed to the callback depend on the event being reported:

  • "bc": callback(func). func is the function that's just been compiled to bytecode.
  • "trace": callback(what, tr, func, pc, otr, oex)
    • what is a description of the trace event: "flush", "start", "stop", "abort". Available for all events.
    • tr is the trace number. Not available for flush.
    • func is the function being traced. Available for start and abort.
    • pc is the program counter, i.e. the bytecode position within the function being recorded (if this is a Lua function). Available for start and abort.
    • otr: for start, the parent trace number if this is a side trace; for abort, the abort code (integer)?
    • oex: for start, the exit number of the parent trace; for abort, the abort reason (string)
  • "record": callback(tr, func, pc, depth). The first arguments are the same as for trace start. depth is the depth of the inlining of the current bytecode.
  • "texit": callback(tr, ex, ngpr, nfpr).
    • tr is the trace number as before
    • ex is the exit number
    • ngpr and nfpr are the number of general-purpose and floating point registers that are active at the exit.
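Putting the "trace" event to use, here is a sketch that counts trace events and records where aborts happen, using the callback signature described above plus jit.util.funcinfo (described below) to turn func/pc into a source location. It is guarded so it does nothing under plain Lua. The helper name `watch_traces` is hypothetical, and since the meaning of otr/oex on abort is only partly reverse-engineered, they are logged raw.

```lua
-- Hypothetical helper: start collecting "trace" events; returns the log
-- table and a function that detaches the callback again.
local function watch_traces()
  local log = { counts = {}, aborts = {} }
  if not (jit and jit.attach) then
    return log, function() end        -- plain Lua: nothing to watch
  end
  local funcinfo = require("jit.util").funcinfo

  local function on_trace(what, tr, func, pc, otr, oex)
    log.counts[what] = (log.counts[what] or 0) + 1
    if what == "abort" then
      local fi = funcinfo(func, pc)   -- fi.loc is nil for non-Lua functions
      log.aborts[#log.aborts + 1] = string.format(
        "trace %s aborted at %s (code %s, info %s)",
        tostring(tr), fi.loc or "?", tostring(otr), tostring(oex))
    end
  end

  jit.attach(on_trace, "trace")
  return log, function() jit.attach(on_trace) end
end

local log, stop = watch_traces()

local s = 0
for i = 1, 100000 do s = s + i % 5 end   -- a loop hot enough to be traced

stop()
for what, n in pairs(log.counts) do print(what, n) end
for _, msg in ipairs(log.aborts) do print(msg) end
```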

jit.util.funcinfo(func, pc)

When passed func and pc from a jit.attach callback, jit.util.funcinfo returns a table of information about the function, much like debug.getinfo.

The fields of the table are:

  • linedefined: as for debug.getinfo
  • lastlinedefined: as for debug.getinfo
  • params: the number of parameters the function takes
  • stackslots: the number of stack slots the function's local variables use
  • upvalues: the number of upvalues the function uses
  • bytecodes: the number of bytecodes in the compiled function
  • gcconsts: ??
  • nconsts: ??
  • currentline: as for debug.getinfo
  • isvararg: if the function is a vararg function
  • source: as for debug.getinfo
  • loc: a string describing the source and currentline, like "<source>:<line>"
  • ffid: the fast function id of the function (if it is one). In this case only upvalues above and addr below are valid
  • addr: the address of the function (if it is not a Lua function). If it's a C function rather than a fast function, only upvalues above is valid
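A quick way to see these fields is to call jit.util.funcinfo on an ordinary Lua function. As far as I can tell from the LuaJIT sources, pc may be omitted (it defaults to 0). This sketch is guarded for plain Lua, where jit.util does not exist; values such as bytecodes depend on the LuaJIT version, so they are only printed. The function `add3` is hypothetical.

```lua
-- Example function to inspect.
local function add3(a, b, c)
  local s = a + b
  return s + c
end

local info
if jit then
  local ok, jutil = pcall(require, "jit.util")
  if ok then info = jutil.funcinfo(add3) end   -- pc omitted
end

if info then
  print("params", info.params)         -- 3
  print("isvararg", info.isvararg)     -- false
  print("loc", info.loc)               -- a "<source>:<line>" string
  print("upvalues", info.upvalues)
  print("bytecodes", info.bytecodes)
else
  print("jit.util not available (not running under LuaJIT)")
end
```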

Geoff answered Nov 07 '22 04:11


I've used ProFi in the past and found it rather useful!

tommitytom answered Nov 07 '22 04:11