In a fairly recent blog post, Scott Vokes describes a technical problem associated with Lua's implementation of coroutines using the C functions setjmp and longjmp:
The main limitation of Lua coroutines is that, since they are implemented with setjmp(3) and longjmp(3), you cannot use them to call from Lua into C code that calls back into Lua that calls back into C, because the nested longjmp will clobber the C function’s stack frames. (This is detected at runtime, rather than failing silently.)
I haven’t found this to be a problem in practice, and I’m not aware of any way to fix it without damaging Lua’s portability, one of my favorite things about Lua — it will run on literally anything with an ANSI C compiler and a modest amount of space. Using Lua means I can travel light. :)
I have used coroutines a fair amount, and I thought I broadly understood what was going on and what setjmp and longjmp do. However, when I read this I realized that I didn't really understand it. To figure it out, I tried to write a program that, based on the description, I thought should cause a problem; instead it seems to work fine.
However there are a few other places that I've seen people seem to allege that there are problems:
The question is:
Here is the code I produced. It is linked against Lua 5.3.1 (compiled as C), and the test itself is compiled as C++ under the C++11 standard.
extern "C" {
#include <lauxlib.h>
#include <lua.h>
}

#include <cassert>
#include <iostream>

#define CODE(C) \
  case C: { \
    std::cout << "When returning to " << where << " got code '" #C "'" << std::endl; \
    break; \
  }

void handle_resume_code(int code, const char * where) {
  switch (code) {
    CODE(LUA_OK)
    CODE(LUA_YIELD)
    CODE(LUA_ERRRUN)
    CODE(LUA_ERRMEM)
    CODE(LUA_ERRERR)
    default:
      std::cout << "An unknown error code in " << where << std::endl;
  }
}

int trivial(lua_State *, int, lua_KContext) {
  std::cout << "Called continuation function" << std::endl;
  return 0;
}

int f(lua_State * L) {
  std::cout << "Called function 'f'" << std::endl;
  return 0;
}

int g(lua_State * L) {
  std::cout << "Called function 'g'" << std::endl;

  lua_State * T = lua_newthread(L);
  lua_getglobal(T, "f");

  handle_resume_code(lua_resume(T, L, 0), __func__);
  return lua_yieldk(L, 0, 0, trivial);
}

int h(lua_State * L) {
  std::cout << "Called function 'h'" << std::endl;

  lua_State * T = lua_newthread(L);
  lua_getglobal(T, "g");

  handle_resume_code(lua_resume(T, L, 0), __func__);
  return lua_yieldk(L, 0, 0, trivial);
}

int main () {
  std::cout << "Starting:" << std::endl;

  lua_State * L = luaL_newstate();

  // init
  {
    lua_pushcfunction(L, f);
    lua_setglobal(L, "f");

    lua_pushcfunction(L, g);
    lua_setglobal(L, "g");

    lua_pushcfunction(L, h);
    lua_setglobal(L, "h");
  }

  assert(lua_gettop(L) == 0);

  // Some action
  {
    lua_State * T = lua_newthread(L);
    lua_getglobal(T, "h");

    handle_resume_code(lua_resume(T, nullptr, 0), __func__);
  }

  lua_close(L);

  std::cout << "Bye! :-)" << std::endl;
}
The output I get is:
Starting:
Called function 'h'
Called function 'g'
Called function 'f'
When returning to g got code 'LUA_OK'
When returning to h got code 'LUA_YIELD'
When returning to main got code 'LUA_YIELD'
Bye! :-)
Many thanks to @Nicol Bolas for the very detailed answer!
After reading his answer, reading the official docs, reading some emails and playing around with it some more, I want to refine the question / ask a specific follow-up question, however you want to look at it.
I think the term 'clobbering' is not a good description of this issue, and this was part of what confused me. Nothing is being "clobbered" in the sense of being written to twice with the first value lost. The issue is solely, as @Nicol Bolas points out, that longjmp tosses part of the C stack, and if you are hoping to restore the stack later, too bad.
The issue is actually described very nicely in section 4.7 of lua 5.2 manual, in a link provided by @Nicol Bolas.
Curiously, there is no equivalent section in the Lua 5.1 documentation. However, Lua 5.2 has this to say about lua_yieldk:
Yields a coroutine.
This function should only be called as the return expression of a C function, as follows:
return lua_yieldk (L, n, i, k);
The Lua 5.1 manual says something similar, about lua_yield instead:
Yields a coroutine.
This function should only be called as the return expression of a C function, as follows:
return lua_yield (L, nresults);
Some natural questions then:
Does it matter whether I use return here or not? If lua_yieldk will call longjmp, then lua_yieldk will never return anyway, so it shouldn't matter whether I return. So that cannot be what is happening, right?
Suppose instead that lua_yieldk just makes a note within the Lua state that the current C API call wants to yield, and then, when the function finally does return, Lua figures out what happens next. Then this solves the problem of saving C stack frames, no? After we return to Lua normally, those stack frames have expired anyway, so the complications described in @Nicol Bolas's picture are skirted. Also, in 5.2 at least, the semantics never seem to be that C stack frames should be restored: lua_yieldk resumes to a continuation function, not to the lua_yieldk caller, and lua_yield apparently resumes to the caller of the current API call, not to the lua_yield caller itself.
And, the most important question:
If I consistently use lua_yieldk in the form return lua_yieldk(...) specified in the docs, returning from a lua_CFunction that was passed to Lua, is it still possible to trigger the "attempt to yield across a C-call boundary" error?
Finally (but this is less important), I would like to see a concrete example of what it looks like when a naive programmer "isn't careful" and triggers the "attempt to yield across a C-call boundary" error. I get the idea that there could be a problem with setjmp and longjmp tossing stack frames that we later need, but I want to see some real Lua / Lua C API code that I can point to and say "for instance, don't do that", and this is surprisingly elusive.
I found this email where someone reported this error with some Lua 5.1 code, and I attempted to reproduce it in Lua 5.3. What I found, however, is that this looks like poor error reporting from the Lua implementation: the actual bug is that the user is not setting up their coroutine properly. The proper way to load the coroutine is to create the thread, push a function onto the thread stack, and then call lua_resume on the thread state. Instead, the user was using dofile on the thread stack, which executes the function there after loading it, rather than resuming it. So it is effectively a yield outside of a coroutine, iiuc, and when I patch this, his code works fine using both lua_yield and lua_yieldk in Lua 5.3.
Here is the listing I produced:
#include <cassert>
#include <cstdio>

extern "C" {
#include "lua.h"
#include "lauxlib.h"
}

//#define USE_YIELDK

bool running = true;

int lua_print(lua_State * L) {
  if (lua_gettop(L)) {
    printf("lua: %s\n", lua_tostring(L, -1));
  }
  return 0;
}

int lua_finish(lua_State *L) {
  running = false;
  printf("%s called\n", __func__);
  return 0;
}

int trivial(lua_State *, int, lua_KContext) {
  printf("%s called\n", __func__);
  return 0;
}

int lua_sleep(lua_State *L) {
  printf("%s called\n", __func__);
#ifdef USE_YIELDK
  printf("Calling lua_yieldk\n");
  return lua_yieldk(L, 0, 0, trivial);
#else
  printf("Calling lua_yield\n");
  return lua_yield(L, 0);
#endif
}

const char * loop_lua =
  "print(\"loop.lua\")\n"
  "\n"
  "local i = 0\n"
  "while true do\n"
  "  print(\"lua_loop iteration\")\n"
  "  sleep()\n"
  "\n"
  "  i = i + 1\n"
  "  if i == 4 then\n"
  "    break\n"
  "  end\n"
  "end\n"
  "\n"
  "finish()\n";

int main() {
  lua_State * L = luaL_newstate();

  lua_pushcfunction(L, lua_print);
  lua_setglobal(L, "print");

  lua_pushcfunction(L, lua_sleep);
  lua_setglobal(L, "sleep");

  lua_pushcfunction(L, lua_finish);
  lua_setglobal(L, "finish");

  lua_State* cL = lua_newthread(L);

  // Load outside of assert so the chunk is still loaded when NDEBUG is defined.
  int load_status = luaL_loadstring(cL, loop_lua);
  assert(load_status == LUA_OK);
  (void)load_status;

  /*{
    int result = lua_pcall(cL, 0, 0, 0);
    if (result != LUA_OK) {
      printf("%s error: %s\n", result == LUA_ERRRUN ? "Runtime" : "Unknown", lua_tostring(cL, -1));
      return 1;
    }
  }*/
  // ^ This pcall (predictably) causes an error -- if we try to execute the
  // script, it is going to call things that attempt to yield, but we did not
  // start the script with lua_resume, we started it with pcall, so it's not
  // okay to yield.
  // The reported error is "attempt to yield across a C-call boundary", but what
  // is really happening is just "yield from outside a coroutine" I suppose...

  while (running) {
    int status;
    printf("Waking up coroutine\n");
    status = lua_resume(cL, L, 0);
    if (status == LUA_YIELD) {
      printf("coroutine yielding\n");
    } else {
      running = false; // you can't try to resume if it didn't yield

      if (status == LUA_ERRRUN) {
        printf("Runtime error: %s\n", lua_isstring(cL, -1) ? lua_tostring(cL, -1) : "(unknown)" );
        lua_pop(cL, 1); // pop the error message (lua_pop takes a count, not an index)
        break;
      } else if (status == LUA_OK) {
        printf("coroutine finished\n");
      } else {
        printf("Unknown error\n");
      }
    }
  }

  lua_close(L);
  printf("Bye! :-)\n");
  return 0;
}
Here is the output when USE_YIELDK is commented out:
Waking up coroutine
lua: loop.lua
lua: lua_loop iteration
lua_sleep called
Calling lua_yield
coroutine yielding
Waking up coroutine
lua: lua_loop iteration
lua_sleep called
Calling lua_yield
coroutine yielding
Waking up coroutine
lua: lua_loop iteration
lua_sleep called
Calling lua_yield
coroutine yielding
Waking up coroutine
lua: lua_loop iteration
lua_sleep called
Calling lua_yield
coroutine yielding
Waking up coroutine
lua_finish called
coroutine finished
Bye! :-)
Here is the output when USE_YIELDK is defined:
Waking up coroutine
lua: loop.lua
lua: lua_loop iteration
lua_sleep called
Calling lua_yieldk
coroutine yielding
Waking up coroutine
trivial called
lua: lua_loop iteration
lua_sleep called
Calling lua_yieldk
coroutine yielding
Waking up coroutine
trivial called
lua: lua_loop iteration
lua_sleep called
Calling lua_yieldk
coroutine yielding
Waking up coroutine
trivial called
lua: lua_loop iteration
lua_sleep called
Calling lua_yieldk
coroutine yielding
Waking up coroutine
trivial called
lua_finish called
coroutine finished
Bye! :-)
Think about what happens when a coroutine does a yield. It stops executing, and processing returns to whoever it was that called resume on that coroutine, correct?
Well, let's say you have this code:
function top()
    coroutine.yield()
end

function middle()
    top()
end

function bottom()
    middle()
end

local co = coroutine.create(bottom);

coroutine.resume(co);
At the moment of the call to yield, the Lua stack looks like this:
-- top
-- middle
-- bottom
-- yield point
When you call yield, the Lua call stack that is part of the coroutine is preserved. When you do resume, the preserved call stack is executed again, starting where it left off before.
OK, now let's say that middle was in fact not a Lua function. Instead, it was a C function, and that C function calls the Lua function top. So conceptually, your stack looks like this:
-- Lua - top
-- C - middle
-- Lua - bottom
-- Lua - yield point
Now, please note what I said before: this is what your stack looks like conceptually.
Because your actual call stack looks nothing like this.
In reality, there are two stacks. There is Lua's internal stack, defined by a lua_State. And there's C's stack. Lua's internal stack, at the time when yield is about to be called, looks something like this:
-- top
-- Some C stuff
-- bottom
-- yield point
So what does the stack look like to C? Well, it looks like this:
-- arbitrary Lua interpreter stuff
-- middle
-- arbitrary Lua interpreter stuff
-- setjmp
And that right there is the problem. See, when Lua does a yield, it's going to call longjmp. That function is based on the behavior of the C stack. Namely, it's going to return to where setjmp was.
The Lua stack will be preserved because the Lua stack is separate from the C stack. But the C stack? Everything between the longjmp and setjmp? Gone. Kaput. Lost forever.
Now you may go, "wait, doesn't the Lua stack know that it went into C and back into Lua"? A bit. But the Lua stack is incapable of doing something that C is incapable of. And C is simply not capable of preserving a stack (well, not without special libraries). So while the Lua stack is vaguely aware that some kind of C process happened in the middle of its stack, it has no way to reconstitute what was there.
So what happens if you resume this yielded coroutine?
Nasal demons. And nobody likes those. Fortunately, Lua 5.1 and above (at least) will error whenever you attempt to yield across C.
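To see the frame-discarding behaviour in isolation, here is a minimal sketch in plain C++ with no Lua involved. All names (resume_point, top, middle, bottom, run_fake_resume) are hypothetical stand-ins chosen to mirror the example above; this is not how Lua's sources are organised. The skipped frames hold no objects with destructors, which keeps the longjmp well defined in C++.

```cpp
#include <csetjmp>

// Hypothetical stand-in for the setjmp that a resume would perform.
static std::jmp_buf resume_point;

// Bit flags recording which pieces of code actually ran.
enum : int { TOP_RAN = 1, MIDDLE_AFTER_RAN = 2, BOTTOM_AFTER_RAN = 4 };
static int steps_run = 0;

static void top() {
    steps_run |= TOP_RAN;
    std::longjmp(resume_point, 1);   // the "yield": jump straight back to setjmp
}

static void middle() {
    top();
    steps_run |= MIDDLE_AFTER_RAN;   // never reached: this frame was discarded
}

static void bottom() {
    middle();
    steps_run |= BOTTOM_AFTER_RAN;   // never reached either
}

// Plays the role of a resume: marks the return point, then runs the call chain.
// Returns the flags describing what ran before control came back.
int run_fake_resume() {
    steps_run = 0;
    if (setjmp(resume_point) == 0) {
        bottom();                    // first pass: descend into the chain
    }
    // Second pass (after longjmp): the frames of bottom, middle and top are gone.
    return steps_run;
}
```

Calling run_fake_resume() yields only the TOP_RAN flag: the code after the calls in middle and bottom never executes, and nothing in standard C or C++ can bring those frames back so that it could.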
Note that Lua 5.2+ does have ways of fixing this. But it's not automatic; it requires explicit coding on your part.
When Lua code that is in a coroutine calls your C code, and your C code calls Lua code that may yield, you can use lua_callk or lua_pcallk to call the possibly-yielding Lua functions. These calling functions take an extra parameter: a "continuation" function.
If the Lua code you call does yield, then the lua_*callk function won't ever actually return (since your C stack will have been destroyed). Instead, it will call the continuation function you provided in your lua_*callk function. As you can guess by the name, the continuation function's job is to continue where your previous function left off.
Now, Lua does preserve the stack for your continuation function, so it gets the stack in the same state that your original C function was in. Well, except that the function+arguments that you called (with lua_*callk) are removed, and the return values from that function are pushed onto your stack. Outside of that, the stack is all the same.
There is also lua_yieldk. This allows your C function to yield back to Lua, such that when the coroutine is resumed, it calls the provided continuation function.
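The continuation idea can be sketched without Lua as well. The toy below is an assumption-laden model, not Lua's implementation: ToyKContext, ToyKFunction, toy_callk, toy_start and toy_continue are hypothetical names that only loosely mirror lua_KContext, lua_KFunction, lua_callk and lua_resume. It shows the essential trade: the C frame is lost at the yield, so whatever the function needs later must travel in the context value, and the work resumes in a separate function.

```cpp
#include <csetjmp>
#include <cstdint>

using ToyKContext  = std::intptr_t;                         // loosely like lua_KContext
using ToyKFunction = int (*)(int status, ToyKContext ctx);  // loosely like lua_KFunction

static const int TOY_YIELD = 1;          // arbitrary status code for this toy
static std::jmp_buf resume_point;        // set by toy_start
static ToyKFunction saved_k = nullptr;   // continuation registered before the yield
static ToyKContext  saved_ctx = 0;       // value handed to the continuation

// Remember the continuation and its context, then "call something that yields".
// In this toy the callee always yields, so this function never returns.
static void toy_callk(ToyKContext ctx, ToyKFunction k) {
    saved_k = k;
    saved_ctx = ctx;
    std::longjmp(resume_point, 1);       // the C frames above toy_start are discarded
}

// Second half of the work: runs on resume, sees only what was put in ctx.
static int finish_work(int /*status*/, ToyKContext ctx) {
    return static_cast<int>(ctx) + 1;
}

// First half of the work: its local variable does not survive the yield.
static int start_work(int input) {
    int partial = input * 2;
    toy_callk(partial, finish_work);     // control never comes back here
    return -1;                           // unreachable in this toy
}

// First "resume": run until the yield. Returns true if the work yielded.
bool toy_start(int input) {
    saved_k = nullptr;
    if (setjmp(resume_point) == 0) {
        start_work(input);
        return false;                    // finished without yielding
    }
    return true;                         // arrived here via longjmp
}

// Second "resume": instead of restoring start_work, call the continuation.
int toy_continue() {
    return saved_k(TOY_YIELD, saved_ctx);
}
```

With this sketch, toy_start(5) reports a yield and a following toy_continue() returns 11: the value 10 computed before the yield survived only because it was passed along as the context, not because the frame of start_work was kept.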
Note that Coco gives Lua 5.1 the ability to resolve this problem. It is capable (through OS/assembly/etc. magic) of preserving the C stack during a yield operation. LuaJIT versions before 2.0 also provided this feature.
C++ note
You marked your question with the C++ tag, so I'll assume that's involved here.
Among the many differences between C and C++ is the fact that C++ is far more dependent on the nature of its call stack than C. In C, if you discard a stack, you might lose resources that weren't cleaned up. C++, however, is required to call the destructors of objects declared on the stack at some point. The standard does not allow you to just throw them away.
So continuations only work in C++ if there is nothing on the stack which needs to have a destructor call. Or more specifically, only types that are trivially destructible can be sitting on the stack if you call any of the continuation function Lua APIs.
Of course, Coco handles C++ just fine, since it's actually preserving the C++ stack.
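One way to make the "trivially destructible only" rule harder to violate by accident is a compile-time check. The helper below is a hypothetical sketch (safe_across_longjmp, PlainState and OwningState are made-up names, not part of Lua or any library). It uses std::is_trivially_destructible as a rough proxy for "nothing is skipped if this object's frame is discarded", and it can only check the types you remember to list.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: true when discarding a stack object of type T
// would not skip a non-trivial destructor.
template <typename T>
constexpr bool safe_across_longjmp() {
    return std::is_trivially_destructible<T>::value;
}

// Plain data: nothing to clean up, so losing the frame loses nothing.
struct PlainState {
    int count;
    double ratio;
};

// Owns heap memory through std::string: its destructor would be skipped.
struct OwningState {
    std::string name;
};

static_assert(safe_across_longjmp<PlainState>(),
              "PlainState may live on the stack across a yielding call");
static_assert(!safe_across_longjmp<OwningState>(),
              "OwningState must not be alive on the stack across a yielding call");
```

In a C function that calls one of the continuation APIs, a static_assert like the first one could be placed next to each local variable type that is still alive at the call, so a later change that adds a std::string member fails to compile instead of silently skipping a destructor.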
Posting this as an answer which complements @Nicol Bolas' answer, and so that I can have space to write down what it took for me to understand the original question, and the answers to the secondary questions / a code listing.
If you read Nicol Bolas' answer but still have questions like I did, here are some additional hints:
Lua 5.2 and later provide lua_callk and lua_pcallk, which allow you to provide a substitute function which can be called in place of that C function whose frames were wiped out.
The advice to write return lua_yieldk(...) appears to have nothing to do with any of this. From skimming the implementation of lua_yieldk, it appears that it does indeed always longjmp, and it may only return in some obscure case involving Lua debugging hooks (?).
Lua keeps a counter nny (number non-yieldable) associated with the Lua state. When you call lua_call or lua_pcall from a C API function (a lua_CFunction which you earlier pushed to Lua), nny is incremented, and it is only decremented when that call or pcall returns. When nny is nonzero, it is not safe to yield, and you get the "attempt to yield across a C-call boundary" error if you try to yield anyway.
Here is a simple listing that produces the problem and reports the errors, if you are like me and like to have concrete code examples. It demonstrates some of the differences between using lua_call, lua_pcall, and lua_pcallk within a function called by a coroutine.
extern "C" {
#include <lauxlib.h>
#include <lua.h>
}

#include <cassert>
#include <iostream>

//#define USE_PCALL
//#define USE_PCALLK

#define CODE(C) \
  case C: { \
    std::cout << "When returning to " << where << " got code '" #C "'" << std::endl; \
    break; \
  }

#define ERRCODE(C) \
  case C: { \
    std::cout << "When returning to " << where << " got code '" #C "': " << lua_tostring(L, -1) << std::endl; \
    break; \
  }

int report_resume_code(int code, const char * where, lua_State * L) {
  switch (code) {
    CODE(LUA_OK)
    CODE(LUA_YIELD)
    ERRCODE(LUA_ERRRUN)
    ERRCODE(LUA_ERRMEM)
    ERRCODE(LUA_ERRERR)
    default:
      std::cout << "An unknown error code in " << where << ": " << lua_tostring(L, -1) << std::endl;
  }
  return code;
}

int report_pcall_code(int code, const char * where, lua_State * L) {
  switch(code) {
    CODE(LUA_OK)
    ERRCODE(LUA_ERRRUN)
    ERRCODE(LUA_ERRMEM)
    ERRCODE(LUA_ERRERR)
    default:
      std::cout << "An unknown error code in " << where << ": " << lua_tostring(L, -1) << std::endl;
  }
  return code;
}

int trivial(lua_State *, int, lua_KContext) {
  std::cout << "Called continuation function" << std::endl;
  return 0;
}

int f(lua_State * L) {
  std::cout << "Called function 'f', yielding" << std::endl;
  return lua_yield(L, 0);
}

int g(lua_State * L) {
  std::cout << "Called function 'g'" << std::endl;

  lua_getglobal(L, "f");
#ifdef USE_PCALL
  std::cout << "pcall..." << std::endl;
  report_pcall_code(lua_pcall(L, 0, 0, 0), __func__, L);
  // ^ yield across pcall!
  // If we yield, there is no way ever to return normally from this pcall,
  // so it is an error.
#elif defined(USE_PCALLK)
  std::cout << "pcallk..." << std::endl;
  report_pcall_code(lua_pcallk(L, 0, 0, 0, 0, trivial), __func__, L);
#else
  std::cout << "call..." << std::endl;
  lua_call(L, 0, 0);
  // ^ yield across call!
  // This results in an error being reported in lua_resume, rather than at
  // the pcall
#endif
  return 0;
}

int main () {
  std::cout << "Starting:" << std::endl;

  lua_State * L = luaL_newstate();

  // init
  {
    lua_pushcfunction(L, f);
    lua_setglobal(L, "f");

    lua_pushcfunction(L, g);
    lua_setglobal(L, "g");
  }

  assert(lua_gettop(L) == 0);

  // Some action
  {
    lua_State * T = lua_newthread(L);
    lua_getglobal(T, "g");

    while (LUA_YIELD == report_resume_code(lua_resume(T, L, 0), __func__, T)) {}
  }

  lua_close(L);

  std::cout << "Bye! :-)" << std::endl;
}
Example output:
call
Starting:
Called function 'g'
call...
Called function 'f', yielding
When returning to main got code 'LUA_ERRRUN': attempt to yield across a C-call boundary
Bye! :-)
pcall
Starting:
Called function 'g'
pcall...
Called function 'f', yielding
When returning to g got code 'LUA_ERRRUN': attempt to yield across a C-call boundary
When returning to main got code 'LUA_OK'
Bye! :-)
pcallk
Starting:
Called function 'g'
pcallk...
Called function 'f', yielding
When returning to main got code 'LUA_YIELD'
Called continuation function
When returning to main got code 'LUA_OK'
Bye! :-)
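The nny bookkeeping described in the hints above can be mimicked in a few lines of plain C++. This is a toy model under simplifying assumptions, not Lua's code: ToyState, toy_yield, toy_call and toy_callk are hypothetical names, and the real implementation tracks more than a single counter. It reproduces the same three outcomes as the listing: a yield under a plain call fails, a yield under a call with a continuation succeeds, and one non-yieldable call anywhere below still blocks the yield.

```cpp
// Toy model only: hypothetical names, no Lua headers needed.
enum ToyStatus { TOY_OK, TOY_YIELD, TOY_ERR_CBOUNDARY };

struct ToyState {
    unsigned nny = 0;   // number of non-yieldable calls currently active
};

using ToyFunction = ToyStatus (*)(ToyState&);

// A yield is only permitted while no non-yieldable call is active.
ToyStatus toy_yield(ToyState& s) {
    return s.nny == 0 ? TOY_YIELD : TOY_ERR_CBOUNDARY;
}

// Models a call without a continuation: the caller's C frame would be
// needed again, so the callee must not yield for the duration of the call.
ToyStatus toy_call(ToyState& s, ToyFunction callee) {
    ++s.nny;
    ToyStatus result = callee(s);
    --s.nny;
    return result;
}

// Models a call with a continuation: the caller can be continued without
// its C frame, so the counter is left untouched.
ToyStatus toy_callk(ToyState& s, ToyFunction callee) {
    return callee(s);
}

// Helper for the nested case: a continuation-aware call that reaches a yield.
ToyStatus nested_via_callk(ToyState& s) {
    return toy_callk(s, toy_yield);
}
```

With a fresh ToyState, toy_call(s, toy_yield) gives TOY_ERR_CBOUNDARY, toy_callk(s, toy_yield) gives TOY_YIELD, and toy_call(s, nested_via_callk) gives TOY_ERR_CBOUNDARY again, because the outer plain call has already made the whole chain non-yieldable.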