Background
I work with Watusimoto on the game Bitfighter. We use a variation of LuaWrapper to connect our C++ objects with Lua objects in the game. We also use a variation of Lua called lua-vec to speed up vector operations.
We have been working to solve a bug for some time that has eluded us. Random crashes will occur that suggest corrupt metatables. See here for Watusimoto's post on the issue. I'm not sure it is because of a corrupt metatable and have seen some really odd behavior about which I wish to ask here.
The Problem Manifestation
As an example, we create an object and add it to a level like this:
t = TextItem.new()
t:setText("hello")
levelgen:addItem(t)
However, the game will sometimes (not always) crash with this error:
attempt to call missing or unknown method 'addItem' (a nil value)
Using a suggestion given in answer to Watusimoto's post mentioned above, I have changed the last line to the following:
local ok, res = pcall(function() levelgen:addItem(t) end)
if not ok then
    local s = "Invalid levelgen value: "..tostring(levelgen).." "..type(levelgen).."\n"
    for k, v in pairs(getmetatable(levelgen)) do
        s = s.."meta "..tostring(k).." "..tostring(v).."\n"
    end
    error(res..s)
end
This prints out the metatable for levelgen if something went wrong calling a method from it.
However, and this is crazy, when it fails and prints out the metatable, the metatable is exactly how it should be (with the correct addItem call and everything). If I print the metatable for levelgen upon script load, and again when it fails using the pcall above, they are identical: every call and pointer to userdata is the same and as it should be.
It is as though the metatable for levelgen is spontaneously disappearing at random.
Would anyone have any idea what is going on?
Thank you
Note: This doesn't happen only with the levelgen object. For instance, it has happened on the TextItem object mentioned above as well. In fact, that same code crashes on my computer at the line levelgen:addItem(t) but crashes on another developer's computer at the line t:setText("hello") with the same error message: missing or unknown method 'setText' (a nil value)
As with any mystery, you will need to peel it off layer by layer. I recommend going through the same steps Lua goes through and trying to detect where the path taken diverges from your expectations:
What does getmetatable(levelgen).__index return? If it's a table, then check its content for addItem. If it's a function, then try to call it with (table, "addItem") and see what it returns.
Check if getmetatable returns a reference to the same object before and after the call (or when it fails).
Are there several levels of metatable indirection that the call is going through? If so, try to follow the same path with explicit calls and see where the differences are.
Are you using weak keys that may cause values to disappear if there are no other references?
Can you provide a "default" value when you detect that it fails and continue to see if it "finds" this method again later? Or when it's broken, it's broken for every call after that?
What if you save a proper value for addItem and "fix" it when you detect it's broken?
What if you simply handle the error (as you do) and call it 10 times? Would it show valid results at least once (after it fails)? 100 times? If you keep calling the same method when it works, will it fail? This may help you to come up with a more reproducible error.
I'm not familiar enough with LuaWrapper to provide more specific questions, but these are the steps I'd take if I were you.
I strongly suspect the issue is that you have a class or struct similar to this:
struct Foo
{
    Bar bar;
    // Other fields follow
};
And that you've exposed both Foo and Bar to Lua via LuaWrapper. The important bit here is that bar is the first field on your Foo struct. Alternatively, you may have some class that inherits from some other base class and both the derived and base class are exposed to LuaWrapper.
LuaWrapper uses a function called an Identifier to uniquely track each object (for example, to tell whether a given object has already been added to the Lua state). By default it uses the object's address as the key. In cases like the one posed above, it is possible that both Foo and Bar have the same address in memory, and thus LuaWrapper can get confused.
This may result in grabbing the wrong object's metatable when attempting to look up a method. Clearly, since it's looking at the wrong metatable it won't find the method you want, and so it will appear as if your metatable has mysteriously lost entries.
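To make the address collision concrete, here is a minimal sketch. It assumes a simplified address-keyed cache; AddressRegistry and registerObject are hypothetical names for illustration and are not LuaWrapper's actual code. It only shows that a struct and its first member share an address, so an address-keyed lookup cannot tell them apart.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct Bar { int value = 0; };

struct Foo {
    Bar bar;        // first member: has the same address as the enclosing Foo
    int other = 0;
};

// Hypothetical stand-in for a cache keyed only by object address.
// The mapped string plays the role of "which metatable belongs to this object".
using AddressRegistry = std::unordered_map<const void*, std::string>;

// Registers obj under its address if that address is not already known,
// then returns the metatable name the registry holds for that address.
inline std::string registerObject(AddressRegistry& reg, const void* obj,
                                  const std::string& metatableName) {
    // emplace does not overwrite an existing entry with the same key
    return reg.emplace(obj, metatableName).first->second;
}

// Runs the scenario from the answer: register a Foo, then its first member.
inline void demonstrateCollision() {
    Foo foo;
    AddressRegistry reg;

    // For a standard-layout struct, &foo and &foo.bar are the same address.
    assert(static_cast<const void*>(&foo) == static_cast<const void*>(&foo.bar));

    registerObject(reg, &foo, "Foo");
    // The Bar is mistaken for the already-registered Foo.
    assert(registerObject(reg, &foo.bar, "Bar") == "Foo");
}
```

In this sketch, a method lookup for the Bar would go through the entry stored for Foo, which matches the symptom of a method that is "missing" even though the object's real metatable is intact.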
I've checked in a change that tracks each object's data per-type rather than in one giant pile. If you update your copy of LuaWrapper to the latest one from the repository, I'm fairly certain your problem will be fixed.
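As a rough illustration of what "per-type rather than in one giant pile" can mean, here is a sketch that keys the cache by type first and by address second. PerTypeRegistry and registerTyped are hypothetical names, and this is an assumption about the general idea, not a copy of the change made in LuaWrapper.

```cpp
#include <cassert>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct Bar { int value = 0; };

struct Foo {
    Bar bar;        // first member: same address as the enclosing Foo
    int other = 0;
};

// Hypothetical per-type cache: one address-keyed bucket for each C++ type.
using PerTypeRegistry =
    std::unordered_map<std::type_index,
                       std::unordered_map<const void*, std::string>>;

// Registers obj in the bucket for its static type T and returns the
// metatable name stored for it in that bucket.
template <typename T>
std::string registerTyped(PerTypeRegistry& reg, const T* obj,
                          const std::string& metatableName) {
    auto& bucket = reg[std::type_index(typeid(T))];
    return bucket.emplace(static_cast<const void*>(obj), metatableName)
        .first->second;
}

// Same scenario as before: a Foo and its first member share an address,
// but they now land in different buckets.
inline void demonstratePerTypeLookup() {
    Foo foo;
    PerTypeRegistry reg;

    assert(registerTyped(reg, &foo, "Foo") == "Foo");
    assert(registerTyped(reg, &foo.bar, "Bar") == "Bar");
}
```

Because the type is part of the key, two objects at the same address no longer overwrite or shadow each other, at the cost of one extra lookup per access.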
After merging with upstream (commit 3c54015) LuaWrapper, this issue has disappeared. It appears to have been a bug in LuaWrapper.
Thanks Alex!