Lua, C++ and disappearing metatables

Tags: c++, lua, metatable

Background

I work with Watusimoto on the game Bitfighter. We use a variation of LuaWrapper to connect our C++ objects with Lua objects in the game. We also use a variation of Lua called lua-vec to speed up vector operations.

We have been trying for some time to solve a bug that has eluded us. Random crashes occur that suggest corrupt metatables. See here for Watusimoto's post on the issue. I'm not sure a corrupt metatable is the cause, and I have seen some really odd behavior that I want to ask about here.

The Problem Manifestation

As an example, we create an object and add it to a level like this:

t = TextItem.new()
t:setText("hello")
levelgen:addItem(t)

However, the game will sometimes (not always) crash with this error:

attempt to call missing or unknown method 'addItem' (a nil value)

Using a suggestion given in answer to Watusimoto's post mentioned above, I have changed the last line to the following:

local ok, res = pcall(function() levelgen:addItem(t) end)

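if not ok then
    local s = "Invalid levelgen value: "..tostring(levelgen).." "..type(levelgen).."\n"

    -- getmetatable can return nil, so check before iterating
    local mt = getmetatable(levelgen)
    if type(mt) == "table" then
        for k, v in pairs(mt) do
            s = s.."meta "..tostring(k).." "..tostring(v).."\n"
        end
    else
        s = s.."no metatable\n"
    end

    -- re-raise the original error with the metatable dump appended
    error(tostring(res).."\n"..s)
end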

This prints out the metatable for levelgen if something went wrong when calling a method on it.

However, and this is crazy, when it fails and prints out the metatable, the metatable is exactly as it should be (with the correct addItem entry and everything). If I print the metatable for levelgen when the script loads and again when the pcall above fails, the two are identical: every function and every pointer to userdata is the same and as it should be.

It is as though the metatable for levelgen is spontaneously disappearing at random.

Would anyone have any idea what is going on?

Thank you

Note: This doesn't happen only with the levelgen object. For instance, it has also happened on the TextItem object mentioned above. In fact, the same code crashes on my computer at the line levelgen:addItem(t) but crashes on another developer's computer at the line t:setText("hello") with the same kind of error message: missing or unknown method 'setText' (a nil value)

raptor asked Feb 17 '13 01:02


3 Answers

As with any mystery, you will need to peel it back layer by layer. I recommend going through the same steps Lua goes through and trying to detect where the path taken diverges from your expectations:

What does getmetatable(levelgen).__index return? If it's a table, then check its content for addItem. If it's a function, then try to call it with (table, "addItem") and see what it returns.

Check if getmetatable returns a reference to the same object before and after the call (or when it fails).

Are there several levels of metatable indirection that the call is going through? If so, try to follow the same path with explicit calls and see where the differences are.

Are you using weak keys that may cause values to disappear if there are no other references?

Can you provide a "default" value when you detect that it fails and continue, to see if it "finds" this method again later? Or, once it's broken, is it broken for every call after that?

What if you save a proper value for addItem and "fix" it when you detect it's broken?

What if you simply handle the error (as you do) and call it 10 times? Would it show valid results at least once (after it fails)? 100 times? If you keep calling the same method when it works, will it fail? This may help you to come up with a more reproducible error.

I'm not familiar enough with LuaWrapper to provide more specific questions, but these are the steps I'd take if I were you.

Paul Kulchenko answered Nov 05 '22 01:11


I strongly suspect the issue is that you have a class or struct similar to this:

struct Foo
{
    Bar bar;   // first member, so it starts at the same address as the Foo itself
    // Other fields follow
};

And that you've exposed both Foo and Bar to Lua via LuaWrapper. The important bit here is that bar is the first field on your Foo struct. Alternatively, you may have some class that inherits from some other base class and both the derived and base class are exposed to LuaWrapper.

LuaWrapper uses a function called an Identifier to uniquely track each object (for example, to tell whether a given object has already been added to the Lua state). By default it uses the object's address as the key. In cases like the one above, it is possible that both Foo and Bar have the same address in memory, and thus LuaWrapper can get confused.

This may result in grabbing the wrong object's metatable when attempting to look up a method. Clearly, since it's looking at the wrong metatable it won't find the method you want, and so it will appear as if your metatable has mysteriously lost entries.
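
The collision described above can be reproduced in a few lines of plain C++. This is only a minimal sketch: Foo and Bar here are hypothetical stand-ins, and AddressRegistry is a toy map keyed by address, not LuaWrapper's actual bookkeeping. It shows that an object and its first member can share one address, so an address-only key cannot tell them apart.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for two classes that are both exposed to Lua.
struct Bar { int value = 0; };
struct Foo {
    Bar bar;        // first member: starts at the same address as the Foo
    int other = 0;
};

// Toy registry keyed by address only (an assumption for illustration,
// not LuaWrapper's real data structure). The value is a metatable name.
using AddressRegistry = std::map<const void*, std::string>;

// True when an object and its first member begin at the same address.
inline bool sharesAddress(const Foo& foo) {
    return static_cast<const void*>(&foo) ==
           static_cast<const void*>(&foo.bar);
}

// Registers foo, then foo.bar, and returns the metatable name that the
// registry reports for foo afterwards.
inline std::string metatableForFooAfterRegisteringBar(const Foo& foo) {
    AddressRegistry registry;
    registry[&foo] = "Foo";
    registry[&foo.bar] = "Bar";   // same key, so the "Foo" entry is overwritten
    return registry[&foo];
}
```

With a Foo instance named foo, sharesAddress(foo) holds, and looking up foo after registering foo.bar yields the Bar metatable name, which is the kind of mix-up that would make addItem appear to be missing.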

I've checked in a change that tracks each object's data per-type rather than in one giant pile. If you update your copy of LuaWrapper to the latest one from the repository, I'm fairly certain your problem will be fixed.
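
The idea of per-type tracking can be sketched as follows. This is an assumption-laden illustration, not the actual LuaWrapper change: TypedRegistry, add and lookup are hypothetical names, and the sketch simply keeps one address map per C++ type so that two objects of different types at the same address no longer collide.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical stand-ins for two classes that are both exposed to Lua.
struct Bar { int value = 0; };
struct Foo {
    Bar bar;        // first member: shares its address with the Foo
    int other = 0;
};

// Hypothetical registry that keeps a separate address map for each type.
class TypedRegistry {
public:
    // Records the metatable name for obj under its static type T.
    template <typename T>
    void add(const T* obj, const std::string& metatable) {
        tables_[std::type_index(typeid(T))][obj] = metatable;
    }

    // Returns the metatable name for obj, or an empty string if unknown.
    template <typename T>
    std::string lookup(const T* obj) const {
        auto typeIt = tables_.find(std::type_index(typeid(T)));
        if (typeIt == tables_.end()) return std::string();
        auto objIt = typeIt->second.find(obj);
        if (objIt == typeIt->second.end()) return std::string();
        return objIt->second;
    }

private:
    // Outer key: the C++ type. Inner key: the object's address.
    std::map<std::type_index, std::map<const void*, std::string>> tables_;
};
```

Registering both a Foo and its bar member now leaves two separate entries, so a lookup through a Foo pointer still returns the Foo metatable name. The sketch uses the static type T as the outer key; handling base and derived classes would need more care than is shown here.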

Alex answered Nov 05 '22 01:11


After merging with upstream LuaWrapper (commit 3c54015), this issue has disappeared. It appears to have been a bug in LuaWrapper.

Thanks Alex!

raptor answered Nov 05 '22 01:11