 

Lua: How to look up in a table where the keys are tables (or objects)

I want to store a Lua table where the keys are other Lua tables. I know that this is possible, but I want to be able to do lookups in the table using copies of those tables. Specifically, I want to be able to do:

t = {}
key = { a = "a" }
t[key] = 4
key2 = { a = "a" }

and then I want to be able to look up:

t[key2]

and get 4.

I know that I can turn key into a string and put it into table t. I've also thought about writing a custom hash function or doing this by nesting tables. Is there a best way for me to get this type of functionality? What other options do I have?

asked Feb 08 '12 21:02 by akobre01


4 Answers

In Lua, two tables created separately are considered "different". But if you create a table once, you can assign it to any variables you want, and when you compare them, Lua will tell you that they are equal. In other words:

t = {}
key = { a = "a" }
t[key] = 4
key2 = key
...
t[key2] -- returns 4

So, that's the simple, clean way of doing what you want. Store key somewhere, so you can retrieve the 4 back by using it. This is also very fast.
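It may help to see why the copy in the question fails even if you try to teach Lua that the two tables are "equal". Table indexing uses raw (identity) equality, so an `__eq` metamethod is never consulted for keys. A minimal sketch; the metatable `mt` and its `__eq` function are illustrative only:

```lua
-- Table keys are compared with raw (identity) equality; __eq is not consulted.
local mt = { __eq = function(x, y) return x.a == y.a end }

local key  = setmetatable({ a = "a" }, mt)
local key2 = setmetatable({ a = "a" }, mt)

local t = {}
t[key] = 4

print(key == key2)          -- true  (the == operator calls __eq)
print(rawequal(key, key2))  -- false (two distinct tables)
print(t[key2])              -- nil   (indexing ignores __eq)
print(t[key])               -- 4     (the very same table works)
```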

If you really don't want to do that... well, there is a way. But it is kind of inefficient and ugly.

The first part is making a function that compares two separate tables. It should return true if two tables are "equivalent", and false if they are not. Let's call it equivalent. It should work like this:

equivalent({a=1},{a=1})          -- true
equivalent({a=1,b=2}, {a=1})     -- false
equivalent({a={b=1}}, {a={b=2}}) -- false

The function must be recursive, to handle tables that contain tables themselves. It also must not be fooled if one of the tables "contains" the other but has more elements. I came up with this implementation; there are probably better ones out there.

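local function equivalent(a,b)
  -- Non-tables (or a table compared with a non-table) use plain equality
  if type(a) ~= 'table' or type(b) ~= 'table' then return a == b end

  local counta, countb = 0, 0

  -- Every key of a must map to an equivalent value in b...
  for k,va in pairs(a) do
    if not equivalent(va, b[k]) then return false end
    counta = counta + 1
  end

  -- ...and b must not have any extra keys
  for _ in pairs(b) do countb = countb + 1 end

  return counta == countb
end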

I'm not going to explain that function here. I hope it is clear enough what it does.

The other part of the puzzle consists of making t use the equivalent function when comparing keys. This can be done with careful metatable manipulation and an extra "storage" table.

We basically transform t into an impostor. When our code tells it to store a value under a key, it doesn't save it in itself; instead it gives it to the extra table (we'll call that store). When the code asks t for a value, it searches store for it, using the equivalent function to compare keys.

This is the code:

local function equivalent(a,b)
  -- ... same body as before ...
end

local store = {} -- this is the table that stores the values

t = setmetatable({}, {
  __newindex = store,
  __index = function(tbl, key)
    for k,v in pairs(store) do
      if equivalent(k,key) then return v end
    end
  end
})

Usage example:

t[{a = 1}] = 4

print(t[{a = 1}]) -- 4
print(t[{a = 1, b = 2}]) -- nil
answered Nov 14 '22 17:11 by kikito


kikito's answer is good, but has some flaws:

  • If you perform t[{a=1}] = true two times, store will contain two tables (leaking memory for the lifetime of the hash table)
  • Modifying the value once you've already stored it doesn't work, nor can you remove it. Attempting to change it will result in the retrieval potentially returning any value you've assigned to that key in the past.
  • Access performance is O(n) (n being the number of stored entries, assuming that Lua's value retrieval from a table is O(1)); combined with the first point, the performance of this hash table will degrade with use.

(Also note that kikito's "equivalent" function will recurse endlessly, ending in a stack overflow, if any table has a circular reference.)
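One way around the circular-reference problem, sketched here as an addition rather than part of either answer: pass along a `seen` table recording which (a, b) pairs are already being compared, and treat a pair met again as "equal so far". This is the usual approach for comparing cyclic structures; any real mismatch is still found along some other path.

```lua
-- Cycle-tolerant variant of `equivalent`. The extra `seen` argument maps
-- a -> { b -> true } for table pairs currently under comparison.
local function equivalent(a, b, seen)
  if type(a) ~= "table" or type(b) ~= "table" then return a == b end
  if rawequal(a, b) then return true end

  seen = seen or {}
  seen[a] = seen[a] or {}
  if seen[a][b] then return true end  -- pair already being compared
  seen[a][b] = true

  local counta, countb = 0, 0
  for k, va in pairs(a) do
    if not equivalent(va, b[k], seen) then return false end
    counta = counta + 1
  end
  for _ in pairs(b) do countb = countb + 1 end
  return counta == countb
end
```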

If you never need to change or remove any information in the table, kikito's answer will suffice as it stands. Otherwise, the metatable must be changed so that __newindex first checks whether an equivalent key already exists in store:

t = setmetatable({}, {
    __newindex = function(tbl, key, value)
        for k,v in pairs(store) do
            if equivalent(k,key) then
                store[k] = value -- update (or remove, if nil) the existing entry; tbl[k] would re-enter __newindex forever
                return
            end
        end
        store[key] = value
    end,
    __index = function(tbl, key)
        for k,v in pairs(store) do
            if equivalent(k, key) then return v end
        end
    end
})
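A self-contained check of the corrected version, assuming Lua 5.1 or later. It repeats `equivalent` (with a guard so that comparing a table against a non-table returns false instead of erroring) and writes updates to `store[k]`; the `count` helper is hypothetical and exists only for this demo:

```lua
local function equivalent(a, b)
  if type(a) ~= "table" or type(b) ~= "table" then return a == b end
  local counta, countb = 0, 0
  for k, va in pairs(a) do
    if not equivalent(va, b[k]) then return false end
    counta = counta + 1
  end
  for _ in pairs(b) do countb = countb + 1 end
  return counta == countb
end

local store = {}

local t = setmetatable({}, {
  __newindex = function(_, key, value)
    for k in pairs(store) do
      if equivalent(k, key) then
        store[k] = value -- update (or remove, if value is nil) in place
        return
      end
    end
    store[key] = value   -- genuinely new key
  end,
  __index = function(_, key)
    for k, v in pairs(store) do
      if equivalent(k, key) then return v end
    end
  end,
})

-- Demo helper: how many entries `store` really holds.
local function count()
  local n = 0
  for _ in pairs(store) do n = n + 1 end
  return n
end

t[{ a = 1 }] = 4
t[{ a = 1 }] = 5             -- overwrites instead of adding a second entry
print(t[{ a = 1 }], count()) -- prints 5 and 1
t[{ a = 1 }] = nil           -- removes the entry
print(t[{ a = 1 }], count()) -- prints nil and 0
```

Lookups are still a linear scan over `store`, so the O(n) point above still applies.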

As you've suggested, a completely different option is to write a custom hashing function. Here's a HashTable that can make use of that:

local function HashTable(Hash, Equals)
    --Hash is an optional function that takes any key and returns a key that Lua can use (string or number). If it returns false/nil, it is assumed that you don't know how to hash that value.
    --    If Hash is not provided, table keys should have a GetHash method or a .Hash field
    --Equals is an optional function that takes two keys and specifies whether they are equal. It is used when two keys produce the same hash.
    --    If Equals is not provided, keys should have an Equals method; keys of different types are then assumed to be unequal.
    local items = {} --Dict<hash, Dict<key, value>>
    local function GetHash(item)
        return Hash and Hash(item) or type(item) == "table" and (item.GetHash and item:GetHash() or item.Hash) or item
    end
    local function GetEquals(item1, item2)
        if Equals then return Equals(item1, item2) end
        local t1, t2 = type(item1), type(item2)
        if t1 ~= t2 then return false end
        if t1 == "table" and item1.Equals then
            return item1:Equals(item2)
        elseif t2 == "table" and item2.Equals then
            return item2:Equals(item1)
        end
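        return item1 == item2 --Same type, no Equals method: fall back to plain equality (otherwise string/number keys could never be found)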
    end
    return setmetatable({}, {
        __newindex = function(_, key, value)
            local hash = GetHash(key)
            local dict = items[hash]
            if not dict then
                if value ~= nil then --Only generate a table if it will be non-empty after assignment
                    items[hash] = {[key] = value}
                end
                return
            end
            for k, v in pairs(dict) do
                if GetEquals(key, k) then --Found the entry; update it
                    dict[k] = value
                    if value == nil then --check to see if dict is empty
                        if next(dict) == nil then
                            items[hash] = nil
                        end
                    end
                    return
                end
            end
            --This is a unique entry
            dict[key] = value
        end,
        __index = function(_, key)
            local hash = GetHash(key)
            local dict = items[hash]
            if not dict then return nil end
            for k, v in pairs(dict) do
                if GetEquals(key, k) then
                    return v
                end
            end
        end
    })
end

Usage example:

local h = HashTable(
    function(t) return t.a or 0 end, --Hash
    function(t1, t2) return t1.a == t2.a end) --Equals
h[{a=1}] = 1
print(h[{a=1}]) -- 1
h[{a=1}] = 2
print(h[{a=1}]) -- 2
print(h[{a=1,b=2}]) -- 2 because Hash/Equals only look at 'a'

Naturally, you'll want to write better Hash/Equals functions for real use.
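As an illustration of the method-based path (calling HashTable() with no Hash/Equals arguments), here is a hypothetical `Point` type whose instances provide `GetHash` and `Equals`. The class and its `x`/`y` fields are assumptions for the example; the one rule that matters is that equal keys must produce equal hashes.

```lua
-- Hypothetical Point "class": methods are found through the metatable, so
-- HashTable's `item.GetHash` / `item.Equals` lookups will see them.
local Point = {}
Point.__index = Point

function Point.new(x, y)
  return setmetatable({ x = x, y = y }, Point)
end

-- Equal points give equal strings; a string is a valid plain Lua key.
function Point:GetHash()
  return self.x .. "," .. self.y
end

function Point:Equals(other)
  return type(other) == "table" and self.x == other.x and self.y == other.y
end
```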

So long as the hashes of your keys rarely collide, the performance of this class should be O(1) on average.

(Note: I'd have put the top half of this answer as a comment to kikito, but I don't have the reputation at this time to do so.)

answered Nov 14 '22 16:11 by chess123mate


This is not possible in Lua. If you use tables as keys, the key is that specific "instance" of the table. Even if you make a different table with the same contents, the instance is different, therefore it is a different key.

If you want to do something like this, you can create a kind of hash function that traverses the table serving as a key (recursively, if needed) and constructs a string representation of the table's contents. It does not need to be human-readable, as long as it is different for different contents and equal for tables with the same contents. Apart from using pairs() to traverse the table, you would also need to insert the keys into a table and sort them using table.sort(), because pairs() returns them in an arbitrary order, and you want the same string for "equal" tables.

Once you have constructed such a string, you can use it as a key:

function hash(t) ... end
t = {}
key1 = { a = "a", b = "b" }
t[hash(key1)] = 4
key2 = { a = "a", b = "b" }
print(t[hash(key2)]) -- should print "4" if the hash function works correctly
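One possible body for the `hash` placeholder, as a sketch rather than the answer's own code. As a small variation on sorting the keys, it sorts the serialized `key=value` strings, because table.sort cannot compare keys of mixed types (say, a number and a string). Strings are quoted with `%q` so separators inside them cannot cause collisions. It assumes the tables have no circular references, and numbers go through tostring, so float precision (and 1 vs 1.0 in Lua 5.3+) are edge cases.

```lua
-- Canonical string for any value: tables become "{k=v,...}" with entries
-- sorted, so equal contents give the same string whatever pairs() order is.
local function hash(v)
  local tv = type(v)
  if tv == "string" then
    return string.format("%q", v)       -- quoted and escaped
  elseif tv ~= "table" then
    return tv .. ":" .. tostring(v)     -- numbers, booleans, ...
  end
  local parts = {}
  for k, val in pairs(v) do
    parts[#parts + 1] = hash(k) .. "=" .. hash(val)
  end
  table.sort(parts)                     -- order-independent result
  return "{" .. table.concat(parts, ",") .. "}"
end
```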

In my opinion, all this is too complicated for the simple task of indexing, and you may want to rethink your wish to index using copies of tables. Why would you want such functionality?

Update

If you only need to work with phrases, I think that concatenating them is easier than creating such a generic hash function. If you need it for sequences of phrases, you won't actually need to iterate through the tables and sort the keys; just collect the main information from each phrase. You would still need a helper function to create a suitable key for you:

function pkey(...)
    local n, args = select('#', ...), { ... }
    for i=1,n do args[i] = args[i].value end -- extract your info here
    return table.concat(args, ' ') -- space or other separator, such as ':'          
end
tab[pkey(phrase1, phrase2, phrase3)] = "value"
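A runnable version of the same idea, assuming (hypothetically) that each phrase is a table whose text lives in a `.value` field. It shows that freshly built copies of the phrases produce the same string key, which is exactly what the question asked for:

```lua
-- Build a string key from the `.value` field of each (hypothetical) phrase.
local function pkey(...)
  local n, args = select('#', ...), { ... }
  for i = 1, n do args[i] = args[i].value end
  return table.concat(args, ' ')
end

local tab = {}
tab[pkey({ value = "I" }, { value = "like" }, { value = "banana" })] = "value"

-- Copies of the phrases yield the same string, so the lookup succeeds.
print(tab[pkey({ value = "I" }, { value = "like" }, { value = "banana" })]) -- value
```

Pick a separator that cannot appear inside a phrase: with a space, `{"a b", "c"}` and `{"a", "b c"}` would map to the same key.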
answered Nov 14 '22 16:11 by Michal Kottman


I don't know a lot about language processing, or about the goal you want to reach with your program, but what about collecting tokens like this: use a nested table structure, such that the index table stores only tables indexed by the first token of a phrase, each subtable contains values indexed by the second token, and so on, until you reach the final token of a phrase, which indexes a number corresponding to the occurrence count of that phrase.

Maybe it will be clearer with an example. Suppose you have the following two phrases:

  • I like banana.
  • I like hot chick.

Your index would have the following structure:

index["I"] = {
    ["like"] = {
        ["banana"] = 1,
        ["hot"] = {
            ["chick"] = 1
        }
    }    
}

That way you can count frequencies in a single traversal step, counting occurrences at the same time as you index. But as I said before, it depends on what your goal is, and it will require re-splitting your phrases to find occurrences through your index.
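A sketch of building and querying such an index, assuming phrases are plain strings split into runs of letters, digits and apostrophes. As a deviation from the structure shown above, the count is stored under a private sentinel key at the node where a phrase ends, so "I like banana" and a longer phrase such as "I like banana bread" can coexist. The sentinel is a table, which by identity can never collide with a word. `addPhrase`, `countPhrase` and `COUNT` are hypothetical names.

```lua
-- Sentinel key: a unique table, so it cannot clash with any string token.
local COUNT = {}

-- Walk/create one nested table per token, then bump the count at the end.
local function addPhrase(index, phrase)
  local node = index
  for word in phrase:gmatch("[%w']+") do
    node[word] = node[word] or {}
    node = node[word]
  end
  node[COUNT] = (node[COUNT] or 0) + 1
end

-- Number of times exactly this phrase was added (0 if never).
local function countPhrase(index, phrase)
  local node = index
  for word in phrase:gmatch("[%w']+") do
    node = node[word]
    if node == nil then return 0 end
  end
  return node[COUNT] or 0
end

local index = {}
addPhrase(index, "I like banana.")
addPhrase(index, "I like hot chick.")
addPhrase(index, "I like banana.")
print(countPhrase(index, "I like banana."))    -- 2
print(countPhrase(index, "I like hot chick.")) -- 1
```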

answered Nov 14 '22 17:11 by Faylixe