I want to store a Lua table where the keys are other Lua tables. I know that this is possible, but I want to be able to do lookups in the table using copies of those tables. Specifically, I want to be able to do:
t = {}
key = { a = "a" }
t[key] = 4
key2 = { a = "a" }
and then I want to be able to look up:
t[key2]
and get 4.
I know that I can turn key into a string and put it into table t. I've also thought about writing a custom hash function or doing this by nesting tables. Is there a best way for me to get this type of functionality? What other options do I have?
In Lua, two tables created separately are considered "different". But if you create a table once, you can assign it to any variables you want, and when you compare them, Lua will tell you that they are equal. In other words:
t = {}
key = { a = "a" }
t[key] = 4
key2 = key
...
t[key2] -- returns 4
So, that's the simple, clean way of doing what you want. Store key somewhere, so you can retrieve the 4 back by using it. This is also very fast.
If you really don't want to do that ... well, there is a way. But it is kind of inefficient and ugly.
The first part is making a function that compares two separate tables. It should return true if two tables are "equivalent", and false if they are not. Let's call it equivalent. It should work like this:
equivalent({a=1},{a=1}) -- true
equivalent({a=1,b=2}, {a=1}) -- false
equivalent({a={b=1}}, {a={b=2}}) -- false
The function must be recursive, to handle tables that contain tables themselves. It also must not be fooled if one of the tables "contains" the other, but has more elements. I came up with this implementation; there are probably better ones out there.
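local function equivalent(a,b)
  if type(a) ~= 'table' then return a == b end
  -- a is a table here, so a non-table b can never match (and must not be indexed)
  if type(b) ~= 'table' then return false end
  local counta, countb = 0, 0
  for k,va in pairs(a) do
    if not equivalent(va, b[k]) then return false end
    counta = counta + 1
  end
  -- count b's entries so that a b with extra keys is rejected
  for _,_ in pairs(b) do countb = countb + 1 end
  return counta == countb
end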
I'm not going to explain that function here. I hope it is clear enough what it does.
The other part of the puzzle consists of making t use the equivalent function when comparing keys. This can be done with careful metatable manipulation, and an extra "storage" table.
We basically transform t into an impostor. When our code tells it to store a value under a key, it doesn't save it in itself; instead it gives it to the extra table (we'll call that store). When the code asks t for a value, it searches for it in store, but using the equivalent function to get it.
This is the code:
local function equivalent(a,b)
  ... -- same code as before
end
local store = {} -- this is the table that stores the values
t = setmetatable({}, {
  __newindex = store,
  __index = function(tbl, key)
    for k,v in pairs(store) do
      if equivalent(k,key) then return v end
    end
  end
})
Usage example:
t[{a = 1}] = 4
print(t[{a = 1}]) -- 4
print(t[{a = 1, b = 2}]) -- nil
kikito's answer is good, but has some flaws:
If you perform t[{a=1}] = true two times, store will contain two tables (leaking memory for the lifetime of the hash table).
(Also note that kikito's "equivalent" function will cause an infinite loop if any table has a circular reference.)
If you never need to change/remove any information in the table, then kikito's answer will suffice as it stands. Otherwise, the metatable must be changed so that the __newindex makes sure that the table doesn't already exist:
t = setmetatable({}, {
  __newindex = function(tbl, key, value)
    for k,v in pairs(store) do
      if equivalent(k,key) then
        -- write to store directly; assigning to tbl[k] would re-enter __newindex and recurse forever
        store[k] = value
        return
      end
    end
    store[key] = value
  end,
  __index = function(tbl, key)
    for k,v in pairs(store) do
      if equivalent(k, key) then return v end
    end
  end
})
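The circular-reference caveat mentioned above can be worked around by remembering which pairs of tables are already being compared. The following is only a sketch: the name equivalent_safe and the seen bookkeeping table are assumptions introduced here, not part of either answer, and table-valued keys inside the compared tables are still matched by identity.

```lua
-- Hypothetical cycle-tolerant variant of the comparison function.
-- "seen" records pairs of tables whose comparison has already started.
local function equivalent_safe(a, b, seen)
  if type(a) ~= "table" or type(b) ~= "table" then
    return a == b
  end
  if a == b then return true end
  seen = seen or {}
  seen[a] = seen[a] or {}
  -- If this pair is already under comparison, assume it matches here;
  -- any real difference is reported by the comparison already in progress.
  if seen[a][b] then return true end
  seen[a][b] = true

  local count_a, count_b = 0, 0
  for k, va in pairs(a) do
    if not equivalent_safe(va, b[k], seen) then return false end
    count_a = count_a + 1
  end
  for _ in pairs(b) do count_b = count_b + 1 end
  return count_a == count_b
end

-- Two separate tables that each refer to themselves
local x = { name = "node" }
x.self = x
local y = { name = "node" }
y.self = y
print(equivalent_safe(x, y))                  -- true
print(equivalent_safe(x, { name = "node" }))  -- false
```

It could be dropped in wherever equivalent is used in the metatable above, at the cost of one extra table allocation per comparison of nested tables.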
As you've suggested, a completely different option is to write a custom hashing function. Here's a HashTable that can make use of that:
local function HashTable(Hash, Equals)
  --Hash is an optional function that takes in any key and returns a key that lua can use (string or number). If you return false/nil, it will be assumed that you don't know how to hash that value.
  -- If Hash is not provided, table-keys should have a GetHash function or a .Hash field
  --Equals is an optional function that takes two keys and specifies whether they are equal or not. This will be used when the same hash is returned from two keys.
  -- If Equals is not provided, items should have an Equals function; items are in this case assumed to not be equal if they are different types.
  local items = {} --Dict<hash, Dict<key, value>>
  local function GetHash(item)
    return Hash and Hash(item) or type(item) == "table" and (item.GetHash and item:GetHash() or item.Hash) or item
  end
  local function GetEquals(item1, item2)
    if Equals then return Equals(item1, item2) end
    local t1, t2 = type(item1), type(item2)
    if t1 ~= t2 then return false end
    if t1 == "table" and item1.Equals then
      return item1:Equals(item2)
    elseif t2 == "table" and item2.Equals then
      return item2:Equals(item1)
    end
    --No Equals available: fall back to raw equality so that plain strings/numbers can still be looked up
    return item1 == item2
  end
  return setmetatable({}, {
    __newindex = function(_, key, value)
      local hash = GetHash(key)
      local dict = items[hash]
      if not dict then
        if value ~= nil then --Only generate a table if it will be non-empty after assignment
          items[hash] = {[key] = value}
        end
        return
      end
      for k, v in pairs(dict) do
        if GetEquals(key, k) then --Found the entry; update it
          dict[k] = value
          if value == nil then --check to see if dict is empty
            if next(dict) == nil then
              items[hash] = nil
            end
          end
          return
        end
      end
      --This is a unique entry
      dict[key] = value
    end,
    __index = function(_, key)
      local hash = GetHash(key)
      local dict = items[hash]
      if not dict then return nil end
      for k, v in pairs(dict) do
        if GetEquals(key, k) then
          return v
        end
      end
    end
  })
end
Usage example:
local h = HashTable(
function(t) return t.a or 0 end, --Hash
function(t1, t2) return t1.a == t2.a end) --Equals
h[{a=1}] = 1
print(h[{a=1}]) -- 1
h[{a=1}] = 2
print(h[{a=1}]) -- 2
print(h[{a=1,b=2}]) -- 2 because Hash/Equals only look at 'a'
Naturally, you'll want to get better Hash/Equals functions.
So long as the hashes of your keys rarely collide, lookups and assignments on this class should be O(1) on average.
(Note: I'd have put the top half of this answer as a comment to kikito, but I don't have the reputation at this time to do so.)
This is not possible in Lua. If you use tables as keys, the key is that specific "instance" of the table. Even if you make a different table with the same contents, the instance is different, therefore it is a different key.
If you want to do something like this, you can create a kind of hash function, which traverses the table to serve as a key (maybe even recursively if needed) and constructs a string representation of the table content. It does not need to be human-readable, as long as it is different for different content and equal for tables with the same content. Apart from using pairs() to traverse the table, you would also need to insert the keys into a table and sort them using table.sort(), because pairs() returns them in an arbitrary order, and you want the same string for "equal" tables.
Once you have constructed such string, you can use it as a key:
function hash(t) ... end
t = {}
key1 = { a = "a", b = "b" }
t[hash(key1)] = 4
key2 = { a = "a", b = "b" }
print(t[hash(key2)]) -- should print "4" if the hash function works correctly
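The body of hash is left open above. One possible shape for it, following the description (collect the keys, sort them, build a string), is sketched below. The name serialize_key is hypothetical, and the sketch assumes keys are strings or numbers and values are strings, numbers, booleans, or nested tables without cycles.

```lua
-- Hypothetical helper: builds a canonical string from a table's contents.
-- Assumes string/number keys and acyclic nesting (see the lead-in).
local function serialize_key(value)
  if type(value) == "string" then
    -- %q quotes the string, so the string "1" and the number 1 differ
    return string.format("%q", value)
  elseif type(value) ~= "table" then
    return tostring(value)
  end
  local keys = {}
  for k in pairs(value) do
    keys[#keys + 1] = k
  end
  -- pairs() order is arbitrary, so sort: by type name first, then by value
  table.sort(keys, function(x, y)
    local tx, ty = type(x), type(y)
    if tx ~= ty then return tx < ty end
    return x < y
  end)
  local parts = {}
  for i, k in ipairs(keys) do
    parts[i] = "[" .. serialize_key(k) .. "]=" .. serialize_key(value[k])
  end
  return "{" .. table.concat(parts, ",") .. "}"
end

local t = {}
t[serialize_key({ a = "a", b = "b" })] = 4
print(t[serialize_key({ b = "b", a = "a" })])  -- 4
```

Because the result is an ordinary string, the lookup itself stays a normal table access; the cost moves into building the string for each key.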
In my opinion, all of this is too complicated for the simple task of indexing, and you may want to rethink your wish to index using copies of tables. Why would you want such functionality?
Update
If you only need to work with phrases, I think that concatenating them is easier than creating such a generic hash function. If you need it for sequences of phrases, you won't actually need to iterate through the tables and sort the keys; just collect the main information from each phrase. You would still need a helper function that creates a suitable key for you:
function pkey(...)
  local n, args = select('#', ...), { ... }
  for i=1,n do args[i] = args[i].value end -- extract your info here
  return table.concat(args, ' ') -- space or other separator, such as ':'
end
tab[pkey(phrase1, phrase2, phrase3)] = "value"
I don't know a lot about language processing, or about the goal you want to reach with your program, but what about collecting tokens like this: use a nested table structure in which the index table stores only tables indexed by the first token of a phrase, each subtable then contains values indexed by the second token, and so on, until you reach the final token of a phrase, which indexes a number giving the count of occurrences of that phrase.
Maybe an example will make this clearer. If you have the following two phrases:
I like banana
I like hot chick
your index would have the following structure:
index["I"] = {
  ["like"] = {
    ["banana"] = 1,
    ["hot"] = {
      ["chick"] = 1
    }
  }
}
That way you can count frequencies with a single traversal step, and count occurrences at the same time as you index. But as I said before, it depends on what your goal is, and it means splitting your phrases again in order to find occurrences through your index.
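A minimal sketch of how such an index could be built and queried follows. The function names add_phrase and count_phrase are hypothetical, and so is the COUNT marker: unlike the structure shown above, where the final token maps directly to a number, this sketch keeps the count in a reserved field so that a phrase and a longer phrase starting with it can both be counted.

```lua
-- Assumed reserved key: a unique table that cannot clash with any token
local COUNT = {}

-- Walk (and create) one nested table per token, then bump the count
local function add_phrase(index, tokens)
  local node = index
  for _, token in ipairs(tokens) do
    node[token] = node[token] or {}
    node = node[token]
  end
  node[COUNT] = (node[COUNT] or 0) + 1
end

-- Follow the tokens down the index; 0 if the phrase was never added
local function count_phrase(index, tokens)
  local node = index
  for _, token in ipairs(tokens) do
    node = node[token]
    if not node then return 0 end
  end
  return node[COUNT] or 0
end

local index = {}
add_phrase(index, { "I", "like", "banana" })
add_phrase(index, { "I", "like", "hot", "chick" })
add_phrase(index, { "I", "like", "banana" })
print(count_phrase(index, { "I", "like", "banana" }))  -- 2
print(count_phrase(index, { "I", "like" }))            -- 0
```

Each lookup costs one ordinary table access per token, so no table comparison or string building is needed.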