I'm new to Lua, and I'm learning the usage of table these days. From tutorials I know that Lua treats numeric indexed items and non-numeric indexed items differently, so I did some tests myself, and today I found an interesting phenomenon and I can't explain it:
The code
t = {1, 2, 3, a='a', b='b'}
print(#t)
gets
3
because the #
operator counts numeric indexed items only. Then I tested the following code
t = {1, 2, 3, a='a', b='b'}
print(#t)
for i = 100,200 do
t[i] = i
end
print(#t)
I get
3
3
Till now I think Lua treats those discontinuous items added later as non-numeric indexed ones. However, after I change the code a little bit
t = {1, 2, 3, a='a', b='b'}
print(#t)
for i = 100,300 do
t[i] = i
end
print(#t)
I get
3
300
I'm confused by this phenomenon, does anyone know the reason? Thanks.
(This phenomenon can be reproduced at http://www.lua.org/cgi-bin/demo)
I tried this code
t = {1, 2, 3, a='a', b='b'}
print(#t)
for i = 100,300 do
t[i] = i
print("add", i, #t)
end
for i = 100,300 do
t[i] = nil
print("del", i, #t)
end
I get
3
add 100 3
add 101 3
add 102 3
...
add 223 3
add 224 3
add 225 3
add 226 226
add 227 227
add 228 228
...
add 298 298
add 299 299
add 300 300
del 100 300
del 101 300
del 102 300
...
del 253 300
del 254 300
del 255 300
del 256 3
del 257 3
del 258 3
...
del 298 3
del 299 3
del 300 3
This example shows that Lua does convert a table between sparse and dense.
I haven't looked at how the #
operator is implemented, but I bet what's going on is that by adding the extra 100 indexes, you've caused the range 1-300
to become dense enough that the indexes 100-300
end up in the "array" part of the table implementation instead of the "hash" part.
Update:
Ok, I looked at the source for the primitive table length. If the final entry in the array part is nil, it binary-searches the array to find the lowest "boundary" (a non-nil index followed by a nil index). If it's not nil, it decides the boundary must be in the hash and searches for it.
So with the table containing numeric indexes {1, 2, 3, 100..200}
, I assume it's not dense enough and the array part only contains {1, 2, 3}
. But with the table containing {1, 2, 3, 100..300}
, it's presumably dense enough that the array part ends somewhere within the 100..300
part (I think the array part is always a power of 2, so it can't possibly end at 300
, but I'm not 100% positive).
Update 2:
When a lua table is rehashed, it counts the number of integer keys. It then walks up all the powers of two that are no more than twice the number of integral keys, and finds the largest power of two that is at least 50% dense (meaning that if the array part were this large, at least 50% of all values would be non-nil).
So with {1, 2, 3, 100..200}
, it walks up
1: 100% dense; good
2: 100% dense; good
4: 75% dense; bad
8: 37.5% dense; bad
16: 18.75% dense; bad
32: 9.375% dense; bad
64: 4.6875% dense; bad
128: 25% dense; bad
256: 40.625% dense; bad
The best good value is 2, so it ends up with an array size of 2. Since 2
is non-nil, it searches the hash for the boundary and finds 3
.
Once you add 201..300
the last step becomes
256: 62.5% dense; good
which causes the array part to cover 1..256
, and since 256
is non-nil, it again searches for the boundary in the hash and gets 300`.
In the end, Lua 5.2 defines a "sequence" as a table with exclusively integral keys starting at 1 and going up with no holes. And it defines #
as only being valid for sequences. This way Lua can get away with the weird behavior you noticed for tables that have holes in their integral sequences.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With