I am looking to implement an O(1) lookup for binary quadkeys
I know that there is no true hashtable for javascript, instead using objects and o(1) lookup with their properties, but the issue there is that the keys are always converted to strings.
I suspect to have over a 10 million entries in memory, and if I have to rely on keys being strings and with the average quadkey string being 11.5 characters that would equate to (10 million entries * 11.5 length * 2 bytes) = 230,000,000 bytes or 230 MB.
Compared to storing as int64 (10 million entries * 8 bytes) = 80,000,000 bytes or 80 MB
I know that int64 isn't supported natively with javascript, but there are libraries that can assist with that in terms of doing the bitwise operations i'd want.
Now, even though there exists libraries that can deal with int64, they are ultimately objects that are not truly representing 8 bytes, so I believe I cannot use any int64 library in a hashtable, instead I was considering using 2-deep hash table with two int32. The first key would be the first 4 bytes, and the 2nd key being the last 4 bytes. It's not as ideal as 1 operation lookup to find a value, but 2 operations which is still good enough.
However, I feel this is not worth it if all keys are stored as strings, or the fact that all numbers are double-precision floating-point format numbers (8 bytes) so it each hash entry would then take up 16 bytes (two "int32" numbers).
My questions are:
1. If you store a number as the key to a property, will it take up 8
bytes of memory or will it convert to string and take up
length*2bytes?
PS: I am marking this with nodejs as there may exist libraries that could assist in my end goal
Edit 1:
Seems 1 is possible with Map()
and node 0.12.x+
As far as number 2 I was able to use a int64 lib (bytebuffer) and convert a 64int to a buffer.
I wanted to just use the buffer as the key to Map() but it would not let me as the buffer was internally an object, which each instance acts as a new key to Map().
So I considered turning the buffer back into native type, a 64bit double.
Using readDoubleBE
I read the buffer as a double, which represents my 64int binary and successfully lets me use it in a map and allows for O(1) lookup.
var ByteBuffer = require("bytebuffer");
var lookup = new Map();
var startStr = "123123738232777";
for(var i = 1; i <= 100000; i++) {
var longStr = startStr + i.toString();
var longVal = new ByteBuffer.Long.fromValue(longStr);
var buffer = new ByteBuffer().writeUint64(longVal).flip().toBuffer();
var doubleRepresentation = buffer.readDoubleBE();
lookup.set(doubleRepresentation, longStr);
}
console.log(exists("12312373823277744234")); // true
console.log(exists("123123738232777442341232332322")); // false
function exists(longStr) {
var longVal = new ByteBuffer.Long.fromValue(longStr);
var doubleRepresentation = new ByteBuffer().writeUint64(longVal).flip().toBuffer().readDoubleBE();
return lookup.has(doubleRepresentation);
}
The code is sloppy and there are probably shortcuts I am missing, so any suggestions/hints are welcomed.
Ideally I wanted to take advantage of bytebuffer's varints so that I can have even more savings with memory but I wasn't sure if that is possible in a Map, because I wasn't able to use a buffer as a key.
Edit 2:
Using memwatch-next I was able to see that max heap size was 497962856 bytes with this method with Set() whereas using a string in Set() was 1111082864 bytes. That is about 500MB vs 1GB, which is a far cry from 80mb and 230mb, not sure where the extra memory use is coming from. It's worth nothing for these memory tests I used Set
over Map
that way it should only store unique keys in the data structure. (Using Set as just a existence checker, where Map would serve as a lookup)
Because your keys are integers (and are unique) you could just use them as array indices. However, JS arrays are limited to contain max entries bounded by 32 or 64 bit integer depending on your platform.
To overcome this, you could use your two-step approach, but without using objects and their string hashes. You can store it something like this
store[0010][1000] = 'VALUE'
- the item with binary key 00101000
is stored under index 0010
in first array and index 1000
in child array
In decimal, you're dealing with store[2][8] = 'VALUE'
, which is equivalent to doing store[40] = 'VALUE'
in 64bit space
You get yourself a tree with all the properties you want:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With