Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hashtable with 64-bit integers

I am looking to implement an O(1) lookup for binary quadkeys

I know that there is no true hashtable for javascript, instead using objects and o(1) lookup with their properties, but the issue there is that the keys are always converted to strings.

I suspect to have over a 10 million entries in memory, and if I have to rely on keys being strings and with the average quadkey string being 11.5 characters that would equate to (10 million entries * 11.5 length * 2 bytes) = 230,000,000 bytes or 230 MB.

Compared to storing as int64 (10 million entries * 8 bytes) = 80,000,000 bytes or 80 MB

I know that int64 isn't supported natively with javascript, but there are libraries that can assist with that in terms of doing the bitwise operations i'd want.

Now, even though there exists libraries that can deal with int64, they are ultimately objects that are not truly representing 8 bytes, so I believe I cannot use any int64 library in a hashtable, instead I was considering using 2-deep hash table with two int32. The first key would be the first 4 bytes, and the 2nd key being the last 4 bytes. It's not as ideal as 1 operation lookup to find a value, but 2 operations which is still good enough.

However, I feel this is not worth it if all keys are stored as strings, or the fact that all numbers are double-precision floating-point format numbers (8 bytes) so it each hash entry would then take up 16 bytes (two "int32" numbers).

My questions are:

1. If you store a number as the key to a property, will it take up 8 bytes of memory or will it convert to string and take up length*2bytes?

  1. Is there a way to encode binary quadkeys into the double-precision floating-point format number standard that javascript has adopted, even if the number makes no sense , the underlying bits serve a purpose (unique identification of a construct).

PS: I am marking this with nodejs as there may exist libraries that could assist in my end goal


Edit 1:

Seems 1 is possible with Map() and node 0.12.x+

As far as number 2 I was able to use a int64 lib (bytebuffer) and convert a 64int to a buffer.

I wanted to just use the buffer as the key to Map() but it would not let me as the buffer was internally an object, which each instance acts as a new key to Map().

So I considered turning the buffer back into native type, a 64bit double.

Using readDoubleBE I read the buffer as a double, which represents my 64int binary and successfully lets me use it in a map and allows for O(1) lookup.

var ByteBuffer = require("bytebuffer");

var lookup = new Map();


var startStr = "123123738232777";
for(var i = 1; i <= 100000; i++) {
    var longStr = startStr + i.toString();
    var longVal = new ByteBuffer.Long.fromValue(longStr);

    var buffer = new ByteBuffer().writeUint64(longVal).flip().toBuffer();
    var doubleRepresentation = buffer.readDoubleBE();
    lookup.set(doubleRepresentation, longStr);
}

console.log(exists("12312373823277744234")); // true
console.log(exists("123123738232777442341232332322")); // false


function exists(longStr) {
    var longVal = new ByteBuffer.Long.fromValue(longStr);
    var doubleRepresentation = new ByteBuffer().writeUint64(longVal).flip().toBuffer().readDoubleBE();
    return lookup.has(doubleRepresentation);
}

The code is sloppy and there are probably shortcuts I am missing, so any suggestions/hints are welcomed.

Ideally I wanted to take advantage of bytebuffer's varints so that I can have even more savings with memory but I wasn't sure if that is possible in a Map, because I wasn't able to use a buffer as a key.


Edit 2:

Using memwatch-next I was able to see that max heap size was 497962856 bytes with this method with Set() whereas using a string in Set() was 1111082864 bytes. That is about 500MB vs 1GB, which is a far cry from 80mb and 230mb, not sure where the extra memory use is coming from. It's worth nothing for these memory tests I used Set over Map that way it should only store unique keys in the data structure. (Using Set as just a existence checker, where Map would serve as a lookup)

like image 791
ParoX Avatar asked Nov 05 '15 15:11

ParoX


1 Answers

Because your keys are integers (and are unique) you could just use them as array indices. However, JS arrays are limited to contain max entries bounded by 32 or 64 bit integer depending on your platform.

To overcome this, you could use your two-step approach, but without using objects and their string hashes. You can store it something like this store[0010][1000] = 'VALUE' - the item with binary key 00101000 is stored under index 0010 in first array and index 1000 in child array

In decimal, you're dealing with store[2][8] = 'VALUE', which is equivalent to doing store[40] = 'VALUE' in 64bit space

You get yourself a tree with all the properties you want:

  • You can easily look up by key (in two steps)
  • Your keys are integers
  • You're dealing with 32bit integers (or less, depending on how you chunk it)
like image 61
Andrei R Avatar answered Oct 05 '22 16:10

Andrei R