Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the most efficient bit vector compression method for my use case?


I'm working on a project in computational biology and I need to store an index of locuses that differ between many sequences. For now, I'm using a B+Tree for that purpose, but I guess using a bitmap index would be way faster for such a use case : only a small number of locus differ between two sequences, 1% on average, and they are nearly equally distributed along the sequence; so it seems like there is a lot of room for bitmap index compression. My problem is that I cannot manage to find a compression method that can efficiently:

  • allow fast individual bit setting/unsetting
  • permit efficient range queries over the bitmap
  • possibly allow fast XOR-ing/AND-ing of two indexes

Thx in advance for your suggestions.

like image 782
fokenrute Avatar asked Jan 22 '11 15:01

fokenrute


People also ask

What is bit compression?

In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy.

How does bitpacking work?

The first, and simplest, bit-pack is to simply adopt a bit-wise format where you have a 1-bit header followed by a known number of bits representing the value. The bit header works as follows: If it is set (1), then the value following it is encoded using 16 bits.


1 Answers

Check out FastBit:

https://sdm.lbl.gov/fastbit/

like image 66
Noah Watkins Avatar answered Sep 20 '22 14:09

Noah Watkins