Wikipedia gives this example
Identifier Gender Bitmaps
F M
1 Female 1 0
2 Male 0 1
3 Male 0 1
4 Unspecified 0 0
5 Female 1 0
But I do not understand this.
Gender
here, we will get a lot of results if for example, we look for Gender = Male
, which need to be filtered out further (so not very useful). How does a Bitmap improve the situation?A bitmap index is a special kind of database index that uses bitmaps. Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values, either absolutely, or relative to the number of records that contain the data.
There are 100 or more rows for each distinct value in the indexed column. When this limit is met, the bitmap index will be much smaller than a regular index, and you will be able to create the index much faster than a regular index. An example would be one million distinct values in a multi-billion row table.
Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
Disadvantages. To perform DML operations, bitmap indexing takes time to perform the transaction and update its bitmap index. A large number of records can cause overhead in maintaining it. It has to be modified throughout if the user enters a new record which is time-consuming and difficult.
A better representation of a bitmap index, is if given the sample above:
Identifier Gender RowID
1 Female R1
2 Male R2
3 Male R3
4 Unspecified R4
5 Female R5
the a bitmap index on the gender column would (conceptually) look like this:
Gender R1 R2 R3 R4 R5
Female 1 0 0 0 1
Male 0 1 1 0 0
Unspecified 0 0 0 1 0
Bitmap indexes are used when the number of distinct values in a column is relatively low (consider the opposite where all values are unique: the bitmap index would be as wide as every row, and as long making it kind of like one big identity matrix.)
So with this index in place a query like
SELECT * FROM table1 WHERE gender = 'Male'
the database looks for a match in the gender values in the index, finds all the rowids where the bit was set to 1, and then goes and gets the table results.
A query like:
SELECT * FROM table1 WHERE gender IN ('Male', 'Unspecified')
would get the 1 bits for Male, the 1 bits for Unspecified, do a bitwise-OR then go get the rows where the resulting bits are 1.
So, the advantages of using a bitmap index over a b*tree index are storage (with low cardinality, bitmap indexes are pretty compact), and the ability to do bitwise operations before resolving the actual rowids which can be pretty quick.
Note that bitmap indexes can have performance implications with inserts/deletes (conceptually, you add/remove a column to/from the bitmap and rejig it accordingly...), and can create a whole lot of contention as an update on a row can lock the entire corresponding bitmap entry and you can't update a different row (with the same bitmap value) until the first update is committed/rolled back.
The benefit comes when filtering on multiple columns, then the corresponding indexes can be merged with bitwise operations before actually selecting the data. If you have gender, eye_colour, hair_colour then the query
select * from persons where
gender = 'male' and
(eye_colour = 'blue' or hair_colour = 'blonde')
would first make a bitwise or between the eye_colour['blue'] index and the hair_colour['blonde'] index and finally bitwise and between the result and the gender['male'] index. This operation performs really fast both computationally and I/O.
The resulting bit stream would be used for picking the actual rows.
Bitmap indexes are typically used in "star joins" in data warehouse applications.
As indicated in the Wikipedia article, they use bitwise operations, which can perform better than comparing data types such as integers, so the short answer is increased speed of queries.
Theoretically, it should take up less computations and less time to select all males or all females from your example.
Just thinking about how this works under the hood should make why this is faster obvious. A bit is logically either true or false. If you want to do a query using a WHERE clause, this will eventually evaluate to either a true or a false for the records in order to determine whether to include them in your results.
Preface - the rest of this is meant to be layman's terns and non-techie
So the next question is what does it take to evaluate to true? Even comparing numeric values means that the computer has to...
repeat if you're using a multiple part where clause such as Where "this = this AND that = that"
But using bitwise logic, you're just looking at 0 (false) and 1 (true) values. 90% of the overhead for the comparison work is eliminated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With