Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What data type to use in a bit array [closed]

When making a bit array or also called a bit set in C what data type should I use? Should I use an array of int? unsigned int? size_t? uint8_t? uintmax_t?

For me, using signed integer types is a no-no as noted by other answers here in SO about signed integer left and right shifts (I lost the link to the answer).

Now, should I use the smallest available unsigned integer or the biggest? Which one has the best performance?

Some of my thoughts: the majority of the bit arrays here in SO use an array of char or uint8_t, but I can't see how that would be better than using uintmax_t (also because I haven't seen any arguments for why it is the way to go, hence this question). When making certain operations like union and intersection between two bit arrays the loop iterates fewer times when using a bigger unsigned integer.

Edit: upon seeing some answers I think some people got confused to what I'm asking. Sorry about that. What I'm trying to do is to make a bit array where each bit can be individually accessed or set to either 1 or 0. I know I could just use a bool array but that is not space efficient. You could say that nowadays we have RAMs that are big enough and the gain of bit arrays over boolean arrays is minimal but that's not the point here. What I'm trying to know is, given a bit array where each bit can be modified or accessed by a bit_index (which is different from the array's index) what data type should be my array?

like image 830
LeoVen Avatar asked Mar 05 '23 20:03

LeoVen


2 Answers

I would personally use size_t. For most (not all, but probably all of the ones that you care about) platforms, it is the same size as your CPU registers, which means that for many operations that need to scan the entire bit vector it achieves the maximum number of bits processed per loop iteration (e.g. finding set bits, counting bits, etc). For such algorithms, CPU builtins like bsf (bit scan forward) and clz (count leading zeros) will significantly speed up your algorithm.

Just for context, the Linux kernel uses unsigned long for bit vectors, which AFAIK is the same as size_t on all Linux ABIs, but is not on Windows (at least not with MSVC) where long is 32 bits even on x64.

like image 120
Andrew Sun Avatar answered Mar 09 '23 22:03

Andrew Sun


This depends upon how many bits you need to keep track of, the efficiency of accessing a single bit, and how much memory you want to consume in keeping track of all these bits.

There are so many ways of doing this without further details its hard to answer.

What I've seen is a simple array of uint32_t to keep it packed and decent access speeds. Then when accessing a bit, lets say bit 128, this would be bit 0 of the 4th uint32_t of the array.

like image 40
fdk1342 Avatar answered Mar 09 '23 20:03

fdk1342