I have a large memory block (a bit-vector) of N bits that fits within one memory page; N is around 5000 on average, i.e. ~5k bits of flag information.
At certain points in time (very frequently, on a critical path) I need to find the first set bit in this whole big bit-vector. Currently I do it per 64-bit word, i.e. with the help of __builtin_ctzll(). But as N grows and the search algorithm itself cannot be improved, there may be some room to scale this search by widening the memory accesses. That is the main problem in a few words.
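For reference, a minimal sketch of that per-64-bit-word baseline (the function and parameter names are illustrative, not taken from the original code):

```c
/* Baseline: scan the bit-vector one 64-bit word at a time.
 * Assumes the vector is stored as an array of uint64_t words, where bit i
 * of the vector is bit (i % 64) of word (i / 64).
 * Returns the index of the first set bit, or -1 if no bit is set. */
#include <stdint.h>
#include <stddef.h>

static ptrdiff_t find_first_set_scalar(const uint64_t *words, size_t nwords)
{
    for (size_t i = 0; i < nwords; ++i) {
        if (words[i] != 0)
            return (ptrdiff_t)(i * 64) + __builtin_ctzll(words[i]);
    }
    return -1;  /* every word was zero */
}
```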
There is a single assembly instruction, BSF (bit scan forward), that gives the position of the lowest set bit in a word; it is what GCC's __builtin_ctzll() maps to on x86-64.
So on x86-64 I can find the first set bit of a 64-bit word cheaply.
But what about scaling through memory width? For example, is there a way to do this efficiently with 128-, 256- or 512-bit registers?
Basically I'm interested in some C API function to achieve this, but I also want to know what such a method is based on.
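One common pattern, sketched below under the assumption that the buffer is laid out as above and padded to a multiple of 16 bytes, is to use wide loads only to skip over all-zero blocks and then recover the exact position with __builtin_ctzll() inside the first non-empty block. The sketch uses SSE2, which every x86-64 CPU has; the same shape works with 256-bit AVX/AVX2 loads where available. The function name is illustrative.

```c
/* Skip empty 16-byte blocks with SSE2, then finish with a 64-bit scan.
 * Assumes nwords is even (buffer padded to a multiple of 16 bytes). */
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>
#include <stddef.h>

static ptrdiff_t find_first_set_sse2(const uint64_t *words, size_t nwords)
{
    const __m128i zero = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 2 <= nwords; i += 2) {
        __m128i v = _mm_loadu_si128((const __m128i *)(words + i));
        /* movemask is 0xFFFF only if every byte of the block is zero */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) != 0xFFFF)
            break;
    }
    for (; i < nwords; ++i) {
        if (words[i] != 0)
            return (ptrdiff_t)(i * 64) + __builtin_ctzll(words[i]);
    }
    return -1;  /* no bit set */
}
```

The wide registers only answer "is this block all zero?"; the final index still comes from a 64-bit bit scan, which is fine because only one non-empty block is ever examined.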
UPD: As for the CPU, I'd like this optimization to support the following CPU lineups:
Intel Xeon E3-12XX, Intel Xeon E5-22XX/26XX/E56XX, Intel Core i3-5XX/4XXX/8XXX, Intel Core i5-7XX, Intel Celeron G18XX/G49XX (optional for Intel Atom N2600, Intel Celeron N2807, Cortex-A53/72)
P.S. In the algorithm mentioned above, before the final bit scan I need to combine k (on average 20-40) N-bit vectors with bitwise AND (the AND result is just a preparatory stage for the bit scan). It would be desirable to scale this with memory width as well, i.e. to do it more efficiently than per 64-bit word.
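A hedged sketch of how that k-way AND can be fused with the search, again with SSE2: the AND of all k inputs is computed one 16-byte block at a time, and the scan stops at the first block whose AND is non-zero, so later blocks are never read. The names and the assumption that nwords is even are mine, not from the original post.

```c
/* Find the first set bit of (vecs[0] & vecs[1] & ... & vecs[k-1])
 * without materialising the full AND result.
 * Assumes k >= 1 and nwords is even (buffers padded to 16 bytes). */
#include <emmintrin.h>
#include <stdint.h>
#include <stddef.h>

static ptrdiff_t find_first_set_of_and(const uint64_t *const *vecs,
                                       size_t k, size_t nwords)
{
    const __m128i zero = _mm_setzero_si128();
    for (size_t i = 0; i + 2 <= nwords; i += 2) {
        /* AND all k vectors for this 16-byte block in a register */
        __m128i acc = _mm_loadu_si128((const __m128i *)(vecs[0] + i));
        for (size_t j = 1; j < k; ++j)
            acc = _mm_and_si128(acc,
                  _mm_loadu_si128((const __m128i *)(vecs[j] + i)));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(acc, zero)) != 0xFFFF) {
            /* first non-zero block of the AND result: scan its two words */
            uint64_t tmp[2];
            _mm_storeu_si128((__m128i *)tmp, acc);
            if (tmp[0] != 0)
                return (ptrdiff_t)(i * 64) + __builtin_ctzll(tmp[0]);
            return (ptrdiff_t)((i + 1) * 64) + __builtin_ctzll(tmp[1]);
        }
    }
    return -1;  /* the AND of all vectors is zero */
}
```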
Read also: Find first set
In a binary number, the bit furthest to the left is called the most significant bit (MSB) and the bit furthest to the right is called the least significant bit (LSB). In signed representations the MSB acts as the sign bit, 0 for non-negative and 1 for negative, and the remaining bits hold the value.
int LSB = value & 1; gives the least significant bit.
In other words, to read the least significant bit, take the bitwise AND with 0b1.
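A small self-contained example of these basics, together with the GCC builtins for the lowest and highest set bit (the values in the comments are for the sample input):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t value = 0x48;                        /* binary 1001000 */
    int lsb        = value & 1;                   /* value of bit 0: 0 */
    uint64_t low   = value & -value;              /* isolate lowest set bit: 0x8 */
    int low_index  = __builtin_ctzll(value);      /* index of lowest set bit: 3 */
    int high_index = 63 - __builtin_clzll(value); /* index of highest set bit: 6 */
    printf("%d 0x%llx %d %d\n", lsb, (unsigned long long)low, low_index, high_index);
    return 0;
}
```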
This answer is in a different vein, but if you know in advance that you're going to be maintaining a collection of B bits and need to be able to efficiently set and clear bits while also figuring out which bit is the first bit set, you may want to use a data structure like a van Emde Boas tree or a y-fast trie. These data structures are designed to store integers in a small range, so instead of setting or clearing individual bits, you could add or remove the index of the bit you want to set/clear. They're quite fast - you can add or remove items in time O(log log B), and they let you find the smallest item in time O(1). Figure that if B ≈ 50000, then log log B is about 4.
I'm aware this doesn't directly address how to find the first set bit in a huge bit-vector. If your setup is such that you have to work with bit-vectors, the other answers might be more helpful. But if you have the option to reframe the problem in a way that doesn't involve searching a bit-vector, these data structures might be a better fit.
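As a rough, flattened illustration of the same hierarchical idea (this is not a full van Emde Boas tree or y-fast trie), a two-level summary bitmap already gives constant-time set, clear and find-first for a fixed-size vector: one 64-bit summary word records which 64-bit words are non-zero. The sketch below covers up to 64*64 = 4096 bits; for a larger N you would add another summary level. All names are illustrative.

```c
#include <stdint.h>

typedef struct {
    uint64_t summary;   /* bit w is set iff bits[w] != 0 */
    uint64_t bits[64];  /* the actual 4096-bit vector    */
} bitset4096;

static void bs_set(bitset4096 *b, unsigned i)
{
    b->bits[i / 64] |= 1ULL << (i % 64);
    b->summary      |= 1ULL << (i / 64);
}

static void bs_clear(bitset4096 *b, unsigned i)
{
    b->bits[i / 64] &= ~(1ULL << (i % 64));
    if (b->bits[i / 64] == 0)
        b->summary &= ~(1ULL << (i / 64));
}

/* Index of the lowest set bit, or -1 if the set is empty. */
static int bs_find_first(const bitset4096 *b)
{
    if (b->summary == 0)
        return -1;
    unsigned w = (unsigned)__builtin_ctzll(b->summary);
    return (int)(w * 64 + (unsigned)__builtin_ctzll(b->bits[w]));
}
```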