I'm currently using OpenCV's ORB features extractor and I did notice the strange (at least for me) way the ORB-descriptor is stored (it is basically a BRIEF-32 with a modification that is not relevant to my question). As some of you know ORB takes the keypoints extracted using a modified FAST-9 (circle radius = 9 pixels; also stores orientation of the keypoint) and uses those with a modified BRIEF-32 descriptor to store the feature that the keypoint represents.
BRIEF (ORB version) works as follows: we take a 31x31 pixels patch (represents a feature) and create a bunch of random 5x5 pixels test points. Then we take pairs of those points and evaluate their intensities resulting in a binary decision (0 or 1) based on whether the intensity of the first point in the pair is greater or less equal than the intensity of the second one. Then we take all those bits and use a basic sum-formula to build a binary string of length n (for BRIEF-32 we have 32 bytes * 8 = 256 bit long binary string):
SUM(2(i-1)*bit_pair_test)
where the bit_pair_test is the bit value that we've calculated from the test for a pair of test points. The final result is something like (for a set of binary tests (...,0,1,0,1,1)):
(20*1) + (21*1) + (22*0) + (23*1) + (24*0) + ...
Now the way OpenCV's ORB stores those bitstrings is what is a mystery to me. If we look at the matrix that contains the descriptors for a whole image with each row being a single descriptor for a single keypoint we can see that each descriptor has 32 8-bit numbers, which altogether results in those 256 bits that BRIEF-32 uses to store the information. I don't understand why we split those 256 bits in 32 bytes. The official documentation (http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_brief/py_brief.html) only says that OpenCV stores such descriptors in bytes, however it does not explain why it does that. I've considered three possibilities without excluding the possibility that some combination of those might be the answer:
PS: You can read the paper on ORB here and the paper on BRIEF here.
ORB uses BRIEF descriptors but as the BRIEF performs poorly with rotation. So what ORB does is to rotate the BRIEF according to the orientation of keypoints. Using the orientation of the patch, its rotation matrix is found and rotates the BRIEF to get the rotated version.
A keypoint is calculated by considering an area of certain pixel intensities around it. Keypoints are calculated using various different algorithms, ORB(Oriented FAST and Rotated BRIEF) technique uses the FAST algorithm to calculate the keypoints. FAST stands for Features from Accelerated Segments Test.
ORB is basically a fusion of FAST keypoint detector and BRIEF descriptor with many modifications to enhance the performance. First it use FAST to find keypoints, then apply Harris corner measure to find top N points among them. It also use pyramid to produce multiscale-features.
Image Keypoint: a distinctive image feature that can be detected repeatably despite changes in 1) image scale (resolution), 2) illumination and noise, and 3) image orientation and perspective.
The choice of 8 and 32 bits pattern is due to storage and efficiency issues.
The bit order in BRIEF, ORB and BRISK is not relevant (unlike FREAK). Thus, all bits of these descriptors are of equal significance, and you can't just compare the first part of the bitstream, etc.
On the other had, FREAK was designed with such a matching process (called a cascade in FREAK's paper) in mind.
Well, computers don't store individual bits. Thus, you won't see anybody storing BRIEF and the like in bit arrays.
The smallest component that can be read from memory is a byte (corresponding usually to 8 bits, although some DSPs cannot read chunks that are smaller than 16 bits, but that's another story).
Thus, you could see people storing their descriptors in byte arrays (type unsigned char
in C/C++, which is the underlying OpenCV implementation language).
Plus, memory accesses are usually better (faster) when the variables are aligned on CPU word boundaries. Most CPU's nowadays have words of 32 or 64 bits, 32 bits words being the better choice because 64 bits architectures were designed with the legacy 32-bits processors in mind.
Hamming distance is computed via an XOR operation. It happens that many processors have dedicated instruction sets that can compute XOR efficiently with 32 bits words (the common size of an integer before 64 bits CPUs became more common). Even more, they may also support computing several XOR values on multiple 32 bits words in parallel, which is a parallelism technique called SIMD (Single Input Multiple Data). For example, SSE extensions can be leveraged to further accelerate the Hamming distance computation of BRIEF/ORB/... descriptors whose size is a multiple of 32 bits.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With