I have two 2D points packed together into an array: int square[4]. These four numbers are interpreted as the definition of a rectangle with horizontal lines parallel to the X-axis and vertical lines parallel to the Y-axis. The elements of the array then respectively define:
square[0]: Left edge's X coordinate
square[1]: Bottom edge's Y coordinate
square[2]: Right edge's X coordinate
square[3]: Top edge's Y coordinate
I have defined a winding order in this enum:
enum WindingOrder {
BOTTOM = 0,
RIGHT,
TOP,
LEFT
};
As a minimal, complete, verifiable example of my code: I am given a second array, int output[4], and an input WindingOrder edge. I need to populate output as follows:
switch(edge) {
case BOTTOM:
output[0] = square[0]; output[1] = square[1]; output[2] = square[2]; output[3] = square[1];
break;
case RIGHT:
output[0] = square[2]; output[1] = square[1]; output[2] = square[2]; output[3] = square[3];
break;
case TOP:
output[0] = square[2]; output[1] = square[3]; output[2] = square[0]; output[3] = square[3];
break;
case LEFT:
output[0] = square[0]; output[1] = square[3]; output[2] = square[0]; output[3] = square[1];
break;
}
I'm not married to a particular WindingOrder arrangement, nor do I care about the order of the points in output, so if changing those makes this solvable I'm down. What I want to know is: can I construct the square indexes to assign to output in a for loop, without an if/case/ternary statement (in other words, using bit-wise operations)?
So I'd want, given int i = 0 and WindingOrder edge, to do bit-wise operations on them to find:
do {
output[i] = array[???];
} while(++i <= LEFT);
EDIT:
I've received a lot of static array answers (which I believe are the best way to solve this, so I've given a +1). But as a logic problem I'm curious how few bit-wise operations are needed to find an element of a given edge dynamically. So for example, how should this function's body be written given an arbitrary edge and i: int getIndex(int i, int edge)
Here is a different solution. It is a variation on the static array approach, but without an actual array: the indexing matrix is inlined as a 32-bit unsigned integer computed as a constant expression. The column for the edge parameter is selected with a single shift; finally, the individual indices for each array element are selected via simple bit-shifting and masking.
This solution has some advantages:
This is as close as I could get to a bitwise solution.
#include <iostream>
enum WindingOrder { BOTTOM = 0, RIGHT, TOP, LEFT };
void BitwiseWind(int const *input, int *output, enum WindingOrder edge)
{
unsigned bits = ((0x00010201 << BOTTOM * 2) |
(0x02010203 << RIGHT * 2) |
(0x02030003 << TOP * 2) |
(0x00030001 << LEFT * 2))
>> (edge * 2);
output[0] = input[(bits >> 24) & 3];
output[1] = input[(bits >> 16) & 3];
output[2] = input[(bits >> 8) & 3];
output[3] = input[(bits >> 0) & 3];
}
int main() {
enum WindingOrder edges[4] = { BOTTOM, RIGHT, TOP, LEFT };
int rect[4] = { 1, 3, 4, 5 };
int output[4];
for (int i = 0; i < 4; i++) {
BitwiseWind(rect, output, edges[i]);
std::cout << output[0] << output[1] << output[2] << output[3] << std::endl;
}
return 0;
}
Compiling BitwiseWind for x86-64 with clang -O3 generates 21 instructions, 6 more than the static array version, but without any memory reference. That's a little disappointing, but I hope it could generate fewer instructions for an ARM target, taking advantage of bit-field extraction opcodes. Incidentally, the inlined version using output[i] = array[(i+(i==winding)*2)&3]; produces 25 instructions without any jumps, and gcc -O3 does much worse: it generates a lot more code with 4 tests and jumps.
The generic getIndex function below compiles to just 6 x86 instructions:
int getIndex(int i, int edge) {
return (((0x00010201 << BOTTOM * 2) |
(0x02010203 << RIGHT * 2) |
(0x02030003 << TOP * 2) |
(0x00030001 << LEFT * 2))
>> (edge * 2 + 24 - i * 8)) & 3;
}
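To make the layout of the packed constant easier to check, here is a minimal sketch that rebuilds the same 32-bit value from the question's index table at compile time. The names kTable, packTable, kPacked and getIndexPacked are hypothetical and exist only for this illustration; the packing scheme follows the shifts used above, and the sketch assumes C++14 or later for the constexpr loop.

```cpp
#include <cassert>
#include <cstdint>

// Index table from the question's switch: kTable[edge][i] is the element of
// square[] copied to output[i]. Edge order: BOTTOM, RIGHT, TOP, LEFT.
constexpr int kTable[4][4] = {
    {0, 1, 2, 1},  // BOTTOM
    {2, 1, 2, 3},  // RIGHT
    {2, 3, 0, 3},  // TOP
    {0, 3, 0, 1},  // LEFT
};

// Pack the table into 32 bits: output position i lives in byte (3 - i),
// and inside that byte the 2-bit field at bit (edge * 2) holds the index.
constexpr std::uint32_t packTable() {
    std::uint32_t bits = 0;
    for (int edge = 0; edge < 4; ++edge)
        for (int i = 0; i < 4; ++i)
            bits |= static_cast<std::uint32_t>(kTable[edge][i])
                    << (edge * 2 + 24 - i * 8);
    return bits;
}

constexpr std::uint32_t kPacked = packTable();

// The four shifted hex literals above OR together to this same value.
static_assert(kPacked == 0x28F50A7Du, "packed table mismatch");

// Same lookup as getIndex above, but on the precomputed constant.
constexpr int getIndexPacked(int i, int edge) {
    return static_cast<int>((kPacked >> (edge * 2 + 24 - i * 8)) & 3u);
}
```

With this, getIndexPacked(i, edge) can stand in for getIndex(i, edge), and changing the winding only requires editing kTable.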
Is there a particular reason that this needs to use lots of bitwise operations? It seems quite a complex way to solve the problem?
You seem to be quite worried about speed, for example, you don't want to use modulo because it is expensive. This being the case, why not just use a really simple lookup and unroll the loops? Example on ideone as well.
EDIT: Thanks to chqrlie for input. Have updated answer accordingly.
#include <iostream>
using namespace std;
enum WindingOrder {
BOTTOM = 0,
RIGHT,
TOP,
LEFT
};
void DoWinding1(unsigned int const *const in, unsigned int *const out, const enum WindingOrder ord)
{
// Rows follow the enum order; array designators such as [BOTTOM] = are C99, not standard C++.
static const unsigned int order[4][4] = { {0,1,2,1},   // BOTTOM
                                          {2,1,2,3},   // RIGHT
                                          {2,3,0,3},   // TOP
                                          {0,3,0,1} }; // LEFT
out[0] = in[order[ord][0]];
out[1] = in[order[ord][1]];
out[2] = in[order[ord][2]];
out[3] = in[order[ord][3]];
}
int main() {
unsigned int rect[4] = {1, 3, 4, 5};
unsigned int out[4] = {0};
DoWinding1(rect, out, BOTTOM);
std::cout << out[0] << out[1] << out[2] << out[3] << std::endl;
return 0;
}
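For the generic getIndex(i, edge) form asked about in the question's edit, the same table can be exposed directly. This is only a minimal sketch: the name kOrder, the checkBottom helper, and the masking of the arguments are assumptions made for the illustration, not part of the code above.

```cpp
#include <cassert>

// Row = edge (BOTTOM, RIGHT, TOP, LEFT), column = output position i.
static const unsigned char kOrder[4][4] = {
    {0, 1, 2, 1},  // BOTTOM
    {2, 1, 2, 3},  // RIGHT
    {2, 3, 0, 3},  // TOP
    {0, 3, 0, 1},  // LEFT
};

// Table lookup. Masking with 3 keeps out-of-range arguments inside the
// table (an assumption about the desired behavior, not from the question).
inline int getIndex(int i, int edge) {
    return kOrder[edge & 3][i & 3];
}

// Quick self-check against the question's BOTTOM case: {0, 1, 2, 1}.
inline void checkBottom() {
    assert(getIndex(0, 0) == 0 && getIndex(1, 0) == 1);
    assert(getIndex(2, 0) == 2 && getIndex(3, 0) == 1);
}
```

A caller would then write output[i] = square[getIndex(i, edge)] inside the loop.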
Is it possible to redefine WindingOrder's value set? If so, here is my solution: it encodes the selection indexes in the WindingOrder values themselves, then decodes the index into input[] by shifting and masking as the output[] index iterates.
[Thanks to chqrlie for providing the code base]:
#include <iostream>
enum WindingOrder {
// the rightmost (least significant) 4 bits hold the index into input[] for output[0]
// the leftmost (most significant) 4 bits hold the index into input[] for output[3]
BOTTOM = 0x1210,
RIGHT = 0x3212,
TOP = 0x3230,
LEFT = 0x3010
};
void BitwiseWind(int const *input, int *output, unsigned short edge)
{
for (size_t i = 0; i < 4; i++)
output[i] = input[(edge >> (i*4)) & 0x000F]; // decode
}
int main() {
enum WindingOrder edges[4] = { BOTTOM, RIGHT, TOP, LEFT };
int rect[4] = { 1, 3, 4, 5 };
int output[4];
for (int i = 0; i < 4; i++) {
BitwiseWind(rect, output, edges[i]);
std::cout << output[0] << output[1] << output[2] << output[3] << std::endl;
}
return 0;
}
The generic getIndex(int i,enum WindingOrder edge) would be:
int getIndex(int i,enum WindingOrder edge)
{
return ((edge >> (i*4)) & 0x000F);
}
I did not count how many instructions it uses, but I believe it would be quite few. And it is really easy to picture how it works. :)
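The enum constants can also be generated rather than typed by hand, which makes the encoding easier to audit. A minimal sketch, assuming C++11 or later; packEdge and decodeIndex are hypothetical helper names, not part of the code above.

```cpp
#include <cassert>

// Pack four selection indexes into one 16-bit value, 4 bits each.
// i0 (the index for output[0]) goes in the least significant nibble.
constexpr unsigned packEdge(unsigned i0, unsigned i1, unsigned i2, unsigned i3) {
    return i0 | (i1 << 4) | (i2 << 8) | (i3 << 12);
}

enum WindingOrder : unsigned {
    BOTTOM = packEdge(0, 1, 2, 1),  // 0x1210
    RIGHT  = packEdge(2, 1, 2, 3),  // 0x3212
    TOP    = packEdge(0, 3, 2, 3),  // 0x3230
    LEFT   = packEdge(0, 1, 0, 3),  // 0x3010
};

// The generated values match the hand-written constants.
static_assert(BOTTOM == 0x1210u && RIGHT == 0x3212u, "encoding mismatch");
static_assert(TOP == 0x3230u && LEFT == 0x3010u, "encoding mismatch");

// Decode: shift the wanted nibble down and mask it off.
constexpr int decodeIndex(int i, WindingOrder edge) {
    return static_cast<int>((static_cast<unsigned>(edge) >> (i * 4)) & 0xFu);
}
```

decodeIndex(i, edge) plays the same role as getIndex(i, edge) above.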
This is untested and there might be a small mistake in some details, but the general idea should work.
Copying the array to the output would use the indices {0,1,2,3}. To get a specific edge you have to apply a transformation to the indices:
edge     indices     changed_pos   changed_to
RIGHT    {2,1,2,3}   0             2
TOP      {0,3,2,3}   1             3
LEFT     {0,1,0,3}   2             0
BOTTOM   {0,1,2,1}   3             1
So basically you have to add 2 mod 4 at the position that matches your winding.
So the (like I said, untested) snippet could look like this:
for (size_t i=0; i<4; ++i) {
output[i] = array[(i+(i==edge)*2)%4];
}
If the comparison is true you add 1*2=2, else 0*2=0, to the index, and then take the result mod 4 to stay in range.
Your enum has to look like this (but I guess you figured this out by yourself):
enum WindingOrder {
RIGHT,
TOP,
LEFT,
BOTTOM
};
MWE:
#include <iostream>
#include <string>
#include <vector>
enum WindingOrder {
RIGHT=0,
TOP,
LEFT,
BOTTOM
};
int main()
{
std::vector<int> array = {2,4,8,9};
std::vector<int> output(4);
std::vector<WindingOrder> test = {LEFT,RIGHT,BOTTOM,TOP};
for (auto winding : test) {
for (size_t i=0; i<4; ++i) {
output[i] = array[(i+(i==winding)*2)%4];
}
std::cout << "winding " << winding << ": " << output[0] << output[1] << output[2] << output[3] << std::endl;
}
}
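If the comparison and the modulo are unwanted, the same rule can be written with bitwise operators only, since adding 2 modulo 4 is the same as flipping bit 1. This is a sketch under the enum ordering above (RIGHT = 0, TOP, LEFT, BOTTOM); it assumes both arguments are in the range 0..3 and C++14 or later, and getIndexBitwise and checkAgainstModuloForm are hypothetical names.

```cpp
#include <cassert>

// Assumes 0 <= i <= 3 and 0 <= edge <= 3, with RIGHT = 0, TOP = 1,
// LEFT = 2, BOTTOM = 3 as in the enum above.
constexpr int getIndexBitwise(int i, int edge) {
    // x is zero exactly when i == edge (both fit in 2 bits).
    const int x = i ^ edge;
    // Fold both bits of x into bit 0, then invert: eq is 1 iff i == edge.
    const int eq = ~(x | (x >> 1)) & 1;
    // (i + 2) % 4 equals i ^ 2 for i in 0..3, so flip bit 1 when eq is set.
    return i ^ (eq << 1);
}

// Spot checks against the table above.
static_assert(getIndexBitwise(0, 0) == 2, "RIGHT changes position 0 to 2");
static_assert(getIndexBitwise(3, 3) == 1, "BOTTOM changes position 3 to 1");

// Exhaustive comparison with the comparison-and-modulo form.
inline void checkAgainstModuloForm() {
    for (int edge = 0; edge < 4; ++edge)
        for (int i = 0; i < 4; ++i)
            assert(getIndexBitwise(i, edge) == (i + (i == edge) * 2) % 4);
}
```

Whether this is faster than the comparison form depends on the compiler and target; it mainly shows that the mapping needs only XOR, OR, NOT, AND and shifts.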
Judging from your own answer, you are close to the solution. I think what you need here is a Karnaugh map, which is a universal method for most Boolean algebra problems.
Suppose
The elements of the array then respectively define:
input[0]: Left edge's X coordinate
input[1]: Bottom edge's Y coordinate
input[2]: Right edge's X coordinate
input[3]: Top edge's Y coordinate
I have defined a winding order in this enum:
enum WindingOrder { BOTTOM = 0, RIGHT, TOP, LEFT };
Then the for loop may look like
for (int k = 0; k != 4; ++k) {
int i = getIndex(k, edge); // calculate i from k and edge
output[k] = square[i];
}
Then the inputs are k (as in output[k]) and edge, and the output is i (as in square[i]). Because i has 2 bits, two logic functions are needed.
Here we use P = F1(A, B, C, D) and Q = F2(A, B, C, D) to represent the logic functions, in which A, B, C, D, P and Q are all single bits, and
k = (A << 1) + B;
edge = (C << 1) + D;
i = (P << 1) + Q;
Then all we need to do is deduce the two logic functions F1 and F2 from the given conditions.
From the switch case statements you gave, we can easily get the truth table. Rows and columns are listed in Gray-code order (0, 1, 3, 2), as is usual for a Karnaugh map.
k\edge   0   1   3   2
0        0   2   0   2
1        1   1   3   3
3        1   3   1   3
2        2   2   0   0
Then separate this into two truth tables, one for each of the bits P and Q.
P          edge  0   1   3   2
k   AB\CD        00  01  11  10
0   00           0   1   0   1
1   01           0   0   1   1
3   11           0   1   0   1
2   10           1   1   0   0
Q          edge  0   1   3   2
k   AB\CD        00  01  11  10
0   00           0   0   0   0
1   01           1   1   1   1
3   11           1   1   1   1
2   10           0   0   0   0
These are the Karnaugh maps that I mentioned at the beginning. We can easily get the functions.
F1(A, B, C, D) = A~B~C + A~CD + ~B~CD + ~ABC + ~AC~D + BC~D
F2(A, B, C, D) = B
Then the program will be
int getIndex(int k, int edge) {
int A = (k >> 1) & 1;
int B = k & 1;
int C = (edge >> 1) & 1;
int D = edge & 1;
int P = A&~B&~C | A&~C&D | ~B&~C&D | ~A&B&C | ~A&C&~D | B&C&~D;
int Q = B;
return (P << 1) + Q;
}
It passed the test here. Of course, you can simplify the function even more with XOR.
EDIT
Using XOR to simplify the expression is possible most of the time, since A^B == A~B + ~AB. But this may not be what you want. First, I think the performance differs only a little between the Sum of Products (SoP) expression and the further simplified version with XOR. Second, there is no universal method (as far as I know) to simplify an expression with XOR, so you have to rely on your own experience to do this work.
There are sixteen possible logic functions of two variables, but in digital logic hardware, the simplest gate circuits implement only four of them: AND, OR, and the complements of those (NAND and NOR). Karnaugh maps are used to simplify real-world logic requirements so that they can be implemented using a minimum number of physical logic gates.
There are two common expression forms used here, Sum of Products and Product of Sums. Both can be implemented directly using only AND and OR logic operators applied to the variables and their complements, and both can be deduced directly from a Karnaugh map.
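As an illustration of the XOR simplification mentioned above: reading the P map row by row gives P = C ^ D in the rows where A == B, and P = C ^ A in the rows where A != B, which can be folded into a single expression. This is one possible form worked out by hand from the map, not a canonical minimum; getIndexSop, getIndexXor and checkXorAgainstSop are hypothetical names, and the sketch checks the XOR form exhaustively against the sum-of-products version.

```cpp
#include <cassert>

// Sum-of-products version from above, kept for comparison.
inline int getIndexSop(int k, int edge) {
    int A = (k >> 1) & 1, B = k & 1;
    int C = (edge >> 1) & 1, D = edge & 1;
    int P = (A & ~B & ~C) | (A & ~C & D) | (~B & ~C & D) |
            (~A & B & C) | (~A & C & ~D) | (B & C & ~D);
    return (P << 1) + B;
}

// XOR form: rows with A == B give P = C ^ D, rows with A != B give P = C ^ A.
inline int getIndexXor(int k, int edge) {
    int A = (k >> 1) & 1, B = k & 1;
    int C = (edge >> 1) & 1, D = edge & 1;
    // (A & ~B) is 1 only for k == 2; (D & ~(A ^ B)) passes D when A == B.
    int P = C ^ (A & ~B) ^ (D & ~(A ^ B));
    return (P << 1) + B;
}

// Exhaustive check over all 16 (k, edge) pairs.
inline void checkXorAgainstSop() {
    for (int k = 0; k < 4; ++k)
        for (int edge = 0; edge < 4; ++edge)
            assert(getIndexXor(k, edge) == getIndexSop(k, edge));
}
```

The XOR form uses fewer operators, but as noted above the practical difference is likely small.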
If you define the coordinates and directions in clockwise order starting at left,
#define LEFT 0
#define TOP 1
#define RIGHT 2
#define BOTTOM 3
you can use
void edge_line(int line[4], const int rect[4], const int edge)
{
line[0] = rect[ edge & 2 ];
line[1] = rect[ ((edge + 3) & 2) + 1 ];
line[2] = rect[ ((edge + 1) & 2) ];
line[3] = rect[ (edge & 2) + 1 ];
}
to copy the edge line coordinates (each line segment in clockwise winding order). It looks suboptimal, but with GCC-4.8 and -O2 you get essentially
edge_line:
pushl %esi
pushl %ebx
movl 20(%esp), %ecx
movl 16(%esp), %edx
movl 12(%esp), %eax
movl %ecx, %esi
andl $2, %esi
movl (%edx,%esi,4), %ebx
movl %ebx, (%eax)
leal 3(%ecx), %ebx
addl $1, %ecx
andl $2, %ebx
andl $2, %ecx
addl $1, %ebx
movl (%edx,%ebx,4), %ebx
movl %ebx, 4(%eax)
movl (%edx,%ecx,4), %ecx
movl %ecx, 8(%eax)
movl 4(%edx,%esi,4), %edx
movl %edx, 12(%eax)
popl %ebx
popl %esi
ret
but on 64-bit, even better
edge_line:
movl %edx, %ecx
andl $2, %ecx
movslq %ecx, %rcx
movl (%rsi,%rcx,4), %eax
movl %eax, (%rdi)
leal 3(%rdx), %eax
addl $1, %edx
andl $2, %edx
andl $2, %eax
movslq %edx, %rdx
cltq
movl 4(%rsi,%rax,4), %eax
movl %eax, 4(%rdi)
movl (%rsi,%rdx,4), %eax
movl %eax, 8(%rdi)
movl 4(%rsi,%rcx,4), %eax
movl %eax, 12(%rdi)
ret
As you can see, there are no conditionals, and the binary operators combine and optimize to very few instructions.
Edited to add:
If we define a getIndex(i, edge) function, using three binary ANDs, one bit shift (right by 1), three additions, and one subtraction,
int getIndex(const int i, const int edge)
{
return (i & 1) + ((edge + 4 - (i & 1) + (i >> 1)) & 2);
}
with which edge_line() can be implemented as
void edge_line(int line[4], const int rect[4], const int edge)
{
line[0] = rect[ getIndex(0, edge) ];
line[1] = rect[ getIndex(1, edge) ];
line[2] = rect[ getIndex(2, edge) ];
line[3] = rect[ getIndex(3, edge) ];
}
we get the exact same results as before. With GCC-4.8.4 and -O2 on AMD64/x86-64, getIndex() compiles to
getIndex:
movl %edi, %edx
sarl %edi
andl $1, %edx
subl %edx, %esi
leal 4(%rsi,%rdi), %eax
andl $2, %eax
addl %edx, %eax
ret
and to
getIndex:
movl 4(%esp), %eax
movl 8(%esp), %edx
movl %eax, %ecx
andl $1, %ecx
subl %ecx, %edx
sarl %eax
leal 4(%edx,%eax), %eax
andl $2, %eax
addl %ecx, %eax
ret
on i686. Note that I arrived at the above form using the four-by-four result table; there are other, more rigorous ways to construct it, and there might even be a more optimal form. Because of this, I seriously recommend adding a big huge comment above the function, explaining the intent, and preferably also showing the result table. Something like
/* This function returns an array index:
* 0 for left
* 1 for top
* 2 for right
* 3 for bottom
* given edge:
* 0 for left
* 1 for top
* 2 for right
* 3 for bottom
* and i:
* 0 for initial x
* 1 for initial y
* 2 for final x
* 3 for final y
*
* The result table is
* | edge
* | 0 1 2 3
* ----+-------
* i=0 | 0 0 2 2
* i=1 | 3 1 1 3
* i=2 | 0 2 2 0
* i=3 | 1 1 3 3
*
* Apologies for the write-only code.
*/
Or something similar.
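The result table in such a comment can also double as a regression check. A minimal sketch, assuming the getIndex() shown above; kExpected and checkGetIndex are hypothetical names, and the table is copied from the comment.

```cpp
#include <cassert>

// getIndex() as defined above, repeated so the sketch is self-contained.
inline int getIndex(const int i, const int edge) {
    return (i & 1) + ((edge + 4 - (i & 1) + (i >> 1)) & 2);
}

// Result table from the comment: kExpected[i][edge].
static const int kExpected[4][4] = {
    {0, 0, 2, 2},  // i = 0, initial x
    {3, 1, 1, 3},  // i = 1, initial y
    {0, 2, 2, 0},  // i = 2, final x
    {1, 1, 3, 3},  // i = 3, final y
};

// Returns true when getIndex() reproduces every table entry.
inline bool checkGetIndex() {
    for (int i = 0; i < 4; ++i)
        for (int edge = 0; edge < 4; ++edge)
            if (getIndex(i, edge) != kExpected[i][edge]) return false;
    return true;
}
```

Calling checkGetIndex() from a unit test would catch an accidental change to the expression or to the edge numbering.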