A common technique in parallelization is to fuse nested for loops like this
for(int i=0; i<n; i++) {
for(int j=0; j<n; j++) {
to
for(int x=0; x<n*n; x++) {
int i = x/n; int j = x%n;
I'm wondering how I can do this to fuse a triangle loop like this
for(int i=0; i<n; i++) {
for(int j=0; j<i+1; j++) {
This has n*(n+1)/2
iterations. Let's call the fused iteration x
. Using the quadratic formula I have come up with this:
for(int x=0; x<(n*(n+1)/2); x++) {
int i = (-1 + sqrt(1.0+8.0*x))/2;
int j = x - i*(i+1)/2;
Unlike fusing the square loop this requires using the sqrt
function and conversions from int to float and from float to int.
I'm wondering if there is a simpler or more efficient way of doing this? For example a solution which does not require the sqrt
function or conversions from int to float or float to int.
Edit: I don't want a solution which depends on previous or next iterations. I only want solutions like int i = funci(x) and int j = funcj(x,i)
Here is some code showing that this works:
#include <stdio.h>
#include <math.h>
int main() {
int n = 5;
int cnt = 0;
for(int i=0; i<n; i++) {
for(int j=0; j<i+1; j++) {
printf("%d: %d %d\n", cnt++, i,j);
}
} printf("\n");
int nmax = n*(n+1)/2;
for(int x=0; x<nmax; x++) {
int i = (-1 + sqrt(1.0+8.0*x))/2;
int j = x - i*(i+1)/2;
printf("%d: %d %d\n", x,i,j);
}
}
Considering that you're trying to fuse a triangle with the intent of parallelizing, the non-obvious solution is to choose a non-trivial mapping of x to (i,j):
j |\ i ->
| \ ____
| | \ => |\\ |
V |___\ |_\\__|
After all, you're not processing them in any special order, so the exact mapping is a don't care.
So calculate x->i,j
as you'd do for a rectangle, but if i > j
then { i=N-i, j = N-j }
(mirror Y axis, then mirror X axis).
____
|\\ | |\ |\
|_\\__| ==> |_\ __ => | \
/ | | \
/__| |___\
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With