How to implement thrust::transform with custom functor that skips part of device_vector?

Question

I am working on a project (essentially a physics simulation) in which I need to perform calculations on a large number of nodes over many time steps. I have currently implemented each type of calculation by writing a custom functor which is called within a thrust::transform.

As a minimal example (with pseudo-code), say I have some data that all share a common structure but can be broken down into different types (A, B and C), e.g. all have a double value.

Thus, I am storing this data in a single device_vector as follows:

class Data {
    thrust::device_vector<double> values;
    unsigned values_begin_A, values_end_A;
    unsigned values_begin_B, values_end_B;
    unsigned values_begin_C, values_end_C;
};

where type A occupies the first part of the vector, followed by type B, then type C. To keep track, I save the start/end index values of each type.
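As a small host-only sketch of this bookkeeping, the begin/end pairs can be treated as half-open ranges. This assumes each end index is exclusive (one past the last element of its type); `IndexRange` and the example layout are hypothetical and not part of Thrust or the code above.

```cpp
#include <cassert>

// Hypothetical helper: a half-open index range [begin, end) inside `values`.
struct IndexRange {
    unsigned begin;  // index of the first element of this type
    unsigned end;    // one past the last element (assumed exclusive)

    // Number of elements of this type.
    unsigned size() const {
        assert(begin <= end);
        return end - begin;
    }

    // True if `index` belongs to this type.
    bool contains(unsigned index) const {
        return index >= begin && index < end;
    }
};

// Example layout: 4 values of type A, then 3 of type B, then 5 of type C.
const IndexRange range_A{0, 4};
const IndexRange range_B{4, 7};
const IndexRange range_C{7, 12};
```

With exclusive ends, the size of each type is simply `end - begin`, and adjacent types share a boundary (the end of A equals the beginning of B).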

The different types of data need to be acted on by different functors (e.g. functor1 is applied to types A and B; functor2 to A, B, and C; and functor3 to A and C). Each functor needs access to the index of the value within the vector, supplied by a counting_iterator, and stores its results in a separate vector.

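struct my_functor : public thrust::unary_function< thrust::tuple<unsigned, double> , double > {

    __host__ __device__
    double operator() (const thrust::tuple<unsigned, double> index_value) const {

        // Unpack the (index, value) pair supplied by the zip iterator.
        const unsigned index = thrust::get<0>(index_value);
        const double   value = thrust::get<1>(index_value);

        // Do something with the index and value.
        double result = value;  // placeholder so that the sketch compiles

        return result;
    }
};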

My problem is that I do not know the best way to implement that last functor that acts on type A and C values while skipping B. In particular, I'm looking for a thrust-friendly solution that scales reasonably well as I add more node types and more functors (that act on a combination of old and new types) while still reaping the benefits of parallelization.

I've come up with four options:

Option 1:

Have one transform call for each data type, e.g.

void Option_One(thrust::device_vector<double>& result) {
    // Multiple transform calls.

    thrust::counting_iterator<unsigned> index(0);
    auto first = thrust::make_zip_iterator(thrust::make_tuple(index, values.begin()));

    // Apply functor to 'A' values.
    thrust::transform(
        first + values_begin_A,
        first + values_end_A,
        result.begin() + values_begin_A,
        my_functor());

    // Apply functor to 'C' values.
    thrust::transform(
        first + values_begin_C,
        first + values_end_C,
        result.begin() + values_begin_C,
        my_functor());
}

This seems fairly straightforward at the expense of efficiency, since I am sacrificing the ability to evaluate A and C in parallel.

Option 2:

Copy the values into a temporary vector, call the transform on the temporary vector, then copy the temporary results back into result. This seems like a lot of copying back and forth, but allows the transform to be called only once on A and C together.

void Option_Two(thrust::device_vector<double>& result) {

    const unsigned size_A = values_end_A - values_begin_A;
    const unsigned size_C = values_end_C - values_begin_C;

    // Copy 'A' and 'C' values into temporary vector
    thrust::device_vector<double> temp_values_A_and_C(size_A + size_C);
    thrust::copy(values.begin() + values_begin_A, values.begin() + values_end_A, temp_values_A_and_C.begin());
    thrust::copy(values.begin() + values_begin_C, values.begin() + values_end_C, temp_values_A_and_C.begin() + size_A);

    // Store results in temporary vector.
    thrust::device_vector<double> temp_results_A_and_C(size_A + size_C);

    // Note: this index counts positions in the temporary vector, not in values.
    thrust::counting_iterator<unsigned> index(0);
    auto first = thrust::make_zip_iterator(thrust::make_tuple(index, temp_values_A_and_C.begin()));

    thrust::transform(
        first,
        first + size_A + size_C,
        temp_results_A_and_C.begin(),
        my_functor());


    // Copy temp results back into result
    // ....
}

Option 3:

Call the transform on all values, but alter the functor to check the index and only act on indices within the A or C range.

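struct my_functor_with_index_checking : public thrust::unary_function< thrust::tuple<unsigned, double> , double > {

    // The range bounds are copied into the functor so they are available on the device.
    unsigned begin_A, end_A, begin_C, end_C;

    __host__ __device__
    my_functor_with_index_checking(unsigned begin_A, unsigned end_A, unsigned begin_C, unsigned end_C)
        : begin_A(begin_A), end_A(end_A), begin_C(begin_C), end_C(end_C) {}

    __host__ __device__
    double operator() (const thrust::tuple<unsigned, double> index_value) const {

        const unsigned index = thrust::get<0>(index_value);
        const double   value = thrust::get<1>(index_value);

        // End indices are treated as exclusive, matching their use as end iterators in Option 1.
        if ( (index >= begin_A && index < end_A ) ||
             (index >= begin_C && index < end_C ) ) {

                // Do something with the index and value.
                double result = value;  // placeholder so that the sketch compiles
                return result;
             }
        else {
            // Do nothing;
            return 0; //Result is 0 by default.
        }
    }
};

void Option_Three(thrust::device_vector<double>& result) {

    // Apply functor to all values, but check index inside functor.
    thrust::counting_iterator<unsigned> index(0);
    auto first = thrust::make_zip_iterator(thrust::make_tuple(index, values.begin()));

    thrust::transform(
        first,
        first + values.size(),
        result.begin(),
        my_functor_with_index_checking(values_begin_A, values_end_A, values_begin_C, values_end_C));
}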

Option 4:

The final option I've come up with is to create a custom iterator based on the counting_iterator that counts normally within the A range, but then skips to the beginning of C once it reaches the end of A. This seems like an elegant solution, but I have no idea how to do this.

void Option_Four(thrust::device_vector<double>& result) {

    // Create my own version of a counting iterator
    // that skips from the end of 'A' to the beginning of 'C'
    // I don't know how to do this!
    FancyCountingIterator fancyIndex(0); 

    thrust::transform( 
        thrust::make_zip_iterator(thrust::make_tuple(fancyIndex, values.begin())),
        thrust::make_zip_iterator(thrust::make_tuple(fancyIndex, values.begin())) + values.size(),
        result.begin(),
        my_functor());
}
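The skipping index that Option 4 asks for boils down to a mapping from a compact position (0, 1, 2, ... over A and C only) to a position in values. Below is a host-only sketch of that mapping in plain C++, assuming exclusive end indices. `skip_B_index` is a hypothetical name; on the device, the same arithmetic would have to live inside a functor or a fancy iterator.

```cpp
#include <cassert>

// Hypothetical helper: maps a compact index i in [0, size_A + size_C)
// to the matching index in `values`, skipping the B block entirely.
// End indices are assumed to be exclusive.
unsigned skip_B_index(unsigned i,
                      unsigned begin_A, unsigned end_A,
                      unsigned begin_C, unsigned end_C) {
    const unsigned size_A = end_A - begin_A;
    const unsigned size_C = end_C - begin_C;
    assert(i < size_A + size_C);  // precondition: i addresses an A or C element

    if (i < size_A) {
        return begin_A + i;         // still inside A
    }
    return begin_C + (i - size_A);  // past A: continue from the start of C
}
```

For example, with A = [0, 4), B = [4, 7) and C = [7, 12), compact indices 0 to 3 map to 0 to 3, and compact indices 4 to 8 map to 7 to 11.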

lxkarthi · Accepted Answer

Use a permutation_iterator in combination with a custom transform_iterator (the fancy iterator that you are looking for).

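Data d; //assuming this has values and that its members are accessible here.
thrust::device_vector<double> result(d.values.size()); // assumed output vector, same size as the input

// Copy the bounds into locals so the lambda captures plain integers by value;
// capturing d by reference would hand the device a host address.
const unsigned A_begin = d.values_begin_A, A_size = d.values_end_A - d.values_begin_A;
const unsigned C_begin = d.values_begin_C, C_size = d.values_end_C - d.values_begin_C;

// Maps a compact index i in [0, A_size + C_size) to its position in d.values.
// Extended lambdas require nvcc's --extended-lambda flag.
auto A_C_index_iter = thrust::make_transform_iterator( thrust::make_counting_iterator(0u),
[=] __host__ __device__ (unsigned i) -> unsigned {
  if (i<A_size)
    return i+A_begin;
  else
    return (i-A_size)+C_begin;
});
auto permuted_input_iter = thrust::make_permutation_iterator(d.values.begin(), A_C_index_iter);
auto permuted_output_iter = thrust::make_permutation_iterator(result.begin(), A_C_index_iter);

// Zip the original index with the value so that my_functor receives the same tuple as in the question.
auto first = thrust::make_zip_iterator(thrust::make_tuple(A_C_index_iter, permuted_input_iter));
thrust::transform(first, first + A_size + C_size, permuted_output_iter, my_functor());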

This uses a single transform call over all A_size + C_size elements, so the A and C values are processed in parallel and B is never touched.
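To check the index arithmetic without a GPU, and to see how it might extend beyond two ranges, here is a host-only sketch in plain C++. It is not Thrust code: the loop stands in for the permuted thrust::transform, and `Segment`, `SegmentMap`, and `transform_selected` are hypothetical names introduced for illustration. End indices are assumed to be exclusive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical half-open range [begin, end) of indices into `values`.
struct Segment {
    unsigned begin;
    unsigned end;
};

// Hypothetical helper: maps compact indices to indices in `values`
// for any number of selected segments.
class SegmentMap {
public:
    explicit SegmentMap(const std::vector<Segment>& segments) : segments_(segments) {
        unsigned total = 0;
        for (const Segment& s : segments_) {
            assert(s.begin <= s.end);
            total += s.end - s.begin;
            offsets_.push_back(total);  // compact index one past this segment
        }
    }

    // Total number of selected elements.
    unsigned size() const { return offsets_.empty() ? 0 : offsets_.back(); }

    // Maps compact index i in [0, size()) to an index in `values`.
    unsigned operator()(unsigned i) const {
        assert(i < size());
        // The first segment whose running total exceeds i contains compact index i.
        const auto it = std::upper_bound(offsets_.begin(), offsets_.end(), i);
        const std::size_t k = static_cast<std::size_t>(it - offsets_.begin());
        const unsigned start_of_k = (k == 0) ? 0 : offsets_[k - 1];
        return segments_[k].begin + (i - start_of_k);
    }

private:
    std::vector<Segment> segments_;
    std::vector<unsigned> offsets_;
};

// Stand-in for the permuted transform: applies f(index, value) to the selected
// elements only and writes each result to the same position in `result`.
template <typename F>
void transform_selected(const std::vector<double>& values,
                        std::vector<double>& result,
                        const SegmentMap& map, F f) {
    for (unsigned i = 0; i < map.size(); ++i) {
        const unsigned idx = map(i);
        result[idx] = f(idx, values[idx]);
    }
}
```

For two segments this reduces to the if/else in the lambda above; with more segments, the running totals and a binary search replace the chain of comparisons.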

Tags: c++, thrust

Charlie H.
