Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why std::transform doesn't guarantee the order (but for_each guarantee the order)? Doesn't this allow trick implementation for performance?

Tags:

c++

stl

I just realize the standard doesn't guarantee the order of applying function callback in std::transform. And it doesn't allow the callback function or functor have side effect. But at the same time std::for_each actually guarantee the order.

One guess is the transform can using high performance algorithm which doesn't guarantee order, But O(N) is the best algorithm already.

So why the standard doesn't make the transform have the behavior as for_each from the view of apply callback function order? The user will benefit form this guarantee.

like image 518
ZijingWu Avatar asked Jun 28 '13 03:06

ZijingWu


3 Answers

Quoting directly from the standard (copied at end):

You will see from the template declaration of std::transform<> that the input iterator parameters must conform to the concept of an InputIterator.

An InputIterator is one of the most restrictive iterator concepts in c++. It does not support any kind of random access. It is only able to advance.

Therefore any implementation of std::transform that requires the iterator to do anything other than be dereferenced or advance is ill-formed. Remember that by specifying InputIterator, the standard is explicitly allowing the use of a std::istream_iterator (for example) and the implementation of std::transform is required to respect the restrictions therein. It must be written only in terms of the methods available on the InputIterator concept.

Therefore, by implication, the implementation of this function must access elements sequentially (and therefore transform values sequentially) since to not do so would break the contract implicit in the interface.

Therefore the standard does (implicitly and quietly) guarantee that std::transform initialise its elements sequentially. It would be impossible to write a well-formed implementation of std::transform that did not.

25.3.4 Transform [alg.transform]

template<class InputIterator, class OutputIterator, class UnaryOperation>
OutputIterator
transform(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation op);

template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation>
OutputIterator
transform(InputIterator1 first1, InputIterator1 last1, InputIterator2 first2, OutputIterator result, BinaryOperation binary_op);

1 Effects: Assigns through every iterator i in the range [result,result + (last1 - first1)) a new corresponding value equal to op(*(first1 + (i - result)) or binary_op(*(first1 + (i - result)), *(first2 + (i - result))).

2 Requires: op and binary_op shall not invalidate iterators or subranges, or modify elements in the ranges [first1,last1], [first2,first2 + (last1 - first1)], and [result,result + (last1 - first1)].

3 Returns: result + (last1 - first1).

4 Complexity: Exactly last1 - first1 applications of op or binary_op.

5 Remarks: result may be equal to first in case of unary transform, or to first1 or first2 in case of binary transform.

like image 110
Richard Hodges Avatar answered Sep 28 '22 03:09

Richard Hodges


This non-restrictive definition allows for parallel computing. An implementation may choose to apply transform function using several threads. See also related question: STL algorithms and concurrent programming

Think of it as a semantic difference in algorithms (that is, that represents programmer's intent rather than being just another tool). With for_each you state that you need a sequential scan. With transform you state that you only need to aply a function to every item in the container, but you don't care how it will be done.

like image 44
Lyth Avatar answered Sep 28 '22 02:09

Lyth


Despite some earlier answers, I think it is possible to implement std::transform in a parallel way. For instance like this:

1) Fetch all input sequentially.

2) Iterate over OutputIterator, initialising dummy objects and keep a reference to each output.

3) Distribute the input with the corresponding output iterator to different threads, each doing the transformation independently.

Like this, the iterators are only incremented as allowed.

As pointed out by clcto, another implementation could do step 1) first, then create a vector for all output elements, then compute all of these in parallel using the given function argument and then write them sequentially into the output.

like image 26
gnuplotshouldhaveanapi Avatar answered Sep 28 '22 03:09

gnuplotshouldhaveanapi