What's the difference between dask=parallelized and dask=allowed in xarray's apply_ufunc?

Tags:

In the xarray documentation for the function apply_ufunc it says:

dask: ‘forbidden’, ‘allowed’ or ‘parallelized’, optional

    How to handle applying to objects containing lazy data in the form of dask arrays:

    ‘forbidden’ (default): raise an error if a dask array is encountered.
    ‘allowed’: pass dask arrays directly on to func.
    ‘parallelized’: automatically parallelize func if any of the inputs are a dask array. 
                    If used, the output_dtypes argument must also be provided. 
                    Multiple output arguments are not yet supported.

and in the documentation's page on Parallel Computing then there is a note:

For the majority of NumPy functions that are already wrapped by dask, it’s usually a better idea to use the pre-existing dask.array function, by using either a pre-existing xarray methods or apply_ufunc() with dask='allowed'. Dask can often have a more efficient implementation that makes use of the specialized structure of a problem, unlike the generic speedups offered by dask='parallelized'.

However, I'm still not clear as to what the difference between these two options is. Does allowed still operate on chunks one by one to lower memory usage? Will allowed still parallelize if the applied ufunc only uses dask operations? Why does parallelized require you to give more information about the ufunc outputs (i.e. the arguments output_dtypes, output_sizes) than allowed does?

398

asked Aug 07 '18 22:08

ThomasNicholas

1 Answers

dask='allowed' means that you're applying a function that already knows how to handle dask arrays, e.g., a function written in terms of dask.array operations. In most cases, that does indeed mean that the function will operate on chunks one by one to lower memory usage, and will apply the computation in parallel.

dask='parallelized' requires more information from the user because it creates its own wrapper that allows the provided function to act on dask arrays, by using low-level dask.array functions like atop. With dask='parallelized', you can provide a function that only knows how to handle NumPy arrays, and xarray.apply_ufunc will extend it to handle dask arrays, too.

answered Sep 28 '22 17:09

shoyer

Related questions
                            
                                React Material UI Grid oversize full screen
                            
                                How should I log in my pytest plugin?
                            
                                Error message below checkbox's when validate Bootstrap 4
                            
                                Redux Web Extension - Uncaught TypeError: Invalid attempt to spread non-iterable instance
                            
                                Can upload photo when using the Google Photos API
                            
                                How to consume from Kafka Spring Cloud Stream by default and also consume a Kafka message generated by the confluent API?
                            
                                Is std::move safe in an arguments list when the argument is forwarded, not move constructed?
                            
                                OpenCV Detect scratches on fruits
                            
                                How to limit the number of rows printed in kable
                            
                                How solidity make function signature with tuple(nested abi)?
                            
                                Remove . and , in the beginning/end of every element from the array list in Angular 6
                            
                                what is Jquery $(this).val() equivalent in AngularJS? and How to set input field value to zero if input field is empty after updating?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With