According to the dask documentation, it's possible to specify the chunks in one of three ways:
- a blocksize like 1000
- a blockshape like (1000, 1000)
- explicit sizes of all blocks along all dimensions, like ((1000, 1000, 500), (400, 400))
Your chunks input will be normalized and stored in the third and most explicit form.
After trying to understand the way chunks work using the visualize() function, there are still a few things I'm not sure about:
If the input is normalized, does it matter which input form I choose?
Blocksize means every chunk has the size of X, i.e. 1000. What does the blockshape input specify?
When giving a blockshape input, does the order of parameters make a difference? How is it related to the shape of the array/matrix?
The forms lower in that list are more explicit and allow for greater asymmetry in your block shapes.
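As a minimal sketch of the normalization (the 6x6 shape and the variable names are just illustrative choices), the three input forms below should all end up stored as the same explicit tuple of tuples, so for a uniform grid the form you pick is only a matter of convenience:

```python
import dask.array as da

shape = (6, 6)  # illustrative shape, matching the 6x6 example array

by_size = da.ones(shape, chunks=3)                   # blocksize
by_shape = da.ones(shape, chunks=(3, 3))             # blockshape
explicit = da.ones(shape, chunks=((3, 3), (3, 3)))   # explicit sizes per dimension

# .chunks always holds the normalized, most explicit form
print(by_size.chunks)
print(by_shape.chunks)
print(explicit.chunks)
```

The difference only shows up when the blocks are not uniform: a blocksize cannot express different sizes per dimension, and a blockshape cannot express sizes that vary along one dimension.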
We'll discuss this through a sequence of examples of chunks on the following array:
1 2 3 4 5 6
7 8 9 0 1 2
3 4 5 6 7 8
9 0 1 2 3 4
5 6 7 8 9 0
1 2 3 4 5 6
We show how different chunks arguments split the array into different blocks.
chunks=3
Symmetric blocks of size 3
1 2 3  4 5 6
7 8 9  0 1 2
3 4 5  6 7 8

9 0 1  2 3 4
5 6 7  8 9 0
1 2 3  4 5 6
chunks=2
Symmetric blocks of size 2
1 2  3 4  5 6
7 8  9 0  1 2

3 4  5 6  7 8
9 0  1 2  3 4

5 6  7 8  9 0
1 2  3 4  5 6
chunks=(3, 2)
Asymmetric but repeated blocks of size (3, 2)
1 2  3 4  5 6
7 8  9 0  1 2
3 4  5 6  7 8

9 0  1 2  3 4
5 6  7 8  9 0
1 2  3 4  5 6
chunks=(1, 6)
Asymmetric but repeated blocks of size (1, 6)
1 2 3 4 5 6

7 8 9 0 1 2

3 4 5 6 7 8

9 0 1 2 3 4

5 6 7 8 9 0

1 2 3 4 5 6
chunks=((2, 4), (3, 3))
Asymmetric and non-repeated blocks
1 2 3  4 5 6
7 8 9  0 1 2

3 4 5  6 7 8
9 0 1  2 3 4
5 6 7  8 9 0
1 2 3  4 5 6
chunks=((2, 2, 1, 1), (3, 2, 1))
Asymmetric and non-repeated blocks
1 2 3  4 5  6
7 8 9  0 1  2

3 4 5  6 7  8
9 0 1  2 3  4

5 6 7  8 9  0

1 2 3  4 5  6
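The examples above can be checked with a short script. This is only a sketch: the array contents (`np.arange(36)`) and the names `specs` and `results` are placeholders, not anything from the docs. It also shows that order matters in a blockshape: the first entry applies to the first dimension (rows) and the second to the second dimension (columns).

```python
import numpy as np
import dask.array as da

# Placeholder 6x6 data; the values themselves do not matter here
data = np.arange(36).reshape(6, 6)

# The chunks arguments used in the examples above
specs = [3, 2, (3, 2), (1, 6), ((2, 4), (3, 3)), ((2, 2, 1, 1), (3, 2, 1))]

results = {}
for spec in specs:
    x = da.from_array(data, chunks=spec)
    # x.chunks: normalized block sizes per dimension
    # x.numblocks: number of blocks along each dimension
    results[spec] = (x.chunks, x.numblocks)
    print(f"chunks={spec!r} -> {x.chunks}, numblocks={x.numblocks}")
```

Swapping a blockshape, for example `(2, 3)` instead of `(3, 2)`, would give blocks of 2 rows by 3 columns instead of 3 rows by 2 columns.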
The latter examples are rarely provided by users on original data but arise from complex slicing and broadcasting operations. Generally I use the simplest form until I need more complex forms. The choice of chunks should align with the computations you want to do.
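To illustrate how uneven chunks can arise without being typed in by hand, here is a small sketch (the slice bounds are arbitrary) in which slicing a uniformly chunked array leaves blocks of different sizes:

```python
import dask.array as da

x = da.ones((6, 6), chunks=3)   # uniform 3x3 blocks
y = x[1:6, 2:]                  # arbitrary slice that cuts through the blocks

print(x.chunks)  # uniform blocks before slicing
print(y.chunks)  # uneven blocks after slicing
```

The slice keeps the original block boundaries where it can, so the first block along each dimension is trimmed while the second one stays whole.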
For example, if you plan to take out thin slices along the first dimension then you might want to make that dimension skinnier than the others. If you plan to do linear algebra then you might want more symmetric blocks.
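If the existing chunks do not match the planned access pattern, one option is to rechunk. A hedged sketch, assuming you want to pull out single rows (the shapes and names are illustrative only):

```python
import dask.array as da

x = da.ones((6, 6), chunks=3)

# Skinny along the first dimension: each block is one full row
thin = x.rechunk((1, 6))

# Taking one row now touches exactly one block
row = thin[2]

print(thin.chunks)
print(row.chunks)
```

With the original 3x3 blocks, the same row would have been spread over two blocks, each of which carries two rows that are not needed.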