 

tf.data: what are stragglers in parallel interleaving?

interleave is a tf.data.Dataset method that can be used to interleave elements from multiple datasets. tf.contrib.data.parallel_interleave provides a parallel version of the same functionality, applied via apply.

I can see that reading from many datasets in parallel, with buffers for each of them, as the parallel version allows, will improve throughput. But the documentation also says this about how parallel_interleave can increase data throughput:

Unlike tf.data.Dataset.interleave, it gets elements from cycle_length nested datasets in parallel, which increases the throughput, especially in the presence of stragglers.

What exactly are stragglers, and why does parallel_interleave work especially well in terms of throughput in their presence?
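
For concreteness, here is a minimal sketch of the two calls being compared (TF 1.x; filenames is a hypothetical dataset of TFRecord file paths):

import tensorflow as tf

# Hypothetical dataset of file paths.
filenames = tf.data.Dataset.list_files('/path/to/data/*.tfrecord')

# Sequential version: files are read one at a time, round-robin.
sequential = filenames.interleave(
    tf.data.TFRecordDataset, cycle_length=4, block_length=16)

# Parallel version: up to cycle_length files are read concurrently.
parallel = filenames.apply(tf.contrib.data.parallel_interleave(
    tf.data.TFRecordDataset, cycle_length=4, block_length=16))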

asked Mar 20 '18 by mikkola


1 Answer

A straggler is a function that takes longer than usual to produce its output, for example because of network congestion or an unlucky combination of random delays.

interleave does all the processing sequentially, on a single thread. In the following schema, let ___ denote waiting for I/O or computation, <waiting> denote waiting for its turn to emit an element, and 111 denote producing the first element (1).

Suppose we have a dataset of directories ds = [A, B, C, D] and we produce files 1, 2, 3, ... from each of them. Then r = ds.interleave(make_files, cycle_length=3, block_length=2), where make_files maps a directory to the dataset of its files, will work roughly like this:

A: ___111___222
B:   <waiting> ___111___________222
C:   <waiting> <waiting> <waiting> ___111___222

R: ____A1____A2____B1____________B2____C1____C2

You see that if producing elements from B straggles, all the elements that follow it have to wait their turn to be processed.
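
As a side note on ordering: without stragglers, interleave produces a fully deterministic round-robin order. A toy sketch (semantics as documented for tf.data.Dataset.interleave):

ds = tf.data.Dataset.range(1, 4).interleave(
    lambda x: tf.data.Dataset.from_tensors(x).repeat(4),
    cycle_length=2, block_length=2)
# Yields: 1 1 2 2 1 1 2 2 3 3 3 3
# (blocks of block_length elements, cycling over cycle_length datasets)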

parallel_interleave helps with stragglers in two ways. First, it starts all the datasets in the cycle in parallel (hence the name). The production schema therefore becomes:

A: ___111___222
B: ___<waiting>111___________222
C: __<waiting><waiting><waiting>111___222

R: ____A1____A2_B1____________B2_C1____C2|....|

Waiting in parallel like this reduces the useless waiting. The |....| part shows how much time we saved compared to the sequential version.

The second way it helps is through the sloppy argument. If we set it to True, an unavailable element is skipped until it becomes available, at the cost of producing a non-deterministic order. Here is how:

A: ___111___<w>222
B: ___<w>111___________222
C: ___<w><w>111___222

R: ____A1_B1_C1_A2_C2___B2|...................|

Look at that saving!! But also look at the order of the elements!
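
As a further, hedged note: tf.contrib.data.parallel_interleave also takes buffering arguments that can help with stragglers (the values below are illustrative, and files stands for a hypothetical dataset of file paths):

ds = files.apply(tf.contrib.data.parallel_interleave(
    tf.data.TFRecordDataset,
    cycle_length=4,
    sloppy=True,                 # trade output determinism for throughput
    buffer_output_elements=2,    # elements buffered per open nested dataset
    prefetch_input_elements=1))  # nested datasets opened ahead of need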


I reproduced these schemas in code. It is a bit ugly, but it illustrates the differences.

import tensorflow as tf
from time import sleep

DS = tf.data.Dataset
sess = tf.Session()

def repeater(val):
    def _slow_gen():
        # Yield 0..4, sleeping 1 s before each odd element (the stragglers).
        for i in range(5):
            if i % 2:
                sleep(1)
            yield i
    return DS.from_generator(_slow_gen, tf.int8)

ds = DS.range(5)

slow_ds = ds.interleave(repeater, cycle_length=2, block_length=3)

para_ds = ds.apply(tf.contrib.data.parallel_interleave(
    repeater, cycle_length=2, block_length=3)
)

sloppy_ds = ds.apply(tf.contrib.data.parallel_interleave(
    repeater, cycle_length=2, block_length=3, sloppy=True)
)


%time apply_python_func(slow_ds, print, sess)
# 10 sec, you see it waiting each time

%time apply_python_func(para_ds, print, sess) 
#  3 sec always! you see it burping a lot after the first wait

%time apply_python_func(sloppy_ds, print, sess) 
# sometimes 3, sometimes 4 seconds

And here is the function used to print a dataset:

def apply_python_func(ds, func, sess):
    """Extract values from ds using sess and apply func to them"""
    it = ds.make_one_shot_iterator()
    next_value = it.get_next()
    num_examples = 0
    while True:
        try:
            value = sess.run(next_value)
            num_examples += 1
            func(value)
        except tf.errors.OutOfRangeError:
            break
    print('Evaluated {} examples'.format(num_examples)) 
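
For what it is worth, in TF 2.x parallel_interleave was deprecated and the same functionality moved into interleave itself. A sketch (assuming TF >= 2.4 for tf.data.AUTOTUNE; filenames is again hypothetical):

ds = filenames.interleave(
    tf.data.TFRecordDataset,
    cycle_length=4,
    num_parallel_calls=tf.data.AUTOTUNE,  # the parallel part
    deterministic=False)                  # the equivalent of sloppy=True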
answered Sep 22 '22 by Ciprian Tomoiagă