initial_sparsity parameter in sparsity.PolynomialDecay() TensorFlow 2.0 magnitude-based weight pruning

I was trying the tutorial TensorFlow 2.0 Magnitude-based weight pruning with Keras and came across the parameter initial_sparsity:

import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.sparsity import keras as sparsity
import numpy as np

epochs = 12
num_train_samples = x_train.shape[0]
end_step = np.ceil(1.0 * num_train_samples / batch_size).astype(np.int32) * epochs
print('End step: ' + str(end_step))

pruning_params = {
    'pruning_schedule': sparsity.PolynomialDecay(initial_sparsity=0.50,
                                                 final_sparsity=0.90,
                                                 begin_step=2000,
                                                 end_step=end_step,
                                                 frequency=100)
}

The tutorial says:

The parameter used here means:

Sparsity PolynomialDecay is used across the whole training process. We start at the sparsity level 50% and gradually train the model to reach 90% sparsity. X% sparsity means that X% of the weight tensor is going to be pruned away.

My question is, shouldn't you start with initial_sparsity of 0% and then prune 90% of the weights off?

What does starting with initial_sparsity of 50% mean? Does this mean that 50% of the weights are pruned at the very first pruning step, and the model is then gradually pruned up to 90% sparsity?

Also, for tfmot.sparsity.keras.ConstantSparsity, the API is as follows:

pruning_params_unpruned = {
    'pruning_schedule': sparsity.ConstantSparsity(
        target_sparsity=0.0, begin_step=0,
        end_step = 0, frequency=100
    )
}

Initializes a Pruning schedule with constant sparsity.

Sparsity is applied in the interval [begin_step, end_step] every frequency steps. At each applicable step, the sparsity(%) is constant.

Does this mean that if a neural network model is already at a sparsity level of 50% and target_sparsity = 0.5, the pruning schedule will do one of the following:

  1. No pruning, since the model is already at a pruned level of 50%
  2. It further prunes 50% of the weights of the already (50% pruned) model

You can read about it in the PolynomialDecay and ConstantSparsity API docs.

Thanks

asked Jan 01 '23 by Arun


1 Answer

I also found the TensorFlow documentation on weight pruning to be quite sparse, so I spent some quality time with the debugger figuring out how everything works.

How Pruning Schedules Work

At the most basic level, a pruning schedule is simply a function that takes the training step as input and produces a sparsity percentage. That sparsity value is then used to generate a mask, which prunes away the weights with the smallest absolute values: the threshold is the k-th smallest absolute value in the weight tensor, where k is the number of weights the sparsity percentage says to remove.
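To make that concrete, here is a minimal NumPy sketch of the masking idea. It is illustrative only: magnitude_mask is a name I made up, and the library does this internally with TensorFlow ops, but it shows how a sparsity fraction turns into a binary mask.

import numpy as np

def magnitude_mask(weights, sparsity):
    # Zero out the `sparsity` fraction of weights with the
    # smallest absolute values; keep the rest.
    flat = np.abs(weights).ravel()
    k = int(np.floor(sparsity * flat.size))  # number of weights to prune
    if k == 0:
        return np.ones_like(weights)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest |w|
    return (np.abs(weights) > threshold).astype(weights.dtype)

w = np.array([[0.1, -0.6], [0.05, 0.9]])
print(magnitude_mask(w, 0.5))
# [[0. 1.]
#  [0. 1.]]   <- the two smallest-magnitude weights are masked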

PolynomialDecay

Class definition: Github Link
The comments included with the class definition above helped me understand how the PolynomialDecay scheduler works.

Pruning rate grows rapidly in the beginning from initial_sparsity, but then plateaus slowly to the target sparsity.

The function applied is

current_sparsity = final_sparsity + (initial_sparsity - final_sparsity) * (1 - (step - begin_step)/(end_step - begin_step)) ^ exponent

By the above equation, when step == begin_step, current_sparsity = initial_sparsity. Thus, the weights will be pruned to initial_sparsity on the step specified by the begin_step parameter.
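To see the schedule in action, here is a small sketch that evaluates the formula in plain Python for the parameters from the question. The end_step of 10000 is just an illustrative stand-in for whatever your data yields, and I believe the library's exponent (its power parameter) defaults to 3.

def polynomial_sparsity(step, initial=0.50, final=0.90,
                        begin_step=2000, end_step=10000, exponent=3):
    # Clamp to the pruning window so the result stays in [initial, final]
    step = min(max(step, begin_step), end_step)
    frac = 1.0 - (step - begin_step) / (end_step - begin_step)
    return final + (initial - final) * frac ** exponent

for s in (2000, 4000, 6000, 10000):
    print(s, round(polynomial_sparsity(s), 5))
# 2000 0.5       <- step == begin_step gives initial_sparsity
# 4000 0.73125
# 6000 0.85
# 10000 0.9      <- step == end_step gives final_sparsity

Note how most of the pruning happens early, matching the "grows rapidly in the beginning, then plateaus" description above.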

I would agree with your assessment, in that you would usually want to start pruning at a lower sparsity than 50%, but I do not have any published research I can cite to back up that claim. You may be able to find more information in the paper cited in the PolynomialDecay class definition, although I have not had a chance to read it myself.

ConstantSparsity

Class definition: Github Link
The purpose of this scheduler appears to be pretty limited. On every valid prune step, the target_sparsity is returned, so repeated pruning steps are largely redundant. The use case for this scheduler appears to be a one-time prune during training; the fact that it can fire on multiple steps seems to exist only to keep it consistent with its parent abstract class and the other pruning schedulers.
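You can check this by calling the schedule directly; a schedule's __call__ returns a tuple of a should-prune flag and the sparsity, as described below for the custom scheduler:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

sched = tfmot.sparsity.keras.ConstantSparsity(
    target_sparsity=0.5, begin_step=0, end_step=-1, frequency=100)

for step in (0, 100, 250, 300):
    should_prune, sparsity = sched(tf.constant(step))
    print(step, bool(should_prune), float(sparsity))
# 0   True  0.5
# 100 True  0.5
# 250 False 0.5   <- not a multiple of `frequency`, so no mask update
# 300 True  0.5

This also suggests an answer to your ConstantSparsity question: as far as I can tell, the sparsity a schedule returns is a fraction of the whole weight tensor, so target_sparsity=0.5 asks for the tensor to be at 50% sparsity, not for a further 50% on top of the existing pruning. That is roughly your option 1.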

Creating Your Own Pruning Scheduler

If the two schedulers above do not float your boat, the abstract class PruningSchedule exposes an endpoint that makes it very easy to create your own pruning scheduler, however convoluted it may be. Below is an example I created myself.

Disclaimer: this scheduler is a creation of a 19-year-old college student's imagination and has no basis in any published literature.

import tensorflow as tf
import tensorflow_model_optimization as tfmot

PruningSchedule = tfmot.sparsity.keras.PruningSchedule

class ExponentialPruning(PruningSchedule):
    def __init__(self, rate=0.01, begin_step=0, frequency=100, max_sparsity=0.9):
        self.rate = rate
        self.begin_step = begin_step
        self.frequency = frequency
        self.max_sparsity = max_sparsity

        # Validation functions provided by the parent class.
        # The -1 argument stands in for end_step, as this pruning
        # schedule does not have one; the final True flag says that
        # having no end_step is okay.
        self._validate_step(self.begin_step, -1, self.frequency, True)
        self._validate_sparsity(self.max_sparsity, 'Max Sparsity')

    def __call__(self, step):
        # Sparsity calculation endpoint.
        # `step` is an integer tensor, and the sparsity returned must be
        # a tensor of dtype=tf.float32, so tf.math is used throughout.
        # In the logic below, you can assume a valid pruning step is passed.
        p = tf.math.divide(
            tf.cast(step - self.begin_step, tf.float32),
            tf.constant(self.frequency, dtype=tf.float32)
        )
        sparsity = tf.math.subtract(
            tf.constant(1, dtype=tf.float32),
            tf.math.pow(
                tf.constant(1 - self.rate, dtype=tf.float32),
                p
            )
        )

        # Cap the sparsity at max_sparsity
        sparsity = tf.cond(
            tf.math.greater(sparsity, tf.constant(self.max_sparsity, dtype=tf.float32)),
            lambda: tf.constant(self.max_sparsity, dtype=tf.float32),
            lambda: sparsity
        )

        # This endpoint returns a tuple of length 2.
        # The first value determines whether pruning should occur on this
        # step; I recommend using the parent class helper below for that.
        # The -1 again denotes that there is no end_step.
        # The second value is the sparsity to prune to.
        return (self._should_prune_in_step(step, self.begin_step, -1, self.frequency),
                sparsity)

    def get_config(self):
        # Required by the parent class: return the class name and the
        # input parameters, as done below.
        return {
            'class_name': self.__class__.__name__,
            'config': {
                'rate': self.rate,
                'begin_step': self.begin_step,
                'frequency': self.frequency,
                'max_sparsity': self.max_sparsity
            }
        }

Using a Pruning Scheduler

If you would like only certain layers to be pruned, rather than all prunable layers, you can call the prune_low_magnitude function on a layer as you add it to your model.

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

model = keras.models.Sequential()
...
model.add(prune_low_magnitude(
    keras.layers.Dense(8, activation='relu',
                       kernel_regularizer=keras.regularizers.l1(0.0001)),
    ExponentialPruning(rate=1/8)))

Also make sure to pass an UpdatePruningStep instance in the training callbacks:

UpdatePruningStep = tfmot.sparsity.keras.UpdatePruningStep

model.fit(train_input, train_labels, epochs=epochs,
          validation_data=(test_input, test_labels),
          callbacks=[UpdatePruningStep()])
answered Jan 05 '23 by Andrew