I was trying the TensorFlow 2.0 tutorial Magnitude-based weight pruning with Keras and came across the parameter initial_sparsity:
import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.sparsity import keras as sparsity
import numpy as np
epochs = 12
num_train_samples = x_train.shape[0]
end_step = np.ceil(1.0 * num_train_samples / batch_size).astype(np.int32) * epochs
print('End step: ' + str(end_step))
pruning_params = {
    'pruning_schedule': sparsity.PolynomialDecay(initial_sparsity=0.50,
                                                 final_sparsity=0.90,
                                                 begin_step=2000,
                                                 end_step=end_step,
                                                 frequency=100)
}
The tutorial says:
The parameters used here mean:
Sparsity PolynomialDecay is used across the whole training process. We start at the sparsity level 50% and gradually train the model to reach 90% sparsity. X% sparsity means that X% of the weight tensor is going to be pruned away.
My question is, shouldn't you start with initial_sparsity of 0% and then prune 90% of the weights off?
What does starting with initial_sparsity of 50% mean? Does this mean that 50% of the weights are pruned to begin with, and then pruning continues until 90% sparsity is reached?
Also, for tfmot.sparsity.keras.ConstantSparsity, the API is as follows:
pruning_params_unpruned = {
    'pruning_schedule': sparsity.ConstantSparsity(
        target_sparsity=0.0, begin_step=0,
        end_step=0, frequency=100
    )
}
Initializes a Pruning schedule with constant sparsity.
Sparsity is applied in the interval [begin_step, end_step] every frequency steps. At each applicable step, the sparsity(%) is constant.
If a neural network model is already at a sparsity level of 50% and target_sparsity = 0.5, what will the pruning schedule do?
You can read about them in the documentation for PolynomialDecay and ConstantSparsity.
Thanks
I also found the TensorFlow documentation on weight pruning to be quite sparse, so I spent some quality time with the debugger to figure out how everything works.
At the most basic level, a pruning schedule is simply a function that takes the training step as input and produces a sparsity percentage. That sparsity value is then used to generate a mask, which prunes away the weights whose absolute value falls below a threshold; the threshold is chosen from the distribution of absolute weight values so that the requested fraction of weights is removed.
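To make that concrete, here is a minimal NumPy sketch of the idea. This is only an illustration of the mechanism, not tfmot's actual implementation, and the function name is my own:

import numpy as np

def magnitude_mask(weights, sparsity):
    # Keep the largest-magnitude (1 - sparsity) fraction of weights, zero the rest.
    flat = np.abs(weights).ravel()
    k = int(np.ceil(flat.size * (1.0 - sparsity)))  # number of weights to keep
    threshold = np.sort(flat)[-k]                   # k-th largest absolute value
    return (np.abs(weights) >= threshold).astype(weights.dtype)

w = np.random.randn(4, 4).astype(np.float32)
mask = magnitude_mask(w, sparsity=0.5)
print(mask.mean())  # roughly 0.5 of the entries survive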
PolynomialDecay class definition: GitHub link
The comments included with the class definition above helped me understand how the PolynomialDecay scheduler works.
Pruning rate grows rapidly in the beginning from initial_sparsity, but then plateaus slowly to the target sparsity.
The function applied is
current_sparsity = final_sparsity + (initial_sparsity - final_sparsity) * (1 - (step - begin_step)/(end_step - begin_step)) ^ exponent
By the above equation, when step == begin_step, current_sparsity = initial_sparsity. Thus, the weights will be pruned to initial_sparsity on the step specified by the begin_step parameter.
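As a quick sanity check, here is the formula evaluated with the question's parameters. The end_step value below is arbitrary (chosen just for illustration), and I am assuming an exponent of 3, which I believe is the PolynomialDecay default:

initial_sparsity, final_sparsity = 0.50, 0.90
begin_step, end_step, exponent = 2000, 10000, 3  # end_step chosen for illustration

for step in (2000, 4000, 6000, 8000, 10000):
    frac = 1 - (step - begin_step) / (end_step - begin_step)
    current = final_sparsity + (initial_sparsity - final_sparsity) * frac ** exponent
    print(step, round(current, 3))
# step 2000  -> 0.5, exactly initial_sparsity at begin_step
# step 10000 -> 0.9, exactly final_sparsity at end_step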
I would agree with your assessment, in that you would usually want to start pruning at a lower sparsity than 50%, but I do not have any published research I can cite to back up that claim. You may be able to find more information in the paper cited in the PolynomialDecay class definition, although I have not had a chance to read it myself.
ConstantSparsity class definition: GitHub link
The purpose of this scheduler appears to be pretty limited. On every valid prune step, target_sparsity is returned, so multiple pruning steps are largely redundant. The use case for this scheduler appears to be a one-time prune during training; the ability to prune multiple times with it exists only to keep it consistent with its parent abstract class and the other pruning schedulers.
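For example, a one-time prune to 50% sparsity might look like the sketch below. The step values are arbitrary; setting begin_step equal to end_step means the schedule fires only once, and the mask computed at that step should then stay applied for the rest of training:

import tensorflow_model_optimization as tfmot

pruning_params_one_shot = {
    'pruning_schedule': tfmot.sparsity.keras.ConstantSparsity(
        target_sparsity=0.5,
        begin_step=1000,
        end_step=1000,   # begin_step == end_step -> pruning fires only once
        frequency=100)
}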
If the two above schedulers do not float your boat, the abstract class PruningSchedule exposes an endpoint which makes it very easy to create your own pruning scheduler, as convoluted as it may be. Below is an example I created myself.
Disclaimer: this scheduler is a creation of a 19 year-old college student's imagination and has no basis in any published literature.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

PruningSchedule = tfmot.sparsity.keras.PruningSchedule

class ExponentialPruning(PruningSchedule):
    def __init__(self, rate=0.01, begin_step=0, frequency=100, max_sparsity=0.9):
        self.rate = rate
        self.begin_step = begin_step
        self.frequency = frequency
        self.max_sparsity = max_sparsity

        # Validation functions provided by the parent class.
        # The -1 parameter is for the end_step,
        # as this pruning schedule does not have one.
        # The last True value is a boolean flag which says it is okay
        # to have no end_step.
        self._validate_step(self.begin_step, -1, self.frequency, True)
        self._validate_sparsity(self.max_sparsity, 'Max Sparsity')

    def __call__(self, step):
        # Sparsity calculation endpoint.
        # step is an integer tensor.
        # The sparsity returned by __call__ must be a tensor
        # of dtype=tf.float32, so tf.math is required.
        # In the logic below, you can assume that a valid
        # pruning step is passed.
        p = tf.math.divide(
            tf.cast(step - self.begin_step, tf.float32),
            tf.constant(self.frequency, dtype=tf.float32)
        )
        sparsity = tf.math.subtract(
            tf.constant(1, dtype=tf.float32),
            tf.math.pow(
                tf.constant(1 - self.rate, dtype=tf.float32),
                p
            )
        )
        sparsity = tf.cond(
            tf.math.greater(sparsity, tf.constant(self.max_sparsity, dtype=tf.float32)),
            lambda: tf.constant(self.max_sparsity, dtype=tf.float32),
            lambda: sparsity
        )

        # This function returns a tuple of length 2.
        # The first value determines if pruning should occur on this step;
        # I recommend using the parent class function below for this purpose.
        # The negative one value denotes no end_step.
        # The second value is the sparsity to prune to.
        return (self._should_prune_in_step(step, self.begin_step, -1, self.frequency),
                sparsity)

    def get_config(self):
        # A function required by the parent class:
        # return the class_name and the input parameters as
        # done below.
        return {
            'class_name': self.__class__.__name__,
            'config': {
                'rate': self.rate,
                'begin_step': self.begin_step,
                'frequency': self.frequency,
                'max_sparsity': self.max_sparsity
            }
        }
If you would like only certain layers to be pruned, rather than all prunable layers, you can call the prune_low_magnitude function on a layer as you add it to your model.
from tensorflow import keras

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

model = keras.models.Sequential()
...
model.add(prune_low_magnitude(
    keras.layers.Dense(8, activation='relu',
                       kernel_regularizer=keras.regularizers.l1(0.0001)),
    ExponentialPruning(rate=1/8)))
Also make sure to pass an UpdatePruningStep instance to the training callbacks:
model.fit(train_input, train_labels, epochs=epochs,
          validation_data=(test_input, test_labels),
          callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])