How to normalize input data for models in TensorFlow

Tags:

tensorflow

My training data are saved in 3 files, and each file is too large to fit into memory. Each training example is two-dimensional (2805 rows and 222 columns, where the 222nd column is the label) and contains numerical values. I would like to normalize the data before feeding it into models for training. Below is my code for the input pipeline; the data has not been normalized before creating the dataset. Are there any functions in TensorFlow that can do normalization in my case?

import tensorflow as tf  # TensorFlow 1.x API (tf.decode_csv, make_one_shot_iterator)

# file1, file2, file3, batch_size and num_epochs are defined elsewhere
dataset = tf.data.TextLineDataset([file1, file2, file3])
# combine 2805 lines into a single example
dataset = dataset.batch(2805)

def parse_example(line_batch):
    # 221 float feature columns followed by one integer label column
    record_defaults = [[1.0] for col in range(0, 221)]
    record_defaults.append([1])
    content = tf.decode_csv(line_batch, record_defaults = record_defaults, field_delim = '\t')
    # content is a list of 222 tensors of shape [2805]; stacking the features gives [221, 2805]
    features = tf.stack(content[0:221])
    features = tf.transpose(features)  # -> [2805, 221]
    label = content[-1][-1]  # label taken from the last line of the example
    label = tf.one_hot(indices = tf.cast(label, tf.int32), depth = 2)
    return features, label

dataset = dataset.map(parse_example)
dataset = dataset.shuffle(1000)
# batch multiple examples
dataset = dataset.batch(batch_size)
dataset = dataset.repeat(num_epochs)
iterator = dataset.make_one_shot_iterator()
data_batch, label_batch = iterator.get_next()


asked May 15 '18 08:05

jing

1 Answer

There are different ways of "normalizing data". Depending on which one you have in mind, it may or may not be easy to implement in your case.

1. Fixed normalization

If you know the fixed range(s) of your values (e.g. feature #1 has values in [-5, 5], feature #2 has values in [0, 100], etc.), you could easily pre-process your feature tensor in parse_example(), e.g.:

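def normalize_fixed(x, current_range, normed_range):
    # x: [rows, num_features]; ranges are [num_features, 2] (or [1, 2]) lists of [min, max]
    current_range = tf.convert_to_tensor(current_range, dtype=tf.float32)
    normed_range = tf.convert_to_tensor(normed_range, dtype=tf.float32)
    # per-feature bounds of shape [num_features] broadcast along the last axis of x
    current_min, current_max = current_range[:, 0], current_range[:, 1]
    normed_min, normed_max = normed_range[:, 0], normed_range[:, 1]
    x_normed = (x - current_min) / (current_max - current_min)
    x_normed = x_normed * (normed_max - normed_min) + normed_min
    return x_normed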

def parse_example(line_batch, 
                  fixed_range=[[-5, 5], [0, 100], ...],
                  normed_range=[[0, 1]]):
    # ...
    features = tf.transpose(features)
    features = normalize_fixed(features, fixed_range, normed_range)
    # ...
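The min-max arithmetic in normalize_fixed can be checked outside TensorFlow. Below is a minimal NumPy sketch, assuming features are laid out as [rows, num_features] (as after the transpose in parse_example) and that every column has a known [min, max]; the function name rescale_fixed and the toy ranges are hypothetical.

```python
import numpy as np

def rescale_fixed(x, current_range, normed_range):
    """Min-max rescale each column of x from current_range to normed_range.

    x: array of shape [rows, num_features].
    current_range: [num_features, 2] of (min, max) per feature.
    normed_range: [1, 2] (shared target range) or [num_features, 2].
    """
    current_range = np.asarray(current_range, dtype=np.float64)
    normed_range = np.asarray(normed_range, dtype=np.float64)
    # Per-feature bounds have shape [num_features] and broadcast along the last axis of x.
    current_min, current_max = current_range[:, 0], current_range[:, 1]
    normed_min, normed_max = normed_range[:, 0], normed_range[:, 1]
    x_unit = (x - current_min) / (current_max - current_min)  # each column now in [0, 1]
    return x_unit * (normed_max - normed_min) + normed_min

# Two toy features with ranges [-5, 5] and [0, 100], mapped to [0, 1].
x = np.array([[-5.0, 0.0],
              [0.0, 50.0],
              [5.0, 100.0]])
print(rescale_fixed(x, [[-5, 5], [0, 100]], [[0, 1]]))  # values [[0, 0], [0.5, 0.5], [1, 1]]
```

Keeping the bounds one-dimensional lets them broadcast against the feature axis regardless of how many rows an example has.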

2. Per-sample normalization

If your features are supposed to have approximately the same range of values, per-sample normalization could also be considered, i.e. normalizing each sample with its own moments (mean and variance):

def normalize_with_moments(x, axes=[0, 1], epsilon=1e-8):
    mean, variance = tf.nn.moments(x, axes=axes)
    x_normed = (x - mean) / tf.sqrt(variance + epsilon) # epsilon to avoid dividing by zero
    return x_normed

def parse_example(line_batch):
    # ...
    features = tf.transpose(features)
    features = normalize_with_moments(features)
    # ...
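For intuition about what normalize_with_moments does to one sample, here is a NumPy sketch that standardizes a [rows, num_features] example with a single mean and variance taken over both axes (the default axes=[0, 1]). The toy 5 x 4 shape stands in for a real 2805 x 221 example and is an assumption; like tf.nn.moments, NumPy's var computes the population variance.

```python
import numpy as np

def normalize_sample(x, axes=(0, 1), epsilon=1e-8):
    # keepdims=True keeps the moments broadcastable against x
    mean = x.mean(axis=axes, keepdims=True)
    variance = x.var(axis=axes, keepdims=True)
    # epsilon avoids dividing by zero for a constant sample
    return (x - mean) / np.sqrt(variance + epsilon)

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=3.0, size=(5, 4))  # toy stand-in for one example
normed = normalize_sample(sample)
print(normed.mean(), normed.std())  # close to 0 and 1
```

A constant sample maps to all zeros instead of NaN, which is what the epsilon term is for.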

3. Batch normalization

You could apply the same procedure over a complete batch instead of per-sample, which may make the process more stable:

data_batch = normalize_with_moments(data_batch, axes=[0, 1, 2])
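What "over a complete batch" means depends on which axes the moments are taken over. This NumPy sketch, assuming data_batch has shape [batch_size, rows, num_features], contrasts one mean/variance shared by the whole batch (axes (0, 1, 2)) with per-feature moments pooled over the batch and rows (axes (0, 1)), the latter being closer to classic batch normalization. The toy shapes and feature scales are assumptions.

```python
import numpy as np

def normalize_over(x, axes, epsilon=1e-8):
    mean = x.mean(axis=axes, keepdims=True)
    variance = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(variance + epsilon)

rng = np.random.default_rng(1)
# Toy batch: 8 examples, 6 rows, 3 features with very different scales.
data_batch = rng.normal(size=(8, 6, 3)) * np.array([1.0, 10.0, 100.0])

whole_batch = normalize_over(data_batch, axes=(0, 1, 2))  # one scalar mean/variance
per_feature = normalize_over(data_batch, axes=(0, 1))     # one mean/variance per feature

print(per_feature.mean(axis=(0, 1)))  # about 0 for every feature
print(per_feature.std(axis=(0, 1)))   # about 1 for every feature
```

With a single scalar variance, the large-scale feature still dominates; per-feature moments put every column on a comparable scale.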

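Similarly, you could use tf.nn.batch_normalization, which applies precomputed moments (e.g. from tf.nn.moments) together with an optional offset and scale.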
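tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon) does not compute the moments itself; it computes (x - mean) / sqrt(variance + variance_epsilon), multiplies by scale and adds offset (either may be None). Below is a NumPy reference of that formula, assuming the moments were computed per feature over the batch and rows; the name batch_norm_reference and the toy data are hypothetical.

```python
import numpy as np

def batch_norm_reference(x, mean, variance, offset, scale, variance_epsilon):
    """NumPy mirror of the formula applied by tf.nn.batch_normalization."""
    inv = 1.0 / np.sqrt(variance + variance_epsilon)
    if scale is not None:
        inv = inv * scale   # per-feature gain (gamma)
    out = (x - mean) * inv
    if offset is not None:
        out = out + offset  # per-feature shift (beta)
    return out

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=(4, 6, 3))  # [batch, rows, features]
mean = x.mean(axis=(0, 1))                          # per-feature moments, shape [3]
variance = x.var(axis=(0, 1))
y = batch_norm_reference(x, mean, variance, offset=np.zeros(3), scale=np.ones(3),
                         variance_epsilon=1e-3)
```

In the TF 1.x pipeline, the equivalent would be mean, variance = tf.nn.moments(data_batch, axes=[0, 1]) followed by tf.nn.batch_normalization(data_batch, mean, variance, None, None, 1e-8).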

4. Dataset normalization

Normalizing with the mean/variance computed over the whole dataset is the trickiest, since, as you mentioned, your dataset is large and split across several files.

tf.data.Dataset isn't really meant for such global computation. A solution would be to use whatever tools you have to pre-compute the dataset moments, then use this information for your TF pre-processing.
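One way to pre-compute the dataset moments without loading everything into memory is a single streaming pass over the files, merging per-chunk statistics with the standard parallel mean/variance update. The sketch below assumes the tab-separated layout from the question (221 feature columns followed by the label column); the function name column_moments, the chunk size and the file handling are illustrative assumptions.

```python
import itertools
import numpy as np

def column_moments(paths, num_features=221, chunk_lines=100_000):
    """Per-feature mean and population variance over every row of several TSV files.

    Assumes each line holds `num_features` numeric columns followed by a label column.
    """
    count = 0
    mean = np.zeros(num_features)
    m2 = np.zeros(num_features)  # running sum of squared deviations from the mean
    for path in paths:
        with open(path) as f:
            while True:
                lines = list(itertools.islice(f, chunk_lines))
                if not lines:
                    break
                # Parse one chunk, keeping only the feature columns (the label is dropped).
                chunk = np.array([[float(v) for v in line.split("\t")[:num_features]]
                                  for line in lines if line.strip()])
                if chunk.size == 0:
                    continue
                n_b = chunk.shape[0]
                mean_b = chunk.mean(axis=0)
                m2_b = ((chunk - mean_b) ** 2).sum(axis=0)
                # Merge the chunk statistics into the running totals.
                delta = mean_b - mean
                total = count + n_b
                mean = mean + delta * (n_b / total)
                m2 = m2 + m2_b + delta ** 2 * (count * n_b / total)
                count = total
    if count == 0:
        raise ValueError("no data rows found")
    return mean, m2 / count
```

The resulting arrays could then be used in parse_example(), e.g. features = (features - mean.astype(np.float32)) / np.sqrt(variance + 1e-8).astype(np.float32); the cast to float32 matters because decode_csv yields float32 tensors. Merging (count, mean, M2) per chunk avoids the precision loss of accumulating raw sums of squares.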

As mentioned by @MiniQuark, TensorFlow has a Transform library (tf.Transform) you could use to preprocess your data. Have a look at its Get Started guide, or for instance at the tft.scale_to_z_score() method, which standardizes values using the mean and variance computed over the whole dataset.


answered Oct 10 '22 01:10

benjaminplanche
