I have an ARFF file containing 14 numerical columns. I want to normalize each column separately, that is, replace each value in a column with (actual_value - min(this_column)) / (max(this_column) - min(this_column)). All values in a column will then be in the range [0, 1]. The min and max values of one column might differ from those of another column.
How can I do this with Weka filters?
Thanks
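The purpose of normalization is, primarily, to bring numeric data from different columns onto an equivalent scale. For example, suppose you execute the LINEAR_REG function on a data set with two feature columns, current_salary and years_worked. The output value you are trying to predict is a worker's future salary. The values in current_salary are likely to span a much wider range, and be much larger, than the values in years_worked, so the two columns are not directly comparable until they are rescaled.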
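Terminology varies here. Some authors distinguish scaling, which just changes the range of your data, from normalization, a more radical transformation that changes your observations so that they can be described by a normal distribution. The min-max formula in the question is scaling in that sense, and it is what Weka's Normalize filter computes.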
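This can be done using the filter
weka.filters.unsupervised.attribute.Normalize
By default, after applying this filter all values in each numeric column (apart from the class attribute, if one is set) will be in the range [0, 1]. The min and max are determined per column, so each column is rescaled independently.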
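To make the arithmetic concrete, here is a minimal plain-Java sketch of per-column min-max rescaling. It is not Weka's implementation and does not use the Weka API. The class name MinMaxSketch, the method normalizeColumns, and the sample data are made up for illustration. Mapping a constant column (max equal to min) to 0.0 is an assumption of this sketch, made to avoid a division by zero.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Weka.
class MinMaxSketch {

    // Rescales each column of rows to [0, 1] using that column's own min and max.
    public static double[][] normalizeColumns(double[][] rows) {
        if (rows.length == 0) {
            return new double[0][];
        }
        int cols = rows[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        // First pass: find the min and max of every column.
        for (double[] row : rows) {
            for (int j = 0; j < cols; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }

        // Second pass: apply (value - min) / (max - min) column by column.
        double[][] out = new double[rows.length][cols];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < cols; j++) {
                double range = max[j] - min[j];
                // Assumption of this sketch: a constant column maps to 0.0.
                out[i][j] = (range == 0.0) ? 0.0 : (rows[i][j] - min[j]) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data: two columns with very different ranges.
        double[][] data = { {10, 200}, {20, 400}, {30, 300} };
        for (double[] row : normalizeColumns(data)) {
            System.out.println(Arrays.toString(row));
        }
        // Prints:
        // [0.0, 0.0]
        // [0.5, 1.0]
        // [1.0, 0.5]
    }
}
```

Each column ends up in [0, 1] even though the two columns started with different minima and maxima, which is the behaviour the question asks for.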