This has become quite a frustrating question, but I've asked in the Coursera discussions and they won't help. Below is the question:
I've gotten it wrong 6 times now. How do I normalize the feature? Hints are all I'm asking for.
I'm assuming x_2^(2) is the value 5184, unless I'm supposed to add the x_0 column of 1s, which the question doesn't mention but which he certainly covers in the lectures when building the design matrix X. In that case x_2^(2) would be the value 72. Assuming one or the other is right (I'm playing a guessing game), what should I use to normalize it? He talks about three different ways to normalize in the lectures: one using the maximum value, another using the range (the difference between max and min), and another using the standard deviation. They want an answer correct to the hundredths. Which one am I to use? This is so confusing.
Standardization (Z-score Normalization): The general method is to compute the mean and standard deviation of each feature, subtract the mean from every value of that feature, and then divide the mean-centered values by the feature's standard deviation.
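As a rough illustration of the z-score steps just described, here is a minimal Python sketch using only the standard library. The data are the squared midterm scores quoted elsewhere in this thread. The choice of the sample standard deviation (n - 1 denominator) is an assumption; the course material may intend the population version.

```python
import statistics

# Squared midterm scores quoted elsewhere in this thread (feature x2).
x2 = [7921, 5184, 8836, 4761]

mean = statistics.mean(x2)

# Assumption: sample standard deviation (n - 1 denominator).
# Swap in statistics.pstdev(x2) for the population version.
std = statistics.stdev(x2)

# Subtract the mean, then divide by the standard deviation.
z_scores = [(v - mean) / std for v in x2]

print([round(z, 3) for z in z_scores])
```

After this transformation the values have mean 0 and (sample) standard deviation 1, which is a different result from dividing by the range.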
Now, in order to satisfy BCNF, we divide the table into two parts. One table holds the existing Student ID column and a newly created Professor ID column. The second table holds the columns Professor ID, Professor, and Subject. By doing this we satisfy the Boyce-Codd Normal Form.
Using MinMaxScaler() to Normalize Data in Python: This is a more popular choice for normalizing datasets. The values in the output lie between 0 and 1. MinMaxScaler also lets you select the feature range; by default it is set to (0, 1).
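To make the min-max idea concrete without depending on scikit-learn, here is a hand-written sketch of the same rescaling. The function name `min_max_scale` and its `feature_range` parameter are hypothetical names chosen for this example, not the library's API, and the handling of a constant feature is an assumption.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly rescale values so min maps to lo and max maps to hi."""
    lo, hi = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Assumption: a constant feature maps to the lower bound.
        return [lo for _ in values]
    return [lo + (v - v_min) / span * (hi - lo) for v in values]


# Squared midterm scores quoted elsewhere in this thread.
x2 = [7921, 5184, 8836, 4761]

scaled = min_max_scale(x2)                     # default range (0, 1)
scaled_sym = min_max_scale(x2, (-1.0, 1.0))    # custom range (-1, 1)

print([round(s, 3) for s in scaled])
print([round(s, 3) for s in scaled_sym])
```

Note that min-max scaling subtracts the minimum, not the mean, so it is not the same as the mean normalization the quiz question asks for.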
...use both feature scaling (dividing by the "max-min", or range, of a feature) and mean normalization.
So for any individual feature f:
f_norm = (f - f_mean) / (f_max - f_min)
e.g. for x2, (midterm exam)^2 = {7921, 5184, 8836, 4761}:

    > x2 <- c(7921, 5184, 8836, 4761)
    > mean(x2)
    6675.5
    > max(x2) - min(x2)
    4075
    > (x2 - mean(x2)) / (max(x2) - min(x2))
    0.306 -0.366 0.530 -0.470

Hence norm(5184) = -0.366, or -0.37 to the hundredths.
(using R language, which is great at vectorizing expressions like this)
I agree it's confusing that they used the notation x_2^(2) to mean x_2(norm) or x_2'.
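EDIT: in practice everyone calls the builtin scale(...) function, which does something similar. Note that by default scale() centers on the mean but divides by the standard deviation rather than the range, so its output will not match the numbers above.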
It's asking you to normalize the second value in the second feature column using both feature scaling and mean normalization. Therefore,
(5184 - 6675.5) / 4075 = -0.366
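For anyone who wants to check the arithmetic outside R, here is a short Python sketch of the same mean normalization, (value - mean) / (max - min). The variable names are just illustrative.

```python
# Squared midterm scores for feature x2, as given in the answer above.
x2 = [7921, 5184, 8836, 4761]

mean = sum(x2) / len(x2)             # 6675.5
value_range = max(x2) - min(x2)      # 4075

# Mean normalization: subtract the mean, divide by the range.
normalized = [(v - mean) / value_range for v in x2]

# Second training example, rounded to the hundredths as the quiz asks.
print(round(normalized[1], 2))       # -0.37
```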