With the following code snippet
import pandas as pd
data = pd.read_csv('train.csv', parse_dates=['dates'])
print(data['dates'])
I load and inspect the data.
My question is: how can I standardize/normalize data['dates'] so that all the elements lie between -1 and 1 (linear or Gaussian)?
Method 1: Implementation in pandas [Z-Score]. Subtracting the mean from each element and dividing the result by the standard deviation gives the z-score, which is the standardized data.
Normalization using pandas (gives unbiased estimates): when normalizing, we simply subtract the mean and divide by the standard deviation. The estimate is unbiased because pandas' std() uses ddof=1 by default.
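As a minimal sketch of the z-score approach applied to dates, assuming the dates are first converted to seconds since the UNIX epoch (the variable names and sample dates here are illustrative, borrowed from the answer further down). Note that z-scores are centered on 0 with unit standard deviation, but they are not confined to the range -1 to 1:

```python
import pandas as pd

# Sample dates, assumed for illustration
dates = pd.to_datetime(pd.Series([
    '1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
    '1981-01-21', '1991-02-21', '1991-03-23',
]))

# Convert to seconds since the UNIX epoch so arithmetic is well defined
seconds = (dates - pd.Timestamp('1970-01-01')).dt.total_seconds()

# Z-score: subtract the mean, divide by the sample standard deviation (ddof=1)
z = (seconds - seconds.mean()) / seconds.std()
print(z)
```

With this sample the two dates from 1991 end up with z-scores above 1, so if a hard -1 to 1 bound is required, min/max scaling is likely the better fit.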
Using MinMaxScaler() to normalize data in Python: this is a popular choice for normalizing datasets. By default the range is set to (0, 1), so the output values lie between 0 and 1. MinMaxScaler also gives you the option to select a different range through feature_range.
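A small sketch of the difference between the default range and a custom feature_range, using made-up values (the array below is an assumption for illustration only):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-feature data; the scaler expects a 2D array (rows x features)
values = np.array([[10.0], [20.0], [40.0]])

# Default feature_range is (0, 1)
default_scaled = MinMaxScaler().fit_transform(values)

# A custom range, here (-1, 1) as asked in the question
custom_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(values)

print(default_scaled.ravel())
print(custom_scaled.ravel())
```

In both cases the smallest input maps to the lower bound and the largest to the upper bound; only the target interval changes.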
1D and 2D arrays can also be normalized in Python using NumPy. Normalization refers to scaling the values of an array to a desired range.
import time

import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def convert_to_timestamp(x):
    """Convert a pandas Timestamp to an integer UNIX timestamp."""
    # time.mktime interprets the time tuple in the local timezone
    return int(time.mktime(x.to_pydatetime().timetuple()))


def normalize(df):
    """Normalize the 'dates' column to (-1, 1) using min/max"""
    scaler = MinMaxScaler(feature_range=(-1, 1))
    # The scaler expects 2D input, hence the double brackets
    dates_scaled = scaler.fit_transform(df[['dates']])
    # Flatten the (n, 1) result back to a 1D array
    return dates_scaled.ravel()


if __name__ == '__main__':
    # Create a sample series of dates
    df = pd.DataFrame({
        'dates':
        ['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
         '1981-01-21', '1991-02-21', '1991-03-23']
    })
    # Convert to date objects
    df['dates'] = pd.to_datetime(df['dates'])
    # df now holds date objects like yours; convert them to UNIX timestamps
    df['dates'] = df['dates'].apply(convert_to_timestamp)
    # Call the normalization function and show the result
    print(normalize(df))
Before convert_to_timestamp is applied, the dates column looks like this:
dates
0 1980-01-01
1 1980-02-02
2 1980-03-02
3 1980-01-21
4 1981-01-21
5 1991-02-21
6 1991-03-23
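After conversion the column holds UNIX timestamps (the exact values depend on the local timezone). These are shown first, followed by the output of MinMaxScaler from sklearn: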
dates
0 315507600
1 318272400
2 320778000
3 317235600
4 348858000
5 667069200
6 669661200
[-1. -0.98438644 -0.97023664 -0.99024152 -0.81166138 0.98536228
1. ]
A solution with pandas:
df = pd.DataFrame({
    'A':
    ['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
     '1981-01-21', '1991-02-21', '1991-03-23']
})
df['A'] = pd.to_datetime(df['A']).astype('int64')
max_a = df.A.max()
min_a = df.A.min()
min_norm = -1
max_norm = 1
df['NORMA'] = (df.A - min_a) * (max_norm - min_norm) / (max_a - min_a) + min_norm
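The same formula can be wrapped in a reusable function. The sketch below is one possible way to do it; scale_dates and its low/high parameters are hypothetical names introduced here, not part of pandas, and the function assumes the Series contains at least two distinct dates (otherwise the span is zero):

```python
import pandas as pd


def scale_dates(dates, low=-1.0, high=1.0):
    """Linearly map a datetime Series onto [low, high] (hypothetical helper)."""
    # Seconds since the UNIX epoch, independent of the Series' datetime unit
    seconds = (dates - pd.Timestamp('1970-01-01')).dt.total_seconds()
    span = seconds.max() - seconds.min()
    # Same min/max formula as above, with configurable bounds
    return (seconds - seconds.min()) * (high - low) / span + low


dates = pd.to_datetime(pd.Series([
    '1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
    '1981-01-21', '1991-02-21', '1991-03-23',
]))
scaled = scale_dates(dates)
print(scaled.round(8).tolist())
```

Because min/max scaling is unaffected by a constant offset or a change of time unit, this should give the same normalized values as the MinMaxScaler approach on the same dates.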