
How to standardize/normalize a date with pandas/numpy?

With the following code snippet

import pandas as pd
data = pd.read_csv('train.csv', parse_dates=['dates'])
print(data['dates'])

I load and inspect the data.

My question is: how can I standardize/normalize data['dates'] so that all the elements lie between -1 and 1 (linear or Gaussian)?

user1587451 asked Jun 24 '15 20:06



2 Answers

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import time

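def convert_to_timestamp(x):
    """Convert a pandas Timestamp to a UNIX timestamp (local time zone)"""
    # Timestamp.to_datetime() no longer exists in current pandas;
    # a Timestamp is a datetime subclass, so timetuple() works directly
    return time.mktime(x.timetuple())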


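def normalize(df):
    """Normalize the 'dates' column to (-1, 1) using min/max"""
    scaler = MinMaxScaler(feature_range=(-1, 1))
    # The scaler expects 2D input, so select the column as a DataFrame
    dates_scaled = scaler.fit_transform(df[['dates']])

    # Flatten the (n, 1) result back to a 1D array
    return dates_scaled.ravel()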

if __name__ == '__main__':
    # Create a sample series of dates
    df = pd.DataFrame({
        'dates':
            ['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
             '1981-01-21', '1991-02-21', '1991-03-23']
    })

    # Convert to date objects
    df['dates'] = pd.to_datetime(df['dates'])

    # df now holds date objects, as yours would; convert them to UNIX timestamps
    df['dates'] = df['dates'].apply(convert_to_timestamp)

    # Call normalization function and show the result
    dates_scaled = normalize(df)
    print(dates_scaled)

Sample:

Date objects that we convert using convert_to_timestamp

       dates
0 1980-01-01
1 1980-02-02
2 1980-03-02
3 1980-01-21
4 1981-01-21
5 1991-02-21
6 1991-03-23

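UNIX timestamps that we can normalize using a MinMaxScaler from sklearn (time.mktime interprets the dates in the local time zone, so the exact values depend on the machine)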

       dates
0  315507600
1  318272400
2  320778000
3  317235600
4  348858000
5  667069200
6  669661200

Normalized to (-1, 1), the final result

[-1.         -0.98438644 -0.97023664 -0.99024152 -0.81166138  0.98536228
  1.        ]
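If the dependence of time.mktime on the local time zone is a concern, the conversion to UNIX timestamps can also be done with pandas arithmetic alone. The snippet below is only a sketch of that alternative, not part of the original answer; the variable names `dates` and `epoch_seconds` are made up for illustration, and the dates are treated as naive (UTC) values.

```python
import pandas as pd

# Two of the sample dates from above, parsed into datetime64 values
dates = pd.to_datetime(pd.Series(['1980-01-01', '1991-03-23']))

# Seconds elapsed since the UNIX epoch, independent of the local time zone
epoch_seconds = (dates - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')

print(epoch_seconds.tolist())
```

Because min/max scaling only depends on the differences between values, a constant time zone offset does not change the normalized result.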
bakkal answered Oct 12 '22 12:10


A solution with pandas only:

import pandas as pd

df = pd.DataFrame({
        'A':
            ['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
             '1981-01-21', '1991-02-21', '1991-03-23'] })
# Convert the dates to integers (time elapsed since the epoch)
df['A'] = pd.to_datetime(df['A']).astype('int64')
max_a = df.A.max()
min_a = df.A.min()
min_norm = -1
max_norm = 1
# Linear rescaling from [min_a, max_a] to [min_norm, max_norm]
df['NORMA'] = (df.A - min_a) * (max_norm - min_norm) / (max_a - min_a) + min_norm
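For the "Gaussian" option mentioned in the question, a z-score (subtract the mean, divide by the standard deviation) is the usual choice. The following is a minimal sketch under that assumption, not part of the original answer; the helper name `standardize_dates` is hypothetical. Note that z-scores are centered on 0 with unit standard deviation but are not guaranteed to lie between -1 and 1.

```python
import pandas as pd


def standardize_dates(dates: pd.Series) -> pd.Series:
    """Z-score a datetime64 Series: subtract the mean, divide by the std."""
    # Integer time since the epoch, cast to float for the arithmetic
    values = dates.astype('int64').astype('float64')
    return (values - values.mean()) / values.std()


dates = pd.to_datetime(pd.Series(
    ['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
     '1981-01-21', '1991-02-21', '1991-03-23']))

z = standardize_dates(dates)
print(z)
```

pandas' `std()` uses the sample standard deviation (ddof=1) by default; pass `ddof=0` if the population version is wanted.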
steboc answered Oct 12 '22 12:10