I can't make scikit-learn work with a datetime series.
I found this post, but it did not help me: Pandas : TypeError: float() argument must be a string or a number
The CSV file has 2 date columns; the dates are strings in the following format: 2017-07-21 06:19:53
I converted the strings to datetime64[ns], expecting the dates to become long values that I could do calculations on. scikit-learn refuses this type and gives the error: float() argument must be a string or a number, not 'Timestamp'
I also tried pandas.to_datetime(), with no luck.
The model I use in scikit-learn is the KMeans clustering model. Printing the dtypes gives this result:
ip int64
date datetime64[ns]
succesFlag int64
app int64
enddate datetime64[ns]
user_userid int64
dtype: object
Here is my code:
def getDataframe():
    df = pd.read_csv(filename)
    df['date']=df['date'].astype('datetime64[ns]',inplace=True)
    df['enddate']=df['enddate'].astype('datetime64[ns]',inplace=True)
    df['app']=df['app'].replace({
        "Azure": 0 ,
        "Peoplesoft":1,
        "Office":2 ,
        "DevOps":3 ,
        "Optima":4 ,
        "Ada-Tech": 5
    },inplace=True)
    df['ip']=df['ip'].apply(lambda x: int(ip4.ip_address(x))).to_frame('ip')
    print(df.dtypes)
    return df
I expected the KMeans clustering model to work with the numerical values after I converted them, but it did not.
What did I do wrong?
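I suggest changing your solution and simplifying it a bit:
- pass parse_dates to read_csv to convert the columns to datetimes, then convert them to numeric Unix timestamps (seconds)
- remove inplace=True (an in-place call returns None, so assigning its result back to the column overwrites the data), or use the faster map; it also creates NaNs for non-matched values, so the output is numeric too

import ipaddress as ip4  # assumed: ip4 is an alias for the standard-library ipaddress module
import numpy as np
import pandas as pd

def getDataframe():
    # filename is defined elsewhere, as in the question
    df = pd.read_csv(filename, parse_dates=['date','enddate'])
    # datetime64[ns] -> nanoseconds since the epoch -> whole seconds
    df[['date','enddate']] = df[['date','enddate']].astype(np.int64) // 10**9
    # map returns a new Series; unmatched values become NaN
    df['app']=df['app'].map({
        "Azure": 0 ,
        "Peoplesoft":1,
        "Office":2 ,
        "DevOps":3 ,
        "Optima":4 ,
        "Ada-Tech": 5
    })
    # the result is already a Series, so to_frame is not needed
    df['ip']=df['ip'].apply(lambda x: int(ip4.ip_address(x)))
    print(df.dtypes)
    return df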
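To check the conversion above, here is a minimal sketch on a made-up two-row sample. The CSV text, the label "Unknown" and the shortened app mapping are assumptions for illustration, not the asker's data. It casts to datetime64[ns] explicitly before the integer conversion, so that dividing by 10**9 yields seconds whatever resolution pandas inferred while parsing.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real CSV file.
csv_text = (
    "date,enddate,app\n"
    "2017-07-21 06:19:53,2017-07-21 06:20:53,Azure\n"
    "1970-01-02 00:00:00,1970-01-02 00:00:10,Unknown\n"
)

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date", "enddate"])

# Pin the unit to nanoseconds, view the values as int64 nanoseconds since
# the epoch, then floor-divide down to whole seconds.
date_cols = ["date", "enddate"]
df[date_cols] = (
    df[date_cols].astype("datetime64[ns]").astype(np.int64) // 10**9
)

# map() returns a new Series; labels missing from the dict become NaN,
# which turns the column into float64.
df["app"] = df["app"].map({"Azure": 0, "Peoplesoft": 1})

print(df.dtypes)
print(df)
```

Every column is now int64 or float64, so the 'Timestamp' error no longer applies. The NaN left by the unmatched label would still have to be filled or dropped before fitting KMeans, which does not accept missing values.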