Why is pd.Timestamp converted to np.datetime64 when calling '.values'?

When accessing DataFrame.values, all pd.Timestamp objects are converted to np.datetime64 objects. Why? An np.ndarray containing pd.Timestamp objects can exist, so I don't understand why this automatic conversion always happens.

Would you know how to prevent it?

Minimal example:

import numpy as np
import pandas as pd
from datetime import datetime

# Let's declare a list with a datetime.datetime object
values = [datetime.now()]
print(type(values[0]))
> <class 'datetime.datetime'>

# Clearly, the datetime.datetime objects became pd.Timestamp once moved to a pd.DataFrame
df = pd.DataFrame(values, columns=['A'])
print(type(df.iloc[0, 0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>

# Just to be sure, let's iterate over each datetime and manually convert it to pd.Timestamp
# (the result has to be assigned back, otherwise the conversion is discarded)
df['A'] = df['A'].apply(pd.Timestamp)
print(type(df.iloc[0, 0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>

# df.values (or series.values in this case) returns an np.ndarray
print(type(df.iloc[0].values))
> <class 'numpy.ndarray'>

# When we check what is the type of elements of the '.values' array, 
# it turns out the pd.Timestamp objects got converted to np.datetime64
print(type(df.iloc[0].values[0]))
> <class 'numpy.datetime64'>


# Just to double check, can an np.ndarray contain pd.Timestamps?
timestamp = pd.Timestamp(datetime.now())
timestamps = np.array([timestamp])
print(type(timestamps))
> <class 'numpy.ndarray'>

# Seems like it can. Why the above conversion then?
print(type(timestamps[0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>

python : 3.6.7.final.0

pandas : 0.25.3

numpy : 1.16.4

Voy asked Mar 03 '23 05:03


2 Answers

Found a workaround: use .array instead of .values (docs).

print(type(df['A'].array[0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>

This prevents the conversion and gives me access to the objects I wanted to use.
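A minimal sketch comparing the accessors, assuming a pandas version where Series.array and Series.to_numpy are available (0.24 or later). The fixed date and the variable names are only for illustration:

```python
import numpy as np
import pandas as pd
from datetime import datetime

# Illustrative frame with one datetime column (fixed date for reproducibility)
df = pd.DataFrame([datetime(2020, 1, 1, 12, 0)], columns=['A'])

# .values exposes the underlying NumPy datetime64 data
via_values = df['A'].values
# .array returns the pandas extension array; its elements are pd.Timestamp
via_array = df['A'].array
# Requesting dtype=object gives a NumPy array that holds pd.Timestamp objects
via_object = df['A'].to_numpy(dtype=object)

print(type(via_values[0]))
print(type(via_array[0]))
print(type(via_object[0]))
```

If an actual np.ndarray is needed rather than a pandas array, the dtype=object variant is one option, at the cost of losing the compact datetime64 storage.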

Voy answered Mar 05 '23 17:03


The whole idea behind .values is to:

Return a Numpy representation of the DataFrame. [docs]

I find it logical that a pd.Timestamp is then 'downgraded' to a dtype that is native to NumPy. If it didn't do this, what would be the purpose of .values?

If you do want to keep the pd.Timestamp type, I would suggest working with the original Series (df.iloc[0]). I don't see any other way, since, according to the source on GitHub, .values converts the data to an np.ndarray.
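One way to see why the conversion happens: the column has a datetime64 dtype and is not stored as a collection of Python objects, so pd.Timestamp instances only appear when elements are accessed through pandas. An array built directly from pd.Timestamp objects, by contrast, falls back to object dtype. A small sketch with an illustrative fixed date (variable names are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [pd.Timestamp('2020-01-01 12:00')]})

# The column has a datetime64 dtype (kind 'M'), not object
col_kind = df['A'].dtype.kind
# An array built directly from Timestamp objects gets object dtype (kind 'O'),
# which is why it can hold pd.Timestamp instances
obj_kind = np.array([pd.Timestamp('2020-01-01 12:00')]).dtype.kind

# A np.datetime64 taken from .values can be wrapped back into a pd.Timestamp
raw = df['A'].values[0]
restored = pd.Timestamp(raw)

print(col_kind, obj_kind, restored)
```

So if .values has already been called, individual elements can still be turned back into pd.Timestamp objects when needed.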

jorijnsmit answered Mar 05 '23 18:03