When accessing DataFrame.values, all pd.Timestamp objects are converted to np.datetime64 objects. Why? An np.ndarray containing pd.Timestamp objects can exist, so I don't understand why this automatic conversion always happens.
Is there a way to prevent it?
Minimal example:
import numpy as np
import pandas as pd
from datetime import datetime
# Let's declare a list with a datetime.datetime object
values = [datetime.now()]
print(type(values[0]))
> <class 'datetime.datetime'>
# Clearly, the datetime.datetime objects became pd.Timestamp once moved to a pd.DataFrame
df = pd.DataFrame(values, columns=['A'])
print(type(df.iloc[0][0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
# Just to be sure, let's iterate over each datetime and manually convert it to pd.Timestamp
# (assign the result back, otherwise apply() has no effect on df)
df['A'] = df['A'].apply(lambda x: pd.Timestamp(x))
print(type(df.iloc[0][0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
# df.values (or series.values in this case) returns an np.ndarray
print(type(df.iloc[0].values))
> <class 'numpy.ndarray'>
# When we check what is the type of elements of the '.values' array,
# it turns out the pd.Timestamp objects got converted to np.datetime64
print(type(df.iloc[0].values[0]))
> <class 'numpy.datetime64'>
# Just to double check, can an np.ndarray contain pd.Timestamps?
timestamp = pd.Timestamp(datetime.now())
timestamps = np.array([timestamp])
print(type(timestamps))
> <class 'numpy.ndarray'>
# Seems like it does. Why the above conversion then?
print(type(timestamps[0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
python : 3.6.7.final.0
pandas : 0.25.3
numpy : 1.16.4
Found a workaround: use .array instead of .values (docs).
print(type(df['A'].array[0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
This prevents the conversion and gives me access to the objects I wanted to use.
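To make the difference concrete, here is a minimal sketch comparing the two accessors on the same column. The fixed date and the variable names from_values and from_array are only illustrative, not part of the original example.

```python
import numpy as np
import pandas as pd
from datetime import datetime

# A one-row frame with a fixed datetime so the result is reproducible
df = pd.DataFrame([datetime(2020, 1, 1, 12, 0)], columns=['A'])

# .values hands back a plain NumPy array, so elements are NumPy scalars
from_values = df['A'].values[0]

# .array hands back a pandas extension array, so elements stay pandas scalars
from_array = df['A'].array[0]

print(type(from_values).__name__)
print(type(from_array).__name__)

# Both scalars describe the same instant
print(pd.Timestamp(from_values) == from_array)
```

Both elements represent the same point in time; only the wrapper type differs.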
The whole idea behind .values is to:
Return a Numpy representation of the DataFrame. [docs]
I find it logical that a pd.Timestamp is then 'downgraded' to a dtype that is native to numpy. If it didn't do this, what would be the purpose of .values?
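A small sketch that may help illustrate this: the column itself already has a NumPy datetime64 dtype, and pd.Timestamp objects only appear when a single element is accessed through pandas. The names column_dtype, element and raw are just for illustration, and the exact datetime unit printed may depend on the pandas version.

```python
import numpy as np
import pandas as pd
from datetime import datetime

df = pd.DataFrame([datetime(2020, 1, 1, 12, 0)], columns=['A'])

# The column dtype is a NumPy datetime64 dtype, not "Timestamp"
column_dtype = df['A'].dtype

# Element access through pandas wraps the value in a pd.Timestamp
element = df['A'].iloc[0]

# .values exposes the data with the same datetime64 dtype as the column
raw = df['A'].values

print(column_dtype)
print(type(element).__name__)
print(raw.dtype == column_dtype)
```

Seen this way, .values is not so much converting Timestamp objects as returning the datetime64 data without the pandas wrapper.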
If you do want to keep the pd.Timestamp dtype, I would suggest working with the original Series (df.iloc[0]). I don't see any other way, since .values uses np.ndarray to convert, according to the source on GitHub.
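That said, if an np.ndarray that still holds pd.Timestamp objects is really needed, one possible route is to ask for an object-dtype array explicitly. This is only a sketch, assuming a pandas version that has Series.to_numpy (0.24 or later); the names as_objects and via_astype are illustrative.

```python
import numpy as np
import pandas as pd
from datetime import datetime

df = pd.DataFrame([datetime(2020, 1, 1, 12, 0)], columns=['A'])

# Request object dtype explicitly: elements are boxed as pd.Timestamp
as_objects = df['A'].to_numpy(dtype=object)

# Equivalent idea: cast the Series to object first, then take .values
via_astype = df['A'].astype(object).values

print(type(as_objects).__name__, as_objects.dtype)
print(type(as_objects[0]).__name__)
print(type(via_astype[0]).__name__)
```

The trade-off is that an object array loses NumPy's vectorised datetime operations, so this is mainly useful when the Timestamp API is needed on individual elements.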