Storing pure python datetime.datetime in pandas DataFrame

Tags: python, datetime, python-3.x, pandas

Since matplotlib doesn't support either pandas.Timestamp or numpy.datetime64, and there are no simple workarounds, I decided to convert a native pandas date column into pure Python datetime.datetime objects so that scatter plots are easier to make.

However:

import pandas as pd

t = pd.DataFrame({'date': [pd.to_datetime('2012-12-31')]})
t.dtypes # date    datetime64[ns], as expected
pure_python_datetime_array = t.date.dt.to_pydatetime() # works fine
t['date'] = pure_python_datetime_array # doesn't do what I hoped
t.dtypes # date    datetime64[ns] as before, no luck changing it

I'm guessing pandas auto-converts the pure Python datetime objects produced by to_pydatetime back into its native format. That is probably convenient behavior in general, but is there a way to override it?

asked Sep 01 '16 17:09

max

2 Answers

The use of to_pydatetime() is correct.

In [87]: t = pd.DataFrame({'date': [pd.to_datetime('2012-12-31'), pd.to_datetime('2013-12-31')]})

In [88]: t.date.dt.to_pydatetime()
Out[88]: 
array([datetime.datetime(2012, 12, 31, 0, 0),
       datetime.datetime(2013, 12, 31, 0, 0)], dtype=object)

When you assign it back to t.date, pandas automatically converts it back to datetime64.

pandas.Timestamp is a datetime subclass anyway :)
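To illustrate the subclass point, here is a minimal sketch. The variable names are made up for the example; it only relies on pd.Timestamp and Timestamp.to_pydatetime().

```python
import datetime

import pandas as pd

# A single pandas timestamp, like the ones stored in the 'date' column.
ts = pd.Timestamp("2012-12-31")

# Timestamp inherits from datetime.datetime, so code that only needs
# "some datetime" accepts it as-is.
is_datetime = isinstance(ts, datetime.datetime)

# When the exact built-in type is required, convert explicitly.
plain = ts.to_pydatetime()

print(is_datetime, type(plain))
```

So anything that only checks isinstance(..., datetime.datetime) should already work with the values pandas hands back; the explicit conversion matters only when the exact type is checked.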

One way to do the plot is to convert the datetime to int64:

In [117]: t = pd.DataFrame({'date': [pd.to_datetime('2012-12-31'), pd.to_datetime('2013-12-31')], 'sample_data': [1, 2]})

In [118]: t['date_int'] = t.date.astype(np.int64)

In [119]: t
Out[119]: 
        date  sample_data             date_int
0 2012-12-31            1  1356912000000000000
1 2013-12-31            2  1388448000000000000

In [120]: t.plot(kind='scatter', x='date_int', y='sample_data')
Out[120]: <matplotlib.axes._subplots.AxesSubplot at 0x7f3c852662d0>

In [121]: plt.show()

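For reference, the integers in date_int are nanoseconds since the Unix epoch, so the conversion can be reversed. A small sketch, assuming the column is first pinned to nanosecond resolution (the names dates, as_int and restored are just for illustration):

```python
import pandas as pd

# Same two dates as above; force nanosecond resolution so the integer
# unit is unambiguous regardless of the pandas version in use.
dates = pd.Series(pd.to_datetime(["2012-12-31", "2013-12-31"])).astype("datetime64[ns]")

# Nanoseconds since 1970-01-01 00:00:00.
as_int = dates.astype("int64")

# Convert the integers back to timestamps by stating the unit explicitly.
restored = pd.to_datetime(as_int, unit="ns")

print(as_int.tolist())
print(restored.tolist())
```

The downside of plotting against date_int is that the x axis shows these raw integers rather than readable dates.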
Another workaround is to not use scatter, and instead call plot with a point marker style:

In [126]: t.plot(x='date', y='sample_data', style='.')
Out[126]: <matplotlib.axes._subplots.AxesSubplot at 0x7f3c850f5750>

And the last workaround:

In [141]: import matplotlib.pyplot as plt

In [142]: t = pd.DataFrame({'date': [pd.to_datetime('2012-12-31'), pd.to_datetime('2013-12-31')], 'sample_data': [100, 20000]})

In [143]: t
Out[143]: 
        date  sample_data
0 2012-12-31          100
1 2013-12-31        20000

In [144]: plt.scatter(t.date.dt.to_pydatetime(), t.sample_data)
Out[144]: <matplotlib.collections.PathCollection at 0x7f3c84a10510>

In [145]: plt.show()

There is an issue for this on GitHub, which is still open at the time of writing.

answered Oct 02 '22 13:10

Nehal J Wani

Here is a possible solution with the Series class from pandas:

t = pd.DataFrame({'date': [pd.to_datetime('2012-12-31')]})
t.dtypes # date    datetime64[ns], as expected
pure_python_datetime_array = t.date.dt.to_pydatetime() # works fine
t['date'] = pd.Series(pure_python_datetime_array, dtype=object) # should do what you expect
t.dtypes # date    object; the column now holds datetime.datetime values
type(t.values[0, 0]) # datetime.datetime, so you can access the pure Python object directly

Why does this work? My assumption is that you force the dtype of the date column to be object, so pandas does not do any internal conversion from datetime.datetime to datetime64.

Correct me if I am wrong.
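A quick way to check that assumption is to store the converted values in a separate column and inspect both dtypes. This is only a sketch: the date_py column name is made up for the example, and the conversion uses Timestamp.to_pydatetime() element by element instead of the .dt accessor.

```python
import datetime

import pandas as pd

t = pd.DataFrame({"date": pd.to_datetime(["2012-12-31", "2013-12-31"])})

# Convert each Timestamp to a built-in datetime.datetime.
py_dates = [ts.to_pydatetime() for ts in t["date"]]

# dtype=object stops pandas from inferring datetime64 again;
# date_py is a hypothetical column name used only for this check.
t["date_py"] = pd.Series(py_dates, dtype=object, index=t.index)

print(t.dtypes)
print(type(t["date_py"].iloc[0]))
```

Keep in mind that an object column loses the vectorized .dt accessor, so it is probably best to keep the original datetime64 column around for everything except plotting.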

answered Oct 02 '22 12:10

PiMathCLanguage
