
Python Pandas Series failure datetime

I think this must be a bug in pandas (I see it in both v0.18.1 and v0.19). When I assign a date to a new label in a Series, the first assignment stores it as an int (wrong); assigning it a second time stores it as a datetime (correct). I cannot understand why.

For instance, with this code:

import datetime as dt
import pandas as pd
series = pd.Series(list('abc'))
date = dt.datetime(2016, 10, 30, 0, 0)
series["Date_column"] = date
print("The date is {} and the type is {}".format(series["Date_column"], type(series["Date_column"])))
series["Date_column"] = date
print("The date is {} and the type is {}".format(series["Date_column"], type(series["Date_column"])))

The output is:

The date is 1477785600000000000 and the type is <class 'int'>
The date is 2016-10-30 00:00:00 and the type is <class 'datetime.datetime'>

As you can see, the first time it always sets the value as int instead of datetime.

Could someone help me? Thank you very much in advance, Javi.

bracana asked Nov 21 '16 09:11


1 Answer

The reason is that your series has the 'object' dtype, and each column of a pandas DataFrame (or a whole Series) is homogeneously typed: every element shares one dtype. You can inspect this with dtype (or DataFrame.dtypes):

series = pd.Series(list('abc'))
series
Out[3]:
0    a
1    b
2    c
dtype: object

In [15]: date = dt.datetime(2016, 10, 30, 0, 0)
date
Out[15]: datetime.datetime(2016, 10, 30, 0, 0)

In [18]: print(date)
2016-10-30 00:00:00

In [17]: type(date)
Out[17]: datetime.datetime

In [19]: series["Date_column"] = date

In [20]: series
Out[20]:
0                                a
1                                b
2                                c
Date_column    1477785600000000000
dtype: object

In [22]: series.dtype
Out[22]: dtype('O')

Only the generic 'object' dtype can hold arbitrary Python objects, which is what you are relying on when you insert a datetime.datetime object into a Series of strings.
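As a minimal sketch of that point (not a statement about why the first assignment in the question behaves differently), the snippet below builds the object Series in one step instead of adding the label afterwards, so the date is stored as a datetime object from the start. It also shows that the integer from the question's output is the same instant expressed as nanoseconds since the Unix epoch. The variable names are only illustrative.

```python
import datetime as dt
import pandas as pd

date = dt.datetime(2016, 10, 30, 0, 0)

# Build the object-dtype Series up front, with the date already in place,
# rather than enlarging an existing Series by label.
series = pd.Series(["a", "b", "c", date],
                   index=[0, 1, 2, "Date_column"],
                   dtype=object)
value = series["Date_column"]
print(series.dtype, value)

# The integer seen in the question is nanoseconds since the Unix epoch;
# pd.Timestamp interprets a bare integer in exactly that unit.
recovered = pd.Timestamp(1477785600000000000)
print(recovered)
```

If you already have a Series holding the integer, wrapping that element in pd.Timestamp as above gets the date back.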

Moreover, a pandas Series is backed by a NumPy array, which is designed to hold a single type. Mixing types in one Series defeats the computational benefits of pandas DataFrames, Series and NumPy.
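To illustrate the single-type rule at the NumPy level, here is a small sketch: given mixed inputs, NumPy either coerces everything to one common type or, if you ask for it, falls back to the generic object dtype. The array names are just for illustration.

```python
import numpy as np

# Mixed int and str inputs are coerced to one common type: a unicode string array.
mixed = np.array([1, "a"])
print(mixed.dtype.kind)    # 'U' means unicode strings
print(repr(mixed[0]))      # the integer 1 has become the string '1'

# Asking for dtype=object keeps the original Python objects,
# at the cost of fast vectorised operations.
as_objects = np.array([1, "a"], dtype=object)
print(as_objects.dtype, type(as_objects[0]))
```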

Could you use a plain Python list instead? Or a DataFrame, with the date in its own column?
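A rough sketch of both alternatives, assuming the date applies to every row (the column names "letter" and "Date_column" are only placeholders): in a DataFrame each column keeps its own dtype, so the date column can be a proper datetime column next to the string column, while a plain list simply stores whatever objects you put in it.

```python
import datetime as dt
import pandas as pd

date = dt.datetime(2016, 10, 30, 0, 0)

# Option 1: a DataFrame, where the strings and the date live in separate columns.
df = pd.DataFrame({"letter": list("abc")})
df["Date_column"] = date   # the scalar is broadcast to every row
print(df.dtypes)
is_datetime_column = pd.api.types.is_datetime64_any_dtype(df["Date_column"])
first = df["Date_column"].iloc[0]
print(is_datetime_column, first)

# Option 2: a plain Python list, which stores the objects unchanged.
items = list("abc") + [date]
print(type(items[-1]))
```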

ratchet answered Sep 29 '22 13:09