
Parse a Pandas column to Datetime when importing table from SQL database and filtering rows by date

I have a DataFrame with a column named date. How can I convert the 'date' column to a datetime type?

I loaded the date column from a PostgreSQL database using sql.read_frame(). An example value from the date column is 2013-04-04.

What I am trying to do is select all rows of the DataFrame whose date falls within a certain period, for example after 2013-04-01 and before 2013-04-04.

My attempt below gives the error 'Series' object has no attribute 'read'.

Attempt

import dateutil
df['date'] = dateutil.parser.parse(df['date'])

Error

AttributeError                            Traceback (most recent call last)
<ipython-input-636-9b19aa5f989c> in <module>()
     15 
     16 # Parse 'Date' Column to Datetime
---> 17 df['date'] = dateutil.parser.parse(df['date'])
     18 
     19 # SELECT RECENT SALES

C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(timestr, parserinfo, **kwargs)
    695         return parser(parserinfo).parse(timestr, **kwargs)
    696     else:
--> 697         return DEFAULTPARSER.parse(timestr, **kwargs)
    698 
    699 

C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    299             default = datetime.datetime.now().replace(hour=0, minute=0,
    300                                                       second=0, microsecond=0)
--> 301         res = self._parse(timestr, **kwargs)
    302         if res is None:
    303             raise ValueError, "unknown string format"

C:\Python27\lib\site-packages\dateutil\parser.pyc in _parse(self, timestr, dayfirst, yearfirst, fuzzy)
    347             yearfirst = info.yearfirst
    348         res = self._result()
--> 349         l = _timelex.split(timestr)
    350         try:
    351 

C:\Python27\lib\site-packages\dateutil\parser.pyc in split(cls, s)
    141 
    142     def split(cls, s):
--> 143         return list(cls(s))
    144     split = classmethod(split)
    145 

C:\Python27\lib\site-packages\dateutil\parser.pyc in next(self)
    135 
    136     def next(self):
--> 137         token = self.get_token()
    138         if token is None:
    139             raise StopIteration

C:\Python27\lib\site-packages\dateutil\parser.pyc in get_token(self)
     66                 nextchar = self.charstack.pop(0)
     67             else:
---> 68                 nextchar = self.instream.read(1)
     69                 while nextchar == '\x00':
     70                     nextchar = self.instream.read(1)

AttributeError: 'Series' object has no attribute 'read'

df['date'].apply(dateutil.parser.parse) gives me the error AttributeError: 'datetime.date' object has no attribute 'read'

df['date'].truncate(after='2013/04/01') gives the error TypeError: can't compare datetime.datetime to long

df['date'].dtype returns dtype('O'). Is it already a datetime object?

Nyxynyx asked May 07 '13 05:05



1 Answer

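Pandas has a native datetime type, but some import functions load date columns as plain Python objects instead (strings, or datetime.date values, as the second error in the question suggests). That is why df['date'].dtype reports dtype('O'): the column is not a datetime column yet. Convert the column to the datetime type first, then filter on it.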

import pandas as pd

df['date'] = pd.to_datetime(df['date'])
# Keep rows strictly after 2013-04-01 and strictly before 2013-04-04
df_masked = df[(df['date'] > pd.Timestamp('2013-04-01')) & (df['date'] < pd.Timestamp('2013-04-04'))]
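For anyone who wants to try this without a database, here is a minimal self-contained sketch. The sample rows and the sales column are made up for illustration and are not from the question; the dates are stored as datetime.date objects on the assumption that this is roughly what the SQL import produced.

```python
import datetime

import pandas as pd

# Hypothetical sample data standing in for the table loaded from PostgreSQL.
# The values are datetime.date objects, so the column starts with dtype object.
df = pd.DataFrame({
    "date": [
        datetime.date(2013, 3, 31),
        datetime.date(2013, 4, 1),
        datetime.date(2013, 4, 2),
        datetime.date(2013, 4, 3),
        datetime.date(2013, 4, 4),
    ],
    "sales": [10, 20, 30, 40, 50],  # made-up column, only here to identify rows
})

# Convert the object column to a real datetime column.
df["date"] = pd.to_datetime(df["date"])

# Build the bounds as Timestamps so both sides of the comparison are datetimes.
start = pd.Timestamp("2013-04-01")
end = pd.Timestamp("2013-04-04")

# Strictly after start and strictly before end: only April 2 and April 3 remain.
df_masked = df[(df["date"] > start) & (df["date"] < end)]
print(df_masked)
```

If the column can be parsed at load time, pandas.read_sql also accepts a parse_dates argument (for example parse_dates=['date']), which may make the separate conversion step unnecessary.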
Keith answered Sep 21 '22 23:09
