Column arithmetic in pandas dataframe using dates

Tags: python, pandas, dataframe, datetime64

I think this should be easy, but I'm hitting a bit of a wall. I have a dataset that was imported into a pandas dataframe from a Stata .dta file. Several of the columns contain date data. The dataframe contains 100,000+ rows; a sample is shown below:

   cat  event_date  total
0   G2  2006-03-08     16
1   G2         NaT    NaN
2   G2         NaT    NaN
3   G3  2006-03-10     16
4   G3  2006-08-04     12
5   G3  2006-12-28     13
6   G3  2007-05-25     10
7   G4  2006-03-10     13
8   G4  2006-08-06     19
9   G4  2006-12-30     16

The dates are stored in datetime64 format:

>>> mydata[['cat','event_date','total']].dtypes
cat                    object
event_date     datetime64[ns]
total                 float64
dtype: object
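To experiment without the original .dta file, the ten sample rows can be rebuilt by hand. This is a minimal sketch: the column names and values come from the sample above, but building the frame this way is an assumption (the real data would presumably be loaded with pd.read_stata). Missing dates are passed as None so pandas stores them as NaT, and missing totals as np.nan, which makes "total" float64 as in the dtypes output.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the 10-row sample from the question.
# None in the date list becomes NaT; np.nan in "total" makes that column float64.
mydata = pd.DataFrame({
    "cat": ["G2", "G2", "G2", "G3", "G3", "G3", "G3", "G4", "G4", "G4"],
    "event_date": pd.to_datetime([
        "2006-03-08", None, None, "2006-03-10", "2006-08-04",
        "2006-12-28", "2007-05-25", "2006-03-10", "2006-08-06", "2006-12-30",
    ]),
    "total": [16, np.nan, np.nan, 16, 12, 13, 10, 13, 19, 16],
})

print(mydata[["cat", "event_date", "total"]].dtypes)
```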

All I would like to do is create a new column which gives the difference in days (rather than 'us' or 'ns'!!!) between the event_date and a start date, say 2006-01-01. I've tried the following:

>>> mydata['new'] = mydata['event_date'] - np.datetime64('2006-01-01')

… but I get the message:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I've also tried a lambda function but that doesn't work either.

However, if I simply want to add one day to each date, I can successfully use:

>>> mydata['plusone'] = mydata['event_date'] + np.timedelta64(1,'D')

That works fine.

Am I missing something straightforward here?

Thanks in advance for any help.

asked Aug 12 '14 02:08

user1718097

2 Answers

Not sure why the numpy datetime64 is incompatible with pandas dtypes, but using datetime objects worked fine for me:

In [39]:

import datetime as dt
mydata['new'] = mydata['event_date'] - dt.datetime(2006,1,1)
mydata
Out[39]:
      cat event_date  total      new
Index                               
0      G2 2006-03-08     16  66 days
1      G2        NaT    NaN      NaT
2      G2        NaT    NaN      NaT
3      G3 2006-03-10     16  68 days
4      G3 2006-08-04     12 215 days
5      G3 2006-12-28     13 361 days
6      G3 2007-05-25     10 509 days
7      G4 2006-03-10     13  68 days
8      G4 2006-08-06     19 217 days
9      G4 2006-12-30     16 363 days

answered Oct 06 '22 01:10

EdChum
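The "new" column produced by the datetime subtraction still holds timedelta64 values (66 days, 68 days, ...). If a plain number of days is wanted, as the question asks, the .dt.days accessor should extract it. A minimal sketch on a few of the sample rows; the column names "new_days" and "new_int" are hypothetical.

```python
import datetime as dt

import pandas as pd

# A few rows from the question's sample, including one missing date (NaT).
mydata = pd.DataFrame({
    "event_date": pd.to_datetime(["2006-03-08", None, "2006-08-04", "2007-05-25"]),
})

# Same subtraction as in the answer: the result is a timedelta64 column.
mydata["new"] = mydata["event_date"] - dt.datetime(2006, 1, 1)

# .dt.days returns the day count; NaT turns into NaN, so the dtype is float64.
mydata["new_days"] = mydata["new"].dt.days

# Optional: pandas' nullable integer dtype keeps whole numbers and shows <NA> for missing.
mydata["new_int"] = mydata["new_days"].astype("Int64")

print(mydata)
```

Because .dt.days is the days component of each timedelta, any time-of-day part is dropped; with date-only values, as in this data, that makes no difference.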

Make sure you have up-to-date versions of pandas and numpy (numpy >= 1.7):

In [11]: df.event_date - pd.Timestamp('2006-01-01')
Out[11]:
0    66 days
1        NaT
2        NaT
3    68 days
4   215 days
5   361 days
6   509 days
7    68 days
8   217 days
9   363 days
Name: event_date, dtype: timedelta64[ns]

answered Oct 06 '22 02:10

Andy Hayden
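Two follow-ups to the pd.Timestamp answer, as a hedged sketch assuming a recent pandas release. First, dividing a timedelta column by np.timedelta64(1, 'D') expresses it in days as a float, which keeps fractions of a day when the dates carry a time of day (the 12:00 timestamp below is invented for illustration). Second, with a current pandas the question's original np.datetime64 subtraction should also work; the last lines check that it agrees with the pd.Timestamp version. Older versions may still raise the error from the question, which is what the version advice above points to.

```python
import numpy as np
import pandas as pd

# Illustrative dates: the 12:00 time of day is invented to show fractional days.
df = pd.DataFrame({
    "event_date": pd.Series([pd.Timestamp("2006-03-10"), pd.NaT,
                             pd.Timestamp("2006-12-30 12:00")]),
})

delta = df.event_date - pd.Timestamp("2006-01-01")

# Days as a float, fractions of a day included (NaT becomes NaN).
frac_days = delta / np.timedelta64(1, "D")

# Whole days only: .dt.days keeps just the days component.
whole_days = delta.dt.days

# In recent pandas, the question's original np.datetime64 subtraction gives the same result.
via_numpy = df.event_date - np.datetime64("2006-01-01")

print(pd.DataFrame({"frac_days": frac_days, "whole_days": whole_days}))
print(via_numpy.dt.days.equals(whole_days))
```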
