
convert days difference to numeric values in python pandas

I have a column 'datedif' in my dataframe:

  exposuredate min_exposure_date    datedif
  2014-10-08   2014-09-27           11 days
  2014-10-09   2014-09-27           12 days
  2014-09-27   2014-09-27           0 days
  2014-09-27   2014-09-27           0 days
  2014-10-22   2014-09-27           25 days

  data.exposuredate = pd.to_datetime(data.exposuredate)
  data.min_exposure_date = pd.to_datetime(data.min_exposure_date)

  data['datedif'] = ((data.exposuredate)-(data.min_exposure_date))

Both date columns have dtype datetime64[ns]. I want to extract the number of days from the 'datedif' field, but I have not found anything that gives me the difference as a plain number of days.

I tried:

data['datedif_day'] = data['datedif'].dt.days

Error:

AttributeError: 'Series' object has no attribute 'dt'

Pulkit Jha asked Feb 28 '26 23:02


2 Answers

The pandas documentation refers to the kind of conversion you are looking for as frequency conversion.

The two options are 1) division by Timedelta or 2) type conversion. There is a subtle difference between the two as stated in the docs:

"Note that division by the numpy scalar is true division, while astyping is equivalent of floor division."

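import pandas as pd

data = pd.DataFrame([("2014-10-08", "2014-09-27"),
                     ("2014-10-09", "2014-09-27"),
                     ("2014-09-27", "2014-09-27"),
                     ("2014-10-22", "2014-09-27")],
                    columns=["exposuredate", "min_exposure_date"])

# Parentheses let the expression span two lines.
data['datediff'] = (pd.to_datetime(data.exposuredate)
                    - pd.to_datetime(data.min_exposure_date))

# 1) division by a Timedelta (true division)
data['datediff'] / pd.Timedelta(1, unit='d')
# 2) type conversion (floor division); this works on older pandas,
#    while recent pandas (2.x) raises a TypeError for the 'D' unit
data['datediff'].astype('timedelta64[D]')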

Both operations yield:

0    11.0
1    12.0
2     0.0
3    25.0
Name: datediff, dtype: float64
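
The difference between true and floor division quoted above only shows up when the timedeltas are not whole days. A minimal sketch, using made-up values with fractional days (not the question's data) and assuming a pandas version that has the .dt accessor (0.15.0 or later, which the AttributeError in the question suggests the asker did not have):

```python
import pandas as pd

# Hypothetical sample with fractional days, for illustration only.
delta = pd.Series(pd.to_timedelta(["1 days 12:00:00", "0 days 06:00:00", "25 days"]))

one_day = pd.Timedelta(days=1)

true_days = delta / one_day     # true division: keeps the fractional part
floor_days = delta // one_day   # floor division: whole days only
whole_days = delta.dt.days      # days component via the .dt accessor

print(true_days.tolist())
print(floor_days.tolist())
print(whole_days.tolist())
```

Here true division gives 1.5, 0.25 and 25.0, while floor division and .dt.days both give 1, 0 and 25. For the whole-day differences in the question, all three agree.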

If you are using the date difference as a feature for training a machine learning algorithm, it doesn't matter in which form it is represented, as it should be normalised anyway. timedelta64[ns] is perfectly fine for that. For visualisation purposes, see this post.

Hendrik F answered Mar 02 '26 12:03


The 'datedif' column is displayed in days, but it is actually stored in nanoseconds. So to get the number of days for further use, add the following lines to the code:

   import numpy as np
   data['datedif'] = data['datedif'].astype(np.int64)       # integer nanoseconds
   data['datedif_day'] = data['datedif'] / 86400000000000   # nanoseconds per day
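
The divisor is the number of nanoseconds in one day. As a rough sketch, the snippet below checks that arithmetic and shows a swapped-in variant, dt.total_seconds(), which does not depend on the unit the timedeltas are stored in. The variable names are illustrative, and only two of the question's rows are used:

```python
import pandas as pd

# 24 hours * 60 minutes * 60 seconds * 1e9 nanoseconds
NS_PER_DAY = 24 * 60 * 60 * 10**9
SECONDS_PER_DAY = 24 * 60 * 60

exposure = pd.to_datetime(pd.Series(["2014-10-08", "2014-10-22"]))
minimum = pd.to_datetime(pd.Series(["2014-09-27", "2014-09-27"]))
datedif = exposure - minimum

# Convert to seconds first, then to days; no integer cast needed.
datedif_day = datedif.dt.total_seconds() / SECONDS_PER_DAY

print(NS_PER_DAY)
print(datedif_day.tolist())
```

This needs the .dt accessor, so it assumes a reasonably recent pandas; on a version without it, the integer cast in the answer is the fallback.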

Pulkit Jha answered Mar 02 '26 12:03


