How to calculate number of years between two dates in different pandas columns

One column has dates but the other has a string containing a date, so I first need to extract the date part from that string.

import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta

# the dataframe - id column always starts with year, month and day
df = pd.DataFrame({'id': ['19520630F8', '19680321A5', '19711113E2'],
                   'dte': ['2010-06-02', '2007-08-12', '2013-01-23']})

# create a date string from df['id'] to the format yyyy-mm-dd
dob = (df['id'].str[:4] + '-' +
       df['id'].str[4:6] + '-' +
       df['id'].str[6:8])

# calculate age (years only) at df['dte']
df['age'] = relativedelta(date, dob).years

I get the error message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I don't understand what is ambiguous about my data, or where to apply those empty/bool/item... methods. The df['dte'] column is of object data type and not datetime, but wrapping the creation of dob in pd.to_datetime won't help.

EDIT: The expected output should be:

          dte          id  age
0  2010-06-02  19520630F8   57
1  2007-08-12  19680321A5   39
2  2013-01-23  19711113E2   41

asked Jun 12 '18 11:06

leofer

1 Answer

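I believe you need:

import numpy as np

# convert both columns to datetimes, subtract, then turn days into whole years
df['age'] = (np.floor((pd.to_datetime(df['dte']) -
             pd.to_datetime(dob)).dt.days / 365.25)).astype(int)
print(df)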
           id         dte  age
0  19520630F8  2010-06-02   57
1  19680321A5  2007-08-12   39
2  19711113E2  2013-01-23   41
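
For an exact calendar age instead of the 365.25-day approximation, the relativedelta idea from the question can be applied one row at a time. This is only a sketch, and it assumes the error in the question comes from relativedelta expecting two scalar datetimes rather than whole Series. The helper names dob and dte are illustrative.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({'id': ['19520630F8', '19680321A5', '19711113E2'],
                   'dte': ['2010-06-02', '2007-08-12', '2013-01-23']})

# build yyyy-mm-dd strings from the id, as in the question, then parse them
dob = pd.to_datetime(df['id'].str[:4] + '-' +
                     df['id'].str[4:6] + '-' +
                     df['id'].str[6:8])
dte = pd.to_datetime(df['dte'])

# relativedelta works on scalar datetimes, so pair the values row by row
df['age'] = [relativedelta(end, start).years for end, start in zip(dte, dob)]
print(df)
```

The list comprehension is slower than a vectorized expression on large frames, but it counts whole calendar years directly.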

Details:

Convert columns to datetimes and subtract:

print (pd.to_datetime(df['dte']) -  pd.to_datetime(dob))
0   21156 days
1   14388 days
2   15047 days
dtype: timedelta64[ns]
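
If some ids might not start with a valid date, the conversion step could be made stricter. The sketch below slices the first eight characters and passes an explicit format, with errors='coerce' turning unparseable values into NaT. The third id is a made-up malformed value, included only to show that behaviour.

```python
import pandas as pd

# the third id is a hypothetical malformed value, added only for illustration
df = pd.DataFrame({'id': ['19520630F8', '19680321A5', 'unknown0Z9'],
                   'dte': ['2010-06-02', '2007-08-12', '2013-01-23']})

# the first eight characters hold the date as yyyymmdd
dob = pd.to_datetime(df['id'].str[:8], format='%Y%m%d', errors='coerce')
dte = pd.to_datetime(df['dte'], format='%Y-%m-%d')

# rows with an unparseable id end up as NaN instead of raising an error
days = (dte - dob).dt.days
print(days)
```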

Convert to days and then to years:

print ((pd.to_datetime(df['dte']) -  pd.to_datetime(dob)).dt.days / 365.25)
0    57.921971
1    39.392197
2    41.196441
dtype: float64

Finally, floor the values with numpy.floor:

print ((np.floor((pd.to_datetime(df['dte']) - pd.to_datetime(dob)).dt.days / 365.25)))
0    57.0
1    39.0
2    41.0
dtype: float64
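
Dividing by 365.25 is an approximation and can be off by one when dte falls exactly on a birthday. A vectorized alternative, sketched here under the assumption that an exact calendar age is wanted, subtracts the years and then takes one off when the birthday has not yet been reached. The third row is an invented edge case, not data from the question.

```python
import pandas as pd

# third row is a hypothetical edge case: dte is exactly the first birthday
dob = pd.to_datetime(pd.Series(['1952-06-30', '1968-03-21', '2000-03-01']))
dte = pd.to_datetime(pd.Series(['2010-06-02', '2007-08-12', '2001-03-01']))

# approximation used above: whole multiples of 365.25 days
approx = ((dte - dob).dt.days // 365.25).astype(int)

# exact: has the birthday already occurred in the year of dte?
had_birthday = (dte.dt.month > dob.dt.month) | (
    (dte.dt.month == dob.dt.month) & (dte.dt.day >= dob.dt.day))
age = dte.dt.year - dob.dt.year - (~had_birthday).astype(int)

print(pd.DataFrame({'approx': approx, 'age': age}))
```

For the edge-case row the approximation yields 0 while the calendar comparison yields 1. How birthdays on 29 February should be counted in non-leap years is a separate decision that this sketch does not settle.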

answered Oct 28 '22 04:10

jezrael

Tags: python, datetime, pandas, dataframe