One column has dates but the other has a string containing a date, so I first need to extract the date part from that string.
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
# the dataframe - id column always starts with year, month and day
df = pd.DataFrame({'id': ['19520630F8', '19680321A5', '19711113E2'],
                   'dte': ['2010-06-02', '2007-08-12', '2013-01-23']})
# create a date string from df['id'] to the format yyyy-mm-dd
dob = (df['id'].str[:4] + '-' +
       df['id'].str[4:6] + '-' +
       df['id'].str[6:8])
# calculate age (years only) at df['dte']
df['age'] = relativedelta(df['dte'], dob).years
I get the error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I don't understand what is ambiguous about my data, or where I should apply empty/bool/item/any/all.
The df['dte'] column is of object data type and not datetime, but wrapping the creation of dob in pd.to_datetime won't help.
EDIT: The expected output should be
          dte          id  age
0  2010-06-02  19520630F8   57
1  2007-08-12  19680321A5   39
2  2013-01-23  19711113E2   41
There are several ways to calculate the time difference between two dates in Python using pandas. The simplest is to subtract one date from the other. This returns a Timedelta such as 0 days 05:00:00, which gives the number of days, hours, minutes, and seconds between the two dates.
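As a small illustration of that subtraction, here is a sketch with two made-up timestamps (the values are hypothetical and not taken from the question's data):

```python
import pandas as pd

# two hypothetical timestamps five hours apart
start = pd.Timestamp('2021-01-01 08:00')
end = pd.Timestamp('2021-01-01 13:00')

# subtracting Timestamps gives a pandas Timedelta
delta = end - start
print(delta)                         # 0 days 05:00:00
print(delta.total_seconds() / 3600)  # 5.0
```

The same subtraction works element-wise on two datetime Series, which is what the answer further down relies on.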
In plain Python, first use the strptime function to parse the date string into its year, month, and day. Then use the today function to get today's date. To get the age, subtract the birth year from the current year, and subtract one more if this year's birthday has not happened yet.
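A minimal sketch of that strptime/today approach could look like the following. The helper name age_on and its ref parameter are my own additions for illustration; ref defaults to today's date but can be set to any reference date.

```python
from datetime import date, datetime

def age_on(birth_str, ref=None, fmt='%Y-%m-%d'):
    """Whole years between birth_str and ref (hypothetical helper)."""
    born = datetime.strptime(birth_str, fmt).date()
    if ref is None:
        ref = date.today()
    # the comparison is True (1) when the birthday has not yet
    # been reached in the reference year, so one year is taken off
    return ref.year - born.year - ((ref.month, ref.day) < (born.month, born.day))

print(age_on('1952-06-30', date(2010, 6, 2)))  # 57
```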
In SQL (for example SQL Server), use the DATEDIFF(datepart, startdate, enddate) function to find the difference between dates. The datepart argument defines the unit in which the difference is expressed. Its value can be year, quarter, month, day, minute, etc.
I believe you need:
import numpy as np

# subtract the datetimes, convert days to approximate years, round down
df['age'] = (np.floor((pd.to_datetime(df['dte']) -
             pd.to_datetime(dob)).dt.days / 365.25)).astype(int)
print (df)
           id         dte  age
0  19520630F8  2010-06-02   57
1  19680321A5  2007-08-12   39
2  19711113E2  2013-01-23   41
Details:
Convert columns to datetimes and subtract:
print (pd.to_datetime(df['dte']) -  pd.to_datetime(dob))
0   21156 days
1   14388 days
2   15047 days
dtype: timedelta64[ns]
Convert to days and then to years:
print ((pd.to_datetime(df['dte']) -  pd.to_datetime(dob)).dt.days / 365.25)
0    57.921971
1    39.392197
2    41.196441
dtype: float64
Finally, floor the values with numpy.floor:
print ((np.floor((pd.to_datetime(df['dte']) - pd.to_datetime(dob)).dt.days / 365.25)))
0    57.0
1    39.0
2    41.0
dtype: float64
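As for the error in the question: relativedelta expects two scalar dates, and as far as I can tell its constructor tests the truth value of its arguments, which is what raises the "truth value of a Series is ambiguous" message when whole Series are passed in. Note also that dividing by 365.25 is an approximation and can be off by one exactly on a birthday (1 March 2000 to 1 March 2001 is 365 days, which floors to 0). If exact calendar ages matter, one possible alternative, sketched here and not part of the answer above, is to call relativedelta once per row:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({'id': ['19520630F8', '19680321A5', '19711113E2'],
                   'dte': ['2010-06-02', '2007-08-12', '2013-01-23']})

# parse the first 8 characters of id (yyyymmdd) and the dte column
dob = pd.to_datetime(df['id'].str[:8], format='%Y%m%d')
dte = pd.to_datetime(df['dte'])

# relativedelta works on scalars, so pair the values up row by row
df['age'] = [relativedelta(end, start).years for end, start in zip(dte, dob)]
print(df)
```

This is slower than the vectorised subtraction on large frames, so it is a trade-off between speed and exact birthday handling.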