One column has dates but the other has a string containing a date, so I first need to extract the date part from that string.
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
# the dataframe - id column always starts with year, month and day
df = pd.DataFrame({'id': ['19520630F8', '19680321A5', '19711113E2'],
                   'dte': ['2010-06-02', '2007-08-12', '2013-01-23']})
# create a date string from df['id'] to the format yyyy-mm-dd
dob = (df['id'].str[:4] + '-' +
       df['id'].str[4:6] + '-' +
       df['id'].str[6:8])
# calculate age (years only) at df['dte']
df['age'] = relativedelta(date, dob).years
I get the error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I don't understand what is ambiguous about my data, or where I should apply empty/bool/item/any/all.
The df['dte'] column is of object data type and not datetime, but wrapping the creation of dob in pd.to_datetime doesn't help.
EDIT: The expected output should be:
dte id age
0 2010-06-02 19520630F8 57
1 2007-08-12 19680321A5 39
2 2013-01-23 19711113E2 41
There are several ways to calculate the time difference between two dates in Python using Pandas. The first is to subtract one date from the other. This returns a timedelta such as 0 days 05:00:00 that tells us the number of days, hours, minutes, and seconds between the two dates.
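As a small illustration of the subtraction described above, here is a minimal sketch. The two timestamps are made-up values chosen only to reproduce the 0 days 05:00:00 example; they are not part of the question's data.

```python
import pandas as pd

# Hypothetical timestamps, five hours apart on the same day
start = pd.Timestamp("2024-01-01 08:00:00")
end = pd.Timestamp("2024-01-01 13:00:00")

# Subtracting two Timestamps yields a pandas Timedelta
delta = end - start
print(delta)                  # 0 days 05:00:00
print(delta.days)             # whole days in the difference
print(delta.total_seconds())  # the full difference in seconds
```

The same subtraction works element-wise on two datetime Series, which is what the answer further down relies on.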
First, we use the strptime function to parse the date string into its day, month, and year. Then we use the today function to get today's date. To get the age, we subtract the birth year from the current year, and subtract one more if this year's birthday has not happened yet.
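The plain-Python approach can be sketched roughly as follows. The helper name age_on and its parameters are hypothetical, introduced only for illustration; the birth date is taken from the first row of the question's data.

```python
from datetime import date, datetime

def age_on(birth_str, ref, fmt="%Y-%m-%d"):
    """Return whole years between a birth date string and a reference date.

    age_on is a hypothetical helper, not a library function.
    """
    birth = datetime.strptime(birth_str, fmt).date()
    # Has the birthday already occurred in the reference year?
    had_birthday = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if had_birthday else 1)

# Age on a fixed reference date (first row of the question's data)
print(age_on("1952-06-30", date(2010, 6, 2)))  # 57

# Age as of today
print(age_on("1952-06-30", date.today()))
```

Comparing (month, day) tuples avoids the off-by-one result that a bare year subtraction gives before the birthday.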
I believe you need:
import numpy as np
df['age'] = (np.floor((pd.to_datetime(df['dte']) -
                       pd.to_datetime(dob)).dt.days / 365.25)).astype(int)
print (df)
id dte age
0 19520630F8 2010-06-02 57
1 19680321A5 2007-08-12 39
2 19711113E2 2013-01-23 41
Details:
Convert columns to datetimes and subtract:
print (pd.to_datetime(df['dte']) - pd.to_datetime(dob))
0 21156 days
1 14388 days
2 15047 days
dtype: timedelta64[ns]
Convert to days and then to years:
print ((pd.to_datetime(df['dte']) - pd.to_datetime(dob)).dt.days / 365.25)
0 57.921971
1 39.392197
2 41.196441
dtype: float64
Finally, floor the values with numpy.floor:
print ((np.floor((pd.to_datetime(df['dte']) - pd.to_datetime(dob)).dt.days / 365.25)))
0 57.0
1 39.0
2 41.0
dtype: float64
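Dividing by 365.25 is an approximation and can be off by one on or near a birthday. If exact calendar years matter, one possible alternative is to keep relativedelta but call it once per row, since it expects two scalar dates rather than whole Series (passing Series appears to be what triggers the ValueError in the question). A minimal sketch, assuming the id column always starts with yyyymmdd as stated in the question:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({'id': ['19520630F8', '19680321A5', '19711113E2'],
                   'dte': ['2010-06-02', '2007-08-12', '2013-01-23']})

# Parse the first 8 characters of id as the date of birth
dob = pd.to_datetime(df['id'].str[:8], format='%Y%m%d')
dte = pd.to_datetime(df['dte'])

# relativedelta works on scalar dates, so apply it pair by pair
df['age'] = [relativedelta(d, b).years for d, b in zip(dte, dob)]
print(df)
```

This is slower than the vectorized subtraction on large frames, so it is a trade-off between exactness and speed.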