How to calculate number of years between two dates in different pandas columns

One column has dates but the other has a string containing a date, so I first need to extract the date part from that string.

import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta

# the dataframe - id column always starts with year, month and day
df = pd.DataFrame({'id': ['19520630F8', '19680321A5', '19711113E2'],
                   'dte': ['2010-06-02', '2007-08-12', '2013-01-23']})

# create a date string from df['id'] to the format yyyy-mm-dd
dob = (df['id'].str[:4] + '-' +
       df['id'].str[4:6] + '-' +
       df['id'].str[6:8])

# calculate age (years only) at df['dte']
df['age'] = relativedelta(df['dte'], dob).years

I get the error message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I don't understand what is ambiguous about my data, or where I should apply empty/bool/item/any/all. The df['dte'] column is of object dtype, not datetime, but wrapping the creation of dob in pd.to_datetime doesn't help.
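A likely cause, assuming dateutil's `relativedelta` checks its two date arguments in a boolean context (something like `if dt1 and dt2:`): a Series with more than one element has no single truth value, so pandas raises this ValueError. `relativedelta` is designed for one pair of scalar dates. A minimal sketch reproducing both behaviours:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

s = pd.Series(pd.to_datetime(['2010-06-02', '2007-08-12']))

# A multi-element Series cannot be reduced to a single True/False
try:
    bool(s)
except ValueError as exc:
    print(exc)

# With scalar datetimes (pd.Timestamp subclasses datetime.datetime) it works
delta = relativedelta(pd.Timestamp('2010-06-02'), pd.Timestamp('1952-06-30'))
print(delta.years)  # 57
```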

EDIT: The expected output should be:

          dte          id  age
0  2010-06-02  19520630F8   57
1  2007-08-12  19680321A5   39
2  2013-01-23  19711113E2   41
asked Jun 12 '18 11:06 by leofer



1 Answer

I believe you need:

import numpy as np

# whole years between dob and dte, approximating one year as 365.25 days
df['age'] = (np.floor((pd.to_datetime(df['dte']) -
             pd.to_datetime(dob)).dt.days / 365.25)).astype(int)
print(df)
           id         dte  age
0  19520630F8  2010-06-02   57
1  19680321A5  2007-08-12   39
2  19711113E2  2013-01-23   41
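If exact calendar ages are wanted, the `relativedelta` approach from the question can still be used, as long as it receives one pair of scalar dates at a time. A minimal sketch, assuming the first 8 characters of `id` are always `yyyymmdd` (so `pd.to_datetime(..., format='%Y%m%d')` can parse them directly); it loops in Python, so it is slower than the vectorized answer on large frames:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({'id': ['19520630F8', '19680321A5', '19711113E2'],
                   'dte': ['2010-06-02', '2007-08-12', '2013-01-23']})

# Parse the yyyymmdd prefix of 'id' and the 'dte' strings into datetimes
dob = pd.to_datetime(df['id'].str[:8], format='%Y%m%d')
dte = pd.to_datetime(df['dte'])

# relativedelta handles one (end, start) pair per call, so apply it row by row
df['age'] = [relativedelta(end, start).years for end, start in zip(dte, dob)]
print(df)
```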

Details:

Convert columns to datetimes and subtract:

print(pd.to_datetime(df['dte']) - pd.to_datetime(dob))
0   21156 days
1   14388 days
2   15047 days
dtype: timedelta64[ns]

Convert to days and then to years, dividing by 365.25 (the average year length including leap years):

print((pd.to_datetime(df['dte']) - pd.to_datetime(dob)).dt.days / 365.25)
0    57.921971
1    39.392197
2    41.196441
dtype: float64
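Dividing by 365.25 is an approximation, so the floored result can be one year short on or just after a birthday when no leap day falls in between. A minimal sketch with a hypothetical person born 2001-03-01 and measured exactly one year later (true age 1):

```python
import numpy as np
import pandas as pd

# Hypothetical dates: exactly one calendar year apart, no Feb 29 in between
birth = pd.to_datetime(pd.Series(['2001-03-01']))
ref = pd.to_datetime(pd.Series(['2002-03-01']))

days = (ref - birth).dt.days          # 365
approx_age = np.floor(days / 365.25)  # 365 / 365.25 < 1, so this floors to 0.0
print(days.iloc[0], approx_age.iloc[0])  # 365 0.0
```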

Finally, floor the values with numpy.floor:

print(np.floor((pd.to_datetime(df['dte']) - pd.to_datetime(dob)).dt.days / 365.25))
0    57.0
1    39.0
2    41.0
dtype: float64
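A vectorized alternative that avoids both the 365.25 approximation and a Python loop is to compare calendar fields: take the difference in years and subtract one where the birthday has not yet been reached in the reference year. This is a sketch, not part of the answer above; it assumes the `yyyymmdd` prefix in `id`, and it treats a Feb 29 birthday as reached on March 1 in non-leap years:

```python
import pandas as pd

df = pd.DataFrame({'id': ['19520630F8', '19680321A5', '19711113E2'],
                   'dte': ['2010-06-02', '2007-08-12', '2013-01-23']})

dob = pd.to_datetime(df['id'].str[:8], format='%Y%m%d')
dte = pd.to_datetime(df['dte'])

# True where (month, day) of dte comes before (month, day) of dob
before_birthday = ((dte.dt.month < dob.dt.month) |
                   ((dte.dt.month == dob.dt.month) & (dte.dt.day < dob.dt.day)))

# Year difference, minus one if the birthday is still ahead in that year
df['age'] = dte.dt.year - dob.dt.year - before_birthday.astype(int)
print(df)
```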
answered Oct 28 '22 04:10 by jezrael