I have the following pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
'Date0': ['01/01/1999','01/06/1999','01/01/1979'], 'Age0': [29,44,21],
'Date1': ['08/01/1999','07/01/2014','01/01/2016'],'Age1': [35, 45, 47],
'Date2': [None,'01/06/2035','08/01/1979'],'Age2': [47, None, 74],
'Last_age': [47,45,74]})
I would like to add a new column holding, for each row, the date that corresponds to the value in 'Last_age',
to get something like this:
df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
'Date0': ['01/01/1999','01/06/1999','01/01/1979'], 'Age0': [29,44,21],
'Date1': ['08/01/1999','07/01/2014','01/01/2016'],'Age1': [35, 45, 47],
'Date2': [None,'01/06/2035','08/01/1979'],'Age2': [47, None, 74],
'Last_age': [47,45,74],
'Last_age_date': ['Error no date','07/01/2014','08/01/1979']})
Example #03: Extract a DataFrame column index using the get_loc() function. We have seen how to retrieve the values of a DataFrame's row index. We can also retrieve the position of a DataFrame's columns. To get the integer position of any column, the get_loc() function can be used.
You can get the column position from the column name in pandas using the DataFrame.columns.get_loc() method.
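As a minimal sketch of how get_loc() relates to the question, the snippet below looks up the position of one age column and takes the column just before it. It assumes the sample frame from the question and that every DateN column sits directly before its AgeN column; the names `age_pos` and `date_col` are only illustrative.

```python
import pandas as pd

# Same sample frame as in the question.
df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'Date0': ['01/01/1999', '01/06/1999', '01/01/1979'], 'Age0': [29, 44, 21],
                   'Date1': ['08/01/1999', '07/01/2014', '01/01/2016'], 'Age1': [35, 45, 47],
                   'Date2': [None, '01/06/2035', '08/01/1979'], 'Age2': [47, None, 74],
                   'Last_age': [47, 45, 74]})

# Integer position of the 'Age1' column within df.columns.
age_pos = df.columns.get_loc('Age1')

# Assumption: the matching date column is the one immediately to the left.
date_col = df.columns[age_pos - 1]

print(age_pos, date_col)
```

This only works while the Date/Age pairs stay adjacent, so it is more of an illustration of get_loc() than a robust solution.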
I would just use wide_to_long to reshape your df:
s=pd.wide_to_long(df.reset_index(),['Date','Age'],i=['Last_age','index'],j='Drop')
s.loc[s.Age==s.index.get_level_values(0),'Date']
Out[199]:
Last_age  index  Drop
47        0      2             None
45        1      1       07/01/2014
74        2      2       08/01/1979
Name: Date, dtype: object
df['Last_age_date']=s.loc[s.Age==s.index.get_level_values(0),'Date'].values
df
Out[201]:
  Last_Name       Date0  Age0  ...  Age2  Last_age  Last_age_date
0     Smith  01/01/1999    29  ...  47.0        47           None
1      None  01/06/1999    44  ...   NaN        45     07/01/2014
2     Brown  01/01/1979    21  ...  74.0        74     08/01/1979
[3 rows x 9 columns]
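Note that the result above leaves None where the question asks for 'Error no date', and that assigning through .values relies on exactly one matching age per row, in row order. The following is a hedged sketch, assuming the same sample data, that aligns the matches on the original row label and fills the gaps afterwards. The variable `matches` and the duplicate handling are additions for illustration, not part of the answer above.

```python
import pandas as pd

# Same sample frame as in the question.
df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'Date0': ['01/01/1999', '01/06/1999', '01/01/1979'], 'Age0': [29, 44, 21],
                   'Date1': ['08/01/1999', '07/01/2014', '01/01/2016'], 'Age1': [35, 45, 47],
                   'Date2': [None, '01/06/2035', '08/01/1979'], 'Age2': [47, None, 74],
                   'Last_age': [47, 45, 74]})

# Reshape to one row per (Last_age, original row label, pair number).
s = pd.wide_to_long(df.reset_index(), ['Date', 'Age'],
                    i=['Last_age', 'index'], j='Drop')

# Keep the dates whose age equals the row's Last_age.
matches = s.loc[s['Age'] == s.index.get_level_values('Last_age'), 'Date']

# Index the matches by the original row label only.
matches = matches.droplevel(['Last_age', 'Drop'])

# If an age appears twice in a row, keep the first match (an arbitrary choice).
matches = matches[~matches.index.duplicated(keep='first')]

# Align on the row label; rows without a usable date get the error text.
df['Last_age_date'] = matches.reindex(df.index).fillna('Error no date')

print(df[['Last_age', 'Last_age_date']])
```

Aligning on the index rather than on position means a row with no match, or with several, no longer shifts the dates of the other rows.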
Something like this should do what you are looking for:
import numpy as np

# get the age and date columns, last pair first (you might have more than just the 2)
age_columns = [c for c in df.columns if 'Age' in c][::-1]
date_columns = [c for c in df.columns if 'Date' in c][::-1]

def get_last_age_date(row):
    # walk the pairs from last to first and return the date next to the last non-null age
    for age, date in zip(age_columns, date_columns):
        if not np.isnan(row[age]):
            return row[date]
    return np.nan

# apply the function to all the rows in the dataframe
df['Last_age_date'] = df.apply(get_last_age_date, axis=1)
# replace the missing values with 'Error no date'
df['Last_age_date'] = df['Last_age_date'].fillna('Error no date')
print(df)
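The function above returns the date next to the last non-null age, which happens to coincide with 'Last_age' in the sample data. If an explicit match against 'Last_age' is wanted, one possible vectorized sketch with NumPy is shown below. It assumes the Date and Age columns pair up by position, and the names `age_cols`, `date_cols`, `mask`, and `picked` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the question.
df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'Date0': ['01/01/1999', '01/06/1999', '01/01/1979'], 'Age0': [29, 44, 21],
                   'Date1': ['08/01/1999', '07/01/2014', '01/01/2016'], 'Age1': [35, 45, 47],
                   'Date2': [None, '01/06/2035', '08/01/1979'], 'Age2': [47, None, 74],
                   'Last_age': [47, 45, 74]})

# Assumption: the n-th Age column belongs to the n-th Date column.
age_cols = [c for c in df.columns if c.startswith('Age')]
date_cols = [c for c in df.columns if c.startswith('Date')]

ages = df[age_cols].to_numpy(dtype=float)
dates = df[date_cols].to_numpy(dtype=object)

# True where an age equals the row's Last_age (NaN never compares equal).
mask = ages == df['Last_age'].to_numpy(dtype=float)[:, None]
has_match = mask.any(axis=1)

# Position of the first matching age in each row (0 when there is no match).
first = mask.argmax(axis=1)
picked = dates[np.arange(len(df)), first]

# Blank out rows without a match, then fill every missing date with the error text.
result = pd.Series(picked, index=df.index, dtype=object).where(has_match)
df['Last_age_date'] = result.fillna('Error no date')

print(df[['Last_age', 'Last_age_date']])
```

Compared with apply, this avoids a Python-level loop over rows, at the cost of depending on the column order.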