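I have a data frame that looks like this:
    year      id
0   2016  1015.0
1   2016  1016.0
2   2016     NaN
3   2016     NaN
4   2017  1035.0
5   2017  1036.0
6   2017     NaN
7   2017     NaN
8   2018  1005.0
9   2018     NaN
10  2018     NaN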
I want to fill the NaNs by continuing from the max value for that year (i.e. increase incrementally based on the max value for each year).
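This is what I'm trying to achieve:
    year      id
0   2016  1015.0
1   2016  1016.0
2   2016  1017.0
3   2016  1018.0
4   2017  1035.0
5   2017  1036.0
6   2017  1037.0
7   2017  1038.0
8   2018  1005.0
9   2018  1006.0
10  2018  1007.0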
The only way I know how to apply something like this to each year separately is by creating separate data frames for each year in a for loop, then appending them back together.
import pandas as pd
from numpy import nan

# data
d = {'year': {0: 2016,
1: 2016,
2: 2016,
3: 2016,
4: 2017,
5: 2017,
6: 2017,
7: 2017,
8: 2018,
9: 2018,
10: 2018},
'id': {0: 1015.0,
1: 1016.0,
2: nan,
3: nan,
4: 1035.0,
5: 1036.0,
6: nan,
7: nan,
8: 1005.0,
9: nan,
10: nan}}
# list of years
years = [2016, 2017, 2018]
# create dataframe
df = pd.DataFrame(d)
# create list that I will append data frames to
l = []
for x in years:
    # create a dataframe for each year
    df1 = df[df['year'] == x].copy()
    # fill nans with max value plus 1
    df1['id'] = df1['id'].fillna(lambda x: x['id'].max() + 1)
    # add dataframe to list
    l.append(df1)
# concat list of dataframes
final = pd.concat(l)
This replaces the NaNs with the following text:
<function <lambda> at 0x000002201F43CB70>
I also tried using this in my for loop:
df1['id'] = df1['id'].apply(lambda x: x['id'].fillna(x['id'].max() +1))
But I get an error:
TypeError: 'float' object is not subscriptable
You might use df.iterrows() to go through rows and df.loc[] to set missing 'id' values:
for index, row in df.iterrows():
    if row['id'] > 0: continue
    df.loc[index, "id"] = df[df['year'] == row['year']]['id'].max() + 1
EDIT
A nicer way to check if row['id'] is not null would be:
if pd.notnull(row['id']): ...
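If the row-by-row loop turns out to be slow on a larger frame, a vectorized variant might also work. The following is only a sketch, not part of the answer above: it assumes the NaNs in each year should be numbered in row order starting from that year's max, and the names `missing`, `year_max`, `offset` and `filled` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question, written out as plain lists.
df = pd.DataFrame({
    "year": [2016] * 4 + [2017] * 4 + [2018] * 3,
    "id": [1015.0, 1016.0, np.nan, np.nan,
           1035.0, 1036.0, np.nan, np.nan,
           1005.0, np.nan, np.nan],
})

# True where 'id' is missing.
missing = df["id"].isna()

# Max 'id' of each year, broadcast back to every row (NaNs are skipped).
year_max = df.groupby("year")["id"].transform("max")

# Running count of missing rows within each year: 1 for the first NaN, 2 for the next, ...
offset = missing.astype(int).groupby(df["year"]).cumsum()

# Keep existing ids; where missing, use the year's max plus the running count.
filled = df["id"].where(~missing, year_max + offset)
df["id"] = filled
```

This avoids both the per-year data frames and iterrows(), at the cost of being a little less obvious to read than the loop.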