Replace NaNs with max value plus 1 incrementally

Tags:

python

I have a data frame with a year column and an id column, where the last id values in each year are NaN (the data is reproduced in the code below).

I want to fill the NaNs by continuing from the max value for that year (i.e. increase incrementally based on the max value for each year).

This is what I'm trying to achieve: the NaNs become 1017 and 1018 for 2016, 1037 and 1038 for 2017, and 1006 and 1007 for 2018.

The only way I know how to apply something like this to each year separately is by creating separate data frames for each year in a for loop, then appending them back together.

import pandas as pd
from numpy import nan

# data
d = {'year': {0: 2016,
  1: 2016,
  2: 2016,
  3: 2016,
  4: 2017,
  5: 2017,
  6: 2017,
  7: 2017,
  8: 2018,
  9: 2018,
  10: 2018},
 'id': {0: 1015.0,
  1: 1016.0,
  2: nan,
  3: nan,
  4: 1035.0,
  5: 1036.0,
  6: nan,
  7: nan,
  8: 1005.0,
  9: nan,
  10: nan}}

# list of years
years = [2016,2017,2018]

# create dataframe    
df = pd.DataFrame(d)

# create list that I will append data frames to
l = []

for x in years:
    # create a dataframe for each year
    df1 = df[df['year']==x].copy()
    # fill nans with max value plus 1
    df1['id'] = df1['id'].fillna(lambda x: x['id'].max() + 1)
    # add dataframe to list
    l.append(df1)
# concat list of dataframes
final = pd.concat(l)

This replaces the NaNs with the following text:

<function <lambda> at 0x000002201F43CB70>

I also tried using this in my for loop:

df1['id'] = df1['id'].apply(lambda x: x['id'].fillna(x['id'].max() +1))

But I get an error:

TypeError: 'float' object is not subscriptable
asked Oct 16 '22 14:10 by Dread


1 Answer

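You can go through the rows with df.iterrows() and use df.loc[] to set the missing 'id' values. Because df is updated as the loop runs, each filled value raises the maximum for that year, so the next NaN in the same year gets the following number:

for index, row in df.iterrows():
    # skip rows that already have an id (NaN > 0 is False)
    if row['id'] > 0:
        continue
    # df is updated in place, so the per-year max grows with each fill
    df.loc[index, 'id'] = df[df['year'] == row['year']]['id'].max() + 1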

EDIT

A nicer way to check if row['id'] is not null would be:

    if pd.notnull(row['id']): ...
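If the row-by-row loop gets slow on a larger frame, the same result can probably be had without iterating. What follows is only a sketch under a few assumptions: the helper name fill_ids_per_year is made up for illustration, the columns are called 'year' and 'id' as in the question, and the data is rebuilt here in a shorter form. The idea is to take each year's max and add a running count of the NaNs seen so far within that year.

```python
import numpy as np
import pandas as pd

# same data as in the question, written more compactly
df = pd.DataFrame({
    "year": [2016] * 4 + [2017] * 4 + [2018] * 3,
    "id": [1015.0, 1016.0, np.nan, np.nan,
           1035.0, 1036.0, np.nan, np.nan,
           1005.0, np.nan, np.nan],
})


def fill_ids_per_year(frame):
    """Hypothetical helper: fill NaN ids with max-per-year + 1, + 2, ..."""
    # 1 where the id is missing, 0 otherwise
    missing = frame["id"].isna().astype(int)
    # max id of each year, broadcast back to every row of that year
    year_max = frame.groupby("year")["id"].transform("max")
    # running count of missing ids within each year: 1 for the first NaN, 2 for the second, ...
    offset = missing.groupby(frame["year"]).cumsum()
    # fillna only touches the NaN rows; existing ids are left alone
    return frame["id"].fillna(year_max + offset)


df["id"] = fill_ids_per_year(df)
print(df)
```

One caveat: a year whose ids are all NaN has no max to continue from, so those rows would stay NaN with this approach.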
answered Oct 20 '22 16:10 by Sebastien D