I have a pandas DataFrame that looks like this:
# Output
# A B C D
# 0 3.0 6.0 7.0 4.0
# 1 42.0 44.0 1.0 3.0
# 2 4.0 2.0 3.0 62.0
# 3 90.0 83.0 53.0 23.0
# 4 22.0 23.0 24.0 NaN
# 5 5.0 2.0 5.0 34.0
# 6 NaN NaN NaN NaN
# 7 NaN NaN NaN NaN
# 8 2.0 12.0 65.0 1.0
# 9 5.0 7.0 32.0 7.0
# 10 2.0 13.0 6.0 12.0
# 11 NaN NaN NaN NaN
# 12 23.0 NaN 23.0 34.0
# 13 61.0 NaN 63.0 3.0
# 14 32.0 43.0 12.0 76.0
# 15 24.0 2.0 34.0 2.0
What I would like to do is fill the NaNs in columns A, B and C with the B value from the nearest preceding row that has one. Column D is the exception: there I would like NaNs replaced with zeros.
I've looked into ffill and fillna, but neither seems able to do the job on its own.
My solution so far:
import pandas as pd

def fix_abc(row, column, df):
    # If the row/column value is null/NaN
    if pd.isnull(row[column]):
        # Get the value of B from the row before
        prior = row.name
        value = df[prior-1:prior]['B'].values[0]
        # If that value is empty, go to the row before that
        while pd.isnull(value) and prior > 1:
            prior = prior - 1
            value = df[prior-1:prior]['B'].values[0]
    else:
        value = row[column]
    return value

df['A'] = df.apply(lambda x: fix_abc(x, 'A', df), axis=1)
df['B'] = df.apply(lambda x: fix_abc(x, 'B', df), axis=1)
df['C'] = df.apply(lambda x: fix_abc(x, 'C', df), axis=1)

def fix_d(x):
    # Replace a missing D with 0, otherwise keep D's own value
    if pd.isnull(x['D']):
        return 0
    return x['D']

df['D'] = df.apply(fix_d, axis=1)
It feels like this is quite inefficient and slow, so I'm wondering if there is a quicker, more efficient way to do this.
Example output:
# A B C D
# 0 3.0 6.0 7.0 4.0
# 1 42.0 44.0 1.0 3.0
# 2 4.0 2.0 3.0 62.0
# 3 90.0 83.0 53.0 23.0
# 4 22.0 23.0 24.0 0.0
# 5 5.0 2.0 5.0 34.0
# 6 2.0 2.0 2.0 0.0
# 7 2.0 2.0 2.0 0.0
# 8 2.0 12.0 65.0 1.0
# 9 5.0 7.0 32.0 7.0
# 10 2.0 13.0 6.0 12.0
# 11 13.0 13.0 13.0 0.0
# 12 23.0 13.0 23.0 34.0
# 13 61.0 13.0 63.0 3.0
# 14 32.0 43.0 12.0 76.0
# 15 24.0 2.0 34.0 2.0
I have dumped the code, including the data for the DataFrame, into a Python fiddle available (here).
fillna allows for various ways to do the filling. In this case, column D can just be filled with 0. Column B can be filled via pad (forward fill). Columns A and C can then be filled from column B, like:
Code:
df['D'] = df.D.fillna(0)
df['B'] = df.B.ffill()  # same as fillna(method='pad'), which newer pandas deprecates
df['A'] = df.A.fillna(df['B'])
df['C'] = df.C.fillna(df['B'])
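As a minimal sketch, the steps above can be wrapped in a small helper and checked on a tiny frame. The function name `fill_from_b` and the toy data are made up for illustration, and `ffill()` is used as the forward-fill equivalent of pad.

```python
import numpy as np
import pandas as pd


def fill_from_b(df):
    """Return a copy with D's NaNs set to 0 and A/B/C's NaNs taken from the last valid B."""
    out = df.copy()
    out['D'] = out['D'].fillna(0)
    # Forward-fill B first so every row carries the last valid B.
    out['B'] = out['B'].ffill()
    # fillna with a Series aligns on the index, so each NaN gets B from its own row.
    out['A'] = out['A'].fillna(out['B'])
    out['C'] = out['C'].fillna(out['B'])
    return out


# Hypothetical toy data, not the frame from the question.
toy = pd.DataFrame({
    'A': [5.0, np.nan, 23.0],
    'B': [2.0, np.nan, np.nan],
    'C': [5.0, np.nan, 23.0],
    'D': [34.0, np.nan, 34.0],
})
filled = fill_from_b(toy)
print(filled)
```

The order matters: B has to be forward-filled before A and C borrow from it, otherwise rows where B itself is NaN would have nothing to copy. Working on a copy leaves the original frame untouched.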
Test Code:
from io import StringIO

import pandas as pd

df = pd.read_fwf(StringIO(u"""
     A     B     C     D
   3.0   6.0   7.0   4.0
  42.0  44.0   1.0   3.0
   4.0   2.0   3.0  62.0
  90.0  83.0  53.0  23.0
  22.0  23.0  24.0   NaN
   5.0   2.0   5.0  34.0
   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN
   2.0  12.0  65.0   1.0
   5.0   7.0  32.0   7.0
   2.0  13.0   6.0  12.0
   NaN   NaN   NaN   NaN
  23.0   NaN  23.0  34.0
  61.0   NaN  63.0   3.0
  32.0  43.0  12.0  76.0
  24.0   2.0  34.0   2.0"""), header=1)
print(df)
df['D'] = df.D.fillna(0)
df['B'] = df.B.ffill()
df['A'] = df.A.fillna(df['B'])
df['C'] = df.C.fillna(df['B'])
print(df)
Results:
A B C D
0 3.0 6.0 7.0 4.0
1 42.0 44.0 1.0 3.0
2 4.0 2.0 3.0 62.0
3 90.0 83.0 53.0 23.0
4 22.0 23.0 24.0 NaN
5 5.0 2.0 5.0 34.0
6 NaN NaN NaN NaN
7 NaN NaN NaN NaN
8 2.0 12.0 65.0 1.0
9 5.0 7.0 32.0 7.0
10 2.0 13.0 6.0 12.0
11 NaN NaN NaN NaN
12 23.0 NaN 23.0 34.0
13 61.0 NaN 63.0 3.0
14 32.0 43.0 12.0 76.0
15 24.0 2.0 34.0 2.0
A B C D
0 3.0 6.0 7.0 4.0
1 42.0 44.0 1.0 3.0
2 4.0 2.0 3.0 62.0
3 90.0 83.0 53.0 23.0
4 22.0 23.0 24.0 0.0
5 5.0 2.0 5.0 34.0
6 2.0 2.0 2.0 0.0
7 2.0 2.0 2.0 0.0
8 2.0 12.0 65.0 1.0
9 5.0 7.0 32.0 7.0
10 2.0 13.0 6.0 12.0
11 13.0 13.0 13.0 0.0
12 23.0 13.0 23.0 34.0
13 61.0 13.0 63.0 3.0
14 32.0 43.0 12.0 76.0
15 24.0 2.0 34.0 2.0
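One case the sample data does not exercise: if B is NaN in the very first row, there is no earlier value to carry forward, so those cells stay NaN after the forward fill. A small sketch of that behaviour with made-up data follows; the final `fillna(0)` fallback is only an assumption about what might be wanted there, not part of the approach above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose first row has no B value at all.
edge = pd.DataFrame({
    'A': [np.nan, 4.0, np.nan],
    'B': [np.nan, 7.0, np.nan],
})

edge['B'] = edge['B'].ffill()
edge['A'] = edge['A'].fillna(edge['B'])
# Row 0 had nothing before it, so it is still NaN in both columns.
leading_gaps = int(edge['A'].isna().sum())

# Assumed fallback: replace whatever is left with 0.
edge_filled = edge.fillna(0)
print(leading_gaps)
print(edge_filled)
```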