 

Fill zeroes between non zero values, leave other zeroes be

I need to fill zeroes in DataFrame columns as described in the title. I can already do it with iterrows() or itertuples() (both have similar execution time) plus some conditionals, but I hope there is a faster way.

The columns contain runs of consecutive, identical integers that are sometimes separated by one or two zeroes. Those are the zeroes I need to fill, using the integer on either side of them. All other zeroes (those not enclosed between two equal non-zero integers, which in practice means runs of more than two zeroes) remain zero.

import pandas as pd

x = [[0,0,0,0,0,2,2,2,0,2,2,0,0,0,0,0,0,0,0,1,1,1,0,0,1,1,0,0,0,0],
     [0,0,0,0,3,3,0,0,3,3,3,3,0,0,0,0,0,2,2,2,0,2,2,0,0,0,0,0,0,0],
     [0,0,0,0,0,0,0,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,0,0,0,0,0,0,0]]
df = pd.DataFrame.from_records(x).T
df.columns = ['x', 'y', 'z']

    x   y   z
0   0   0   0
1   0   0   0
2   0   0   0
3   0   0   0
4   0   3   0
5   2   3   0
6   2   0   0
7   2   0   0
8   0   3   0
9   2   3   1
10  2   3   1
11  0   3   1
12  0   0   0
13  0   0   0
14  0   0   1
15  0   0   1
16  0   0   1
17  0   2   0
18  0   2   1
19  1   2   1
20  1   0   1
21  1   2   0
22  0   2   0
23  0   0   0
24  1   0   0
25  1   0   0
26  0   0   0
27  0   0   0
28  0   0   0
29  0   0   0

The desired output would be:

    x   y   z
0   0   0   0
1   0   0   0
2   0   0   0
3   0   0   0
4   0   3   0
5   2   3   0
6   2   3   0
7   2   3   0
8   2   3   0
9   2   3   1
10  2   3   1
11  0   3   1
12  0   0   1
13  0   0   1
14  0   0   1
15  0   0   1
16  0   0   1
17  0   2   1
18  0   2   1
19  1   2   1
20  1   2   1
21  1   2   0
22  1   2   0
23  1   0   0
24  1   0   0
25  1   0   0
26  0   0   0
27  0   0   0
28  0   0   0
29  0   0   0
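The rule above can be pinned down with a plain-Python loop over a single column. This is a hypothetical reference helper, not a fast solution: the name fill_gaps_reference and the max_gap=2 default are assumptions based on the "one or two zeroes" wording. It is mainly useful as an oracle to check a vectorized version against.

```python
def fill_gaps_reference(values, max_gap=2):
    """Return a copy of `values` in which every run of at most `max_gap` zeroes
    enclosed by two equal non-zero values is filled with that value.
    (Hypothetical helper for illustration; max_gap=2 is an assumption.)"""
    src = list(values)   # read-only copy of the input
    out = src.copy()     # result we modify
    n = len(src)
    i = 0
    while i < n:
        if src[i] != 0:
            i += 1
            continue
        # Scan to the end of this run of zeroes: src[i:j] are all 0.
        j = i
        while j < n and src[j] == 0:
            j += 1
        has_left = i > 0    # there is a non-zero value just before the run
        has_right = j < n   # there is a non-zero value just after the run
        if has_left and has_right and src[i - 1] == src[j] and j - i <= max_gap:
            out[i:j] = [src[j]] * (j - i)
        i = j
    return out
```

For example, fill_gaps_reference(df['x'].tolist()) reproduces the x column of the desired output above. Zero runs at the start or end of a column have only one neighbour, so they are never filled.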
asked Mar 02 '23 14:03 by Dr Dro



1 Answer

You can first replace 0 with np.nan, then ffill and bfill and check whether the two results are equal. Where they are, keep the forward-filled value; everywhere else, assign 0:

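import numpy as np

u = df.replace(0, np.nan)   # treat zeroes as missing values
a = u.ffill()               # previous non-zero value in each column
b = u.bfill()               # next non-zero value in each column
# keep the forward-filled value only where both neighbours agree, else 0
yourout = a.where(a == b, 0).astype(df.dtypes)

print(yourout)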

    x  y  z
0   0  0  0
1   0  0  0
2   0  0  0
3   0  0  0
4   0  3  0
5   2  3  0
6   2  3  0
7   2  3  0
8   2  3  0
9   2  3  1
10  2  3  1
11  0  3  1
12  0  0  1
13  0  0  1
14  0  0  1
15  0  0  1
16  0  0  1
17  0  2  1
18  0  2  1
19  1  2  1
20  1  2  1
21  1  2  0
22  1  2  0
23  1  0  0
24  1  0  0
25  1  0  0
26  0  0  0
27  0  0  0
28  0  0  0
29  0  0  0
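To see why comparing the forward fill with the backward fill picks out exactly the enclosed zeroes, the same steps can be laid out side by side on a short made-up Series (the values below are illustrative, not taken from the question).

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 3, 3, 0, 0, 3, 0, 2, 0])
u = s.replace(0, np.nan)          # zeroes become NaN so ffill/bfill skip them
steps = pd.DataFrame({
    "orig": s,
    "ffill": u.ffill(),           # previous non-zero value (NaN before the first one)
    "bfill": u.bfill(),           # next non-zero value (NaN after the last one)
})
# Rows 3-4 sit between 3 and 3, so both fills agree and the 3 is kept.
# Row 6 sits between 3 and 2, row 0 has no previous value and row 8 has no
# next value, so the comparison is False there and 0 is used instead.
steps["result"] = steps["ffill"].where(steps["ffill"] == steps["bfill"], 0).astype(s.dtype)
print(steps)
```

Because NaN never compares equal to anything, leading and trailing zeroes drop out automatically without any special-casing.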
answered Apr 06 '23 23:04 by anky

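The a == b comparison fills a run of zeroes of any length as long as both neighbours match; the sample data happens to contain only gaps of one or two, so the output matches. If the "more than two zeroes in a row stay zero" condition from the question must hold strictly, one possible extension also measures each run of zeroes and resets runs longer than a limit. This is a sketch, not part of the original answer; the function name fill_short_gaps and the max_gap parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_short_gaps(df: pd.DataFrame, max_gap: int = 2) -> pd.DataFrame:
    """Fill runs of at most `max_gap` zeroes that sit between two equal non-zero
    values (hypothetical helper extending the ffill/bfill answer)."""
    u = df.replace(0, np.nan)
    a = u.ffill()                 # previous non-zero value in each column
    b = u.bfill()                 # next non-zero value in each column
    out = a.where(a == b, 0)      # candidate fill: both neighbours agree

    is_zero = df.eq(0)
    for col in df.columns:
        # Each non-zero value starts a new group, so a run of zeroes shares
        # its group id with the non-zero value just before it.
        run_id = (~is_zero[col]).cumsum()
        # Number of zeroes in the group = length of that zero run.
        run_len = is_zero[col].astype(int).groupby(run_id).transform("sum")
        # Undo the fill for zeroes belonging to runs longer than max_gap.
        out.loc[is_zero[col] & (run_len > max_gap), col] = 0

    return out.astype(df.dtypes)
```

On the question's df, fill_short_gaps(df) gives the same result as the answer, while a column such as [5, 0, 0, 0, 5] keeps its three zeroes unless max_gap is raised to 3. The per-column loop only iterates over columns, not rows, so it should stay cheap for long frames.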