may i ask for suggestion to iterate more efficiently (and faster)? here is the issue, I am looking for a way to propagate zeros down in a pandas DataFrame within a determined window size:
import numpy as np
import pandas as pd
A = np.matrix([[ 0., 1., 1., 1., 1.],
[ 1., 0., 1., 1., 1.],
[ 1., 1., 0., 1., 1.],
[ 1., 1., 1., 0., 1.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 0., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 0., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 0., 1., 1.],
[ 1., 1., 1., 1., 0.],
[ 1., 0., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
df = pd.DataFrame(A)
now we want to fill by increment of windows of 3 lines each the values from the top. each window of 3 lines starts on window_start, defined as:
window_size = 3
window_start = [i for i in range(0, df.shape[0])
if i % window_size == 0]
print(df)
gf = df.copy()
print('\n')
now making the DataFrame where the zeros propagate from the preceding lines above inside that window:
for i in window_start:
for j in range(1, window_size):
try: gf.iloc[i + j] = gf.iloc[i + j - 1] * gf.iloc[i + j]
except: pass
print(gf)
This last bit is pretty inefficient and time consuming for very large datasets, is there a better way to do it?
You should be able to accomplish this task with a cumulative product within a groupby
.
df.groupby(np.arange(len(df)) // 3).cumprod()
0 1 2 3 4
0 0.0 1.0 1.0 1.0 1.0
1 0.0 0.0 1.0 1.0 1.0
2 0.0 0.0 0.0 1.0 1.0
3 1.0 1.0 1.0 0.0 1.0
4 1.0 1.0 1.0 0.0 0.0
5 1.0 1.0 1.0 0.0 0.0
6 1.0 1.0 1.0 1.0 1.0
7 1.0 1.0 1.0 1.0 1.0
8 1.0 1.0 1.0 1.0 0.0
9 1.0 1.0 0.0 1.0 1.0
10 1.0 1.0 0.0 1.0 1.0
11 1.0 1.0 0.0 1.0 1.0
12 1.0 1.0 1.0 1.0 1.0
13 1.0 1.0 1.0 1.0 1.0
14 1.0 1.0 1.0 0.0 1.0
15 1.0 1.0 1.0 1.0 1.0
16 1.0 1.0 0.0 1.0 1.0
17 1.0 1.0 0.0 1.0 0.0
18 1.0 0.0 1.0 1.0 1.0
19 1.0 0.0 1.0 1.0 1.0
We can take a better look, by using concat
to see if it's doing what we want.
pd.concat([df.iloc[:6, :2], d1.iloc[:6, :2]], axis=1, keys=['Before', 'After'])
Before After
0 1 0 1
0 0.0 1.0 0.0 1.0
1 1.0 0.0 0.0 0.0
2 1.0 1.0 0.0 0.0
3 1.0 1.0 1.0 1.0
4 1.0 1.0 1.0 1.0
5 1.0 1.0 1.0 1.0
My Take on a numpy
approach
See @Divakar's solution as I borrowed some elements of his function
def prop_zero(df, window_size=3):
a = df.values
W = window_size
m, n = a.shape
pad = np.zeros((W - m % W, n))
b = np.vstack([a, pad])
return pd.DataFrame(
b.reshape(-1, W, n).cumprod(1).reshape(-1, n)[:m],
df.index, df.columns
)
prop_zero(df)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With