Padding rows based on a conditional

I have time series data per row (with columns as time steps) and I'd like to left and right pad each row with 0s based on a conditional row value (i.e. 'Padding amount'). This is what I have:

Padding amount     T1     T2     T3
   0               3      2.9    2.8
   1               2.9    2.8    2.7
   1               2.8    2.3    2.0
   2               4.4    3.3    2.3

And this is what I'd like to produce:

Padding amount     T1     T2     T3     T4     T5
   0               3      2.9    2.8    0      0    (--> padding = 0, so no change)
   1               0      2.9    2.8    2.7    0    (--> shifted one to the right)
   1               0      2.8    2.3    2.0    0
   2               0      0      4.4    3.3    2.3  (--> shifted two to the right)

I see that Keras has sequence padding, but I'm not sure how it would apply here, since all rows have the same number of entries. I'm looking at pandas shift and np.roll, but I'm sure a solution for this already exists somewhere.
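Because every row already has the same length, sequence padding is not really needed: each row only needs a different split of zeros between its left and right ends. A minimal per-row sketch of that idea, using np.pad instead of np.roll (np.roll wraps values around to the other end rather than inserting zeros), and assuming the data is a NumPy array and the padding amounts are non-negative integers:

```python
import numpy as np

# Example data from the question: one row per series, columns T1..T3.
padding = np.array([0, 1, 1, 2])
data = np.array([[3.0, 2.9, 2.8],
                 [2.9, 2.8, 2.7],
                 [2.8, 2.3, 2.0],
                 [4.4, 3.3, 2.3]])

max_pad = padding.max()

# Row i gets padding[i] zeros on the left and (max_pad - padding[i]) zeros
# on the right, so every padded row ends up with length N + max_pad.
padded = np.vstack([
    np.pad(row, (p, max_pad - p), mode="constant", constant_values=0)
    for row, p in zip(data, padding)
])

print(padded)
```

This loops in Python over the rows, so it is simple but not vectorized; for many rows, a single fancy-indexing assignment avoids the loop.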

Ellio asked Jan 23 '26 21:01


2 Answers

In numpy, you could construct an array of indices for the locations where you want to place your array elements.

Let's say you have

import numpy as np

padding = np.array([0, 1, 1, 2])
data = np.array([[3.0, 2.9, 2.8],
                 [2.9, 2.8, 2.7],
                 [2.8, 2.3, 2.0],
                 [4.4, 3.3, 2.3]])
M, N = data.shape

The output array would be

output = np.zeros((M, N + padding.max()))

You can make an index of where the data goes:

rows = np.arange(M)[:, None]
cols = padding[:, None] + np.arange(N)
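To make the indexing concrete, here is what the two index arrays contain for the example padding; M and N are hard-coded to the example data's shape purely for illustration:

```python
import numpy as np

padding = np.array([0, 1, 1, 2])
M, N = 4, 3  # shape of the example data

rows = np.arange(M)[:, None]            # shape (M, 1): the row index of each row
cols = padding[:, None] + np.arange(N)  # shape (M, N): row i starts at column padding[i]

print(rows.shape, cols.shape)  # (4, 1) (4, 3)
print(cols)
# [[0 1 2]
#  [1 2 3]
#  [1 2 3]
#  [2 3 4]]
```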

Since the index arrays rows (shape (M, 1)) and cols (shape (M, N)) broadcast to the shape of the data, (M, N), you can assign to the output directly:

output[rows, cols] = data
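Putting the steps of this answer together into one self-contained script might look like this. The function name shift_rows is a hypothetical helper, not part of NumPy, and it assumes all padding amounts are non-negative (a negative amount would index from the end of the row):

```python
import numpy as np

def shift_rows(data, padding):
    """Place row i of data starting at column padding[i] of a zero-filled array."""
    data = np.asarray(data, dtype=float)
    padding = np.asarray(padding, dtype=int)
    M, N = data.shape
    # Wide enough for the largest shift; unfilled cells stay 0.
    output = np.zeros((M, N + padding.max()))
    rows = np.arange(M)[:, None]
    cols = padding[:, None] + np.arange(N)
    # Integer-array (fancy) indexing: the index shapes broadcast to (M, N).
    output[rows, cols] = data
    return output

padding = [0, 1, 1, 2]
data = [[3.0, 2.9, 2.8],
        [2.9, 2.8, 2.7],
        [2.8, 2.3, 2.0],
        [4.4, 3.3, 2.3]]
print(shift_rows(data, padding))
```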

Not sure how this applies to a DataFrame exactly, but you could probably build a new one from the result of operating on the old one's values (e.g. obtained with to_numpy()). Alternatively, you could probably implement the same operations directly in pandas.
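One way the DataFrame route could look, as a sketch: pull the time-step columns out as a NumPy array, apply the same index assignment, and rebuild a frame. It assumes that every time-step column is named T<k> and that the only other column is 'Padding amount'; the new column names T1..T5 are generated, not taken from any pandas convention:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Padding amount': [0, 1, 1, 2],
                   'T1': [3.0, 2.9, 2.8, 4.4],
                   'T2': [2.9, 2.8, 2.3, 3.3],
                   'T3': [2.8, 2.7, 2.0, 2.3]})

# Assumption: time-step columns are exactly those whose names start with 'T'.
time_cols = [c for c in df.columns if c.startswith('T')]
padding = df['Padding amount'].to_numpy()
values = df[time_cols].to_numpy(dtype=float)

M, N = values.shape
output = np.zeros((M, N + padding.max()))
output[np.arange(M)[:, None], padding[:, None] + np.arange(N)] = values

# Rebuild a DataFrame with columns T1..T(N + max padding), keeping the original index.
new_cols = [f'T{k}' for k in range(1, output.shape[1] + 1)]
result = pd.concat(
    [df[['Padding amount']], pd.DataFrame(output, index=df.index, columns=new_cols)],
    axis=1,
)
print(result)
```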

Mad Physicist answered Jan 25 '26 12:01


This is one way of doing it; I've made the process flexible in terms of how many time periods/steps it can handle:

import pandas as pd

#data
d = {'Padding amount': [0, 1, 1, 2],
     'T1': [3, 2.9, 2.8, 4.4],
     'T2': [2.9, 2.8, 2.3, 3.3],
     'T3': [2.8, 2.7, 2.0, 2.3]}
#create DF
df = pd.DataFrame(data = d)
#get max padding amount
maxPadd = df['Padding amount'].max()
#list of time periods (columns named T1, T2, ...)
timePeriodsCols = [c for c in df.columns.tolist() if c.startswith('T')]
#reverse list, so values are moved right-to-left and never overwrite data not yet moved
reverseList = timePeriodsCols[::-1]
#number of periods
noOfPeriods = len(timePeriodsCols)

#create new needed columns, filled with 0.0 (float, so shifted values keep their dtype)
for i in range(noOfPeriods + 1, noOfPeriods + 1 + maxPadd):
    df['T' + str(i)] = 0.0

#loop over records
for i, row in df.iterrows():
    #get padding amount
    padAmount = df.at[i, 'Padding amount']
    #if zero then do nothing
    if padAmount == 0:
        continue
    #else: move each value right by the padding amount and set its old location to zero
    for col in reverseList:
        pos = df.columns.get_loc(col)
        df.at[i, df.columns[pos + padAmount]] = df.at[i, col]
        df.at[i, col] = 0

print(df)

   Padding amount   T1   T2   T3   T4   T5
0               0  3.0  2.9  2.8  0.0  0.0
1               1  0.0  2.9  2.8  2.7  0.0
2               1  0.0  2.8  2.3  2.0  0.0
3               2  0.0  0.0  4.4  3.3  2.3
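If the row-by-row loop becomes slow on a large frame, a possible alternative is to widen the frame once and then shift each group of rows that share a padding amount, using DataFrame.shift along the columns (the pandas shift mentioned in the question). This is a sketch under the same assumption that the time-step columns are named T1, T2, ...:

```python
import pandas as pd

df = pd.DataFrame({'Padding amount': [0, 1, 1, 2],
                   'T1': [3.0, 2.9, 2.8, 4.4],
                   'T2': [2.9, 2.8, 2.3, 3.3],
                   'T3': [2.8, 2.7, 2.0, 2.3]})

# Assumption: time-step columns are exactly those whose names start with 'T'.
time_cols = [c for c in df.columns if c.startswith('T')]
max_pad = int(df['Padding amount'].max())
all_cols = [f'T{k}' for k in range(1, len(time_cols) + max_pad + 1)]

# Widen the frame with zero-filled columns on the right.
wide = df[time_cols].reindex(columns=all_cols, fill_value=0.0)

# Shift every group of rows sharing padding amount p by p columns to the right;
# values pushed off the right edge are only the added zeros.
shifted = wide.copy()
for p, idx in df.groupby('Padding amount').groups.items():
    shifted.loc[idx] = wide.loc[idx].shift(int(p), axis=1, fill_value=0.0)

result = pd.concat([df[['Padding amount']], shifted], axis=1)
print(result)
```

The loop now runs once per distinct padding amount rather than once per row and column, which is usually a much smaller number.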

Mit answered Jan 25 '26 10:01


