Repeat value every 4 rows and use the beginning rows to fill the rest

I want to create a new column that repeats a value from another column every 4 rows: the first row of each block of 4 should fill the remaining rows of that block. For example, for df,

import numpy as np
import pandas as pd

d = {'col1': range(1,10)}
df = pd.DataFrame(data=d)

I want to create a col2 that looks like this:

col1    col2
1        1
2        1
3        1
4        1
5        5
6        5
7        5
8        5
9        9

This is what I tried:

df['col2'] = np.concatenate([np.repeat(df.col1.values[0::4], 4),
                             np.repeat(np.NaN, len(df)%3)])

It yields the error: ValueError: Length of values does not match length of index

If I change 4 to 3, the code works because len(df) is 9. I am looking for code that works for any block size and any DataFrame length.

asked Aug 16 '20 16:08 by Warrior


1 Answer

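Here is one approach: use DataFrame.groupby with cumcount, together with pandas.Series.shift, to build a mask that is True on the first row of each block of 4. Use the mask to copy col1 into col2 on those rows, then fill the remaining missing values with Series.ffill. Note that df.index % 4 relies on df having the default integer index (0, 1, 2, ...).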

g = df.groupby(df.index % 4).cumcount()
mask = g.ne(g.shift(1))

0     True
1    False
2    False
3    False
4     True
5    False
6    False
7    False
8     True
dtype: bool
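
To see why this mask marks the start of each block, it can help to print the intermediate g. The following is only a small sketch, assuming the same 9-row df with the default RangeIndex; it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': range(1, 10)})

# df.index % 4 is the position of each row inside its block of 4.
# cumcount numbers the occurrences of each position, so every row of the
# first block gets 0, every row of the second block gets 1, and so on.
g = df.groupby(df.index % 4).cumcount()

# A row starts a new block when its block number differs from the
# previous row's; the first row compares against NaN, which is also True.
mask = g.ne(g.shift(1))

print(g.tolist())     # [0, 0, 0, 0, 1, 1, 1, 1, 2]
print(mask.tolist())  # [True, False, False, False, True, False, False, False, True]
```

So g is effectively a block number, and the mask picks out the rows where that number changes.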

df.loc[mask, 'col2'] = df.loc[mask, 'col1']

   col1  col2
0     1   1.0
1     2   NaN
2     3   NaN
3     4   NaN
4     5   5.0
5     6   NaN
6     7   NaN
7     8   NaN
8     9   9.0

df['col2'] = df['col2'].ffill()

   col1  col2
0     1   1.0
1     2   1.0
2     3   1.0
3     4   1.0
4     5   5.0
5     6   5.0
6     7   5.0
7     8   5.0
8     9   9.0
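
As an alternative, the block number can be computed directly from the row position with integer division, which avoids the mask and the NaN step (so col2 keeps its integer dtype) and does not depend on the index. This is a sketch under stated assumptions, not the method above: the names n, blocks and col2_repeat are made up for illustration. It also shows one way to repair the np.repeat attempt from the question by trimming the repeated array to len(df).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': range(1, 10)})
n = 4  # block size (hypothetical parameter name)

# Block label by row position: 0, 0, 0, 0, 1, 1, 1, 1, 2
blocks = np.arange(len(df)) // n

# Option A: broadcast the first col1 value of each block to the whole block.
df['col2'] = df['col1'].groupby(blocks).transform('first')

# Option B: keep the np.repeat idea from the question, but cut off the
# overshoot so the result has exactly len(df) values.
df['col2_repeat'] = np.repeat(df['col1'].to_numpy()[::n], n)[:len(df)]

print(df)
```

Both options give 1, 1, 1, 1, 5, 5, 5, 5, 9 for this df, and they should work for any n and any length because the last, possibly shorter, block is handled by the integer division or the trim.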
answered Oct 28 '22 07:10 by sushanth