I have a column called 'factor', and each time the name in that column changes, I would like to insert a blank row. Is this possible?
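```python
end = len(df2) - 1  # stop one short so that row i + 1 always exists
for i in range(0, end):
    if df2.at[i + 1, 'factor'] != df2.at[i, 'factor']:
        pass  # a blank row should go between rows i and i + 1 here
```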
Inserting rows one at a time in a for loop is inefficient, because each insertion copies the DataFrame. Instead, find the indices where the value changes, create a DataFrame of empty rows labelled with those indices plus 0.5, concatenate the two, then sort by index:
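```python
import pandas as pd

df = pd.DataFrame([[1, 1], [2, 1], [3, 2], [4, 2],
                   [5, 2], [6, 3]], columns=['A', 'B'])

# True where B differs from the next row (always True for the last row,
# because shift(-1) leaves NaN there)
switches = df['B'].ne(df['B'].shift(-1))
idx = switches[switches].index

# empty rows labelled halfway between each change point and the next row
df_new = pd.DataFrame(index=idx + 0.5)
df = pd.concat([df, df_new]).sort_index()
print(df)
```

```
       A    B
0.0  1.0  1.0
1.0  2.0  1.0
1.5  NaN  NaN
2.0  3.0  2.0
3.0  4.0  2.0
4.0  5.0  2.0
4.5  NaN  NaN
5.0  6.0  3.0
5.5  NaN  NaN
```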
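Because shift(-1) leaves NaN in the last position, the last row is always flagged as a change, which is where the trailing 5.5 row comes from. If you don't want a blank row at the very end, a minimal sketch is to drop the last flag before building the empty rows (result is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame([[1, 1], [2, 1], [3, 2], [4, 2],
                   [5, 2], [6, 3]], columns=['A', 'B'])

# Same change detection as above, but ignore the last row: shift(-1) puts NaN
# there, so ne() would otherwise flag it and add a trailing blank row.
switches = df['B'].ne(df['B'].shift(-1)).iloc[:-1]
idx = switches[switches].index

df_new = pd.DataFrame(index=idx + 0.5)
result = pd.concat([df, df_new]).sort_index().reset_index(drop=True)
print(result)
```

This gives 8 rows with blanks only between groups, not after the last one.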
If necessary, use reset_index(drop=True) to go back to a default 0..n-1 integer index:
```python
print(df.reset_index(drop=True))
```

```
     A    B
0  1.0  1.0
1  2.0  1.0
2  NaN  NaN
3  3.0  2.0
4  4.0  2.0
5  5.0  2.0
6  NaN  NaN
7  6.0  3.0
8  NaN  NaN
```
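If you would rather keep a loop, as in the question, one linear alternative is to collect the rows in a plain Python list and build the DataFrame once at the end, instead of inserting into it on every change. A minimal sketch reusing the example data; the names blank, rows, prev and result are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1], [2, 1], [3, 2], [4, 2],
                   [5, 2], [6, 3]], columns=['A', 'B'])

blank = {col: np.nan for col in df.columns}  # one all-NaN row
rows = []
prev = None
for record in df.to_dict('records'):
    # a value of B different from the previous row starts a new group
    if prev is not None and record['B'] != prev:
        rows.append(blank)
    rows.append(record)
    prev = record['B']

# build the DataFrame once from the collected rows
result = pd.DataFrame(rows, columns=df.columns)
print(result)
```

Since a blank is only added when a new group starts, there is no blank row after the last group.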
Alternatively, use reindex: take the index labels where the value changes, add 0.5 to each, take the union with the original index to get a float index, and reindex by that.
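```python
import pandas as pd

df2 = pd.DataFrame({'factor': list('aaabbccdd')})

# labels of rows whose value differs from the next row, plus 0.5, merged
# with the original labels; [:-1] drops the final 8.5, so no blank row is
# added after the last row
idx = df2.index.union(df2.index[df2['factor'].shift(-1).ne(df2['factor'])] + .5)[:-1]
print(idx)
```

```
Float64Index([0.0, 1.0, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0, 6.0, 6.5, 7.0, 8.0], dtype='float64')
```

In pandas 2.0 and later, Float64Index no longer exists, and the same index prints as Index([...], dtype='float64').

```python
# the added labels become rows filled with ''; df2 itself is left unchanged
out = df2.reindex(idx, fill_value='').reset_index(drop=True)
print(out)
```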
```
   factor
0       a
1       a
2       a
3
4       b
5       b
6
7       c
8       c
9
10      d
11      d
```
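The one-liner that builds idx packs several steps together. Broken out one step at a time on the same data (changes, edges and halfway are illustrative names), it looks like this:

```python
import pandas as pd

df2 = pd.DataFrame({'factor': list('aaabbccdd')})

# 1. True where the value differs from the next row (the last row is always
#    True, because shift(-1) gives NaN there)
changes = df2['factor'].shift(-1).ne(df2['factor'])
# 2. labels of those rows
edges = df2.index[changes]
# 3. a new label halfway between each edge and the following row
halfway = edges + 0.5
# 4. merge with the original labels (union sorts them), then drop the last
#    label so no blank row follows the final group
idx = df2.index.union(halfway)[:-1]
print(list(edges), list(halfway), list(idx))
```

Here edges is [2, 4, 6, 8], halfway is [2.5, 4.5, 6.5, 8.5], and idx matches the index printed above.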
If you want missing values (NaN) instead of empty strings, omit fill_value:
```python
out = df2.reindex(idx).reset_index(drop=True)
print(out)
```

```
   factor
0       a
1       a
2       a
3     NaN
4       b
5       b
6     NaN
7       c
8       c
9     NaN
10      d
11      d
```
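The reindex approach assumes a sorted numeric index, since it adds 0.5 to the labels and the union sorts them. A minimal sketch of a reusable helper that resets the index first, so it also works with string or unsorted labels (the original labels are discarded). insert_blank_rows and df3 are hypothetical names, not part of pandas:

```python
import numpy as np
import pandas as pd


def insert_blank_rows(df, col, fill_value=np.nan):
    """Hypothetical helper: return a copy of df with one filler row after
    each run of equal values in col (none after the last run)."""
    # positional labels 0..n-1 so that adding 0.5 always makes sense
    out = df.reset_index(drop=True)
    changes = out[col].shift(-1).ne(out[col])
    # halfway labels after each change, merged and sorted; drop the last one
    idx = out.index.union(out.index[changes] + 0.5)[:-1]
    return out.reindex(idx, fill_value=fill_value).reset_index(drop=True)


df3 = pd.DataFrame({'factor': list('aabccc')}, index=list('uvwxyz'))
print(insert_blank_rows(df3, 'factor', fill_value=''))
print(insert_blank_rows(df3, 'factor'))
```

Passing fill_value='' reproduces the empty-string version above; leaving the default gives NaN rows.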