I have a data frame like this:
df
col1 col2
1 A
3 B
6 A
10 C
I want to create a new data frame from the df above so that, wherever the col1 values are not consecutive, a row is added for each missing col1 value, with col2 taken from the row just above it.
The data frame I am looking for is:
df
col1 col2
1 A
2 A
3 B
4 B
5 B
6 A
7 A
8 A
9 A
10 C
I could do it with a simple for loop, but is there a more Pythonic and efficient way to do it using pandas?
Here is one way using set_index(), reindex and ffill:
df.set_index('col1').reindex(range(df.col1.min(),df.col1.max()+1)).ffill().reset_index()
#df.set_index('col1').reindex(range(df.col1.min(),df.col1.max()+1),method='ffill')\
#.reset_index()
col1 col2
0 1 A
1 2 A
2 3 B
3 4 B
4 5 B
5 6 A
6 7 A
7 8 A
8 9 A
9 10 C
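For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same set_index / reindex / ffill idea. The frame is rebuilt from the sample in the question; the names full_range and filled are just illustrative, and it assumes col1 holds unique integers.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col1": [1, 3, 6, 10], "col2": ["A", "B", "A", "C"]})

# Every integer from the smallest to the largest col1 value (inclusive)
full_range = range(df["col1"].min(), df["col1"].max() + 1)

filled = (
    df.set_index("col1")   # col1 becomes the index (must be unique)
      .reindex(full_range) # missing col1 values appear as rows with NaN in col2
      .ffill()             # copy the last known col2 value downwards
      .reset_index()       # turn col1 back into a regular column
)
print(filled)
```

Because min() and max() are used, this version does not depend on the order of the rows in df.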
One way is using reindex with ffill:
(df.set_index('col1')
.reindex(range(df.col1.iloc[0], df.col1.iloc[-1]+1))
.ffill()
.reset_index())
col1 col2
0 1 A
1 2 A
2 3 B
3 4 B
4 5 B
5 6 A
6 7 A
7 8 A
8 9 A
9 10 C
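Note that taking the bounds from iloc[0] and iloc[-1] only works if col1 is already sorted in ascending order, as it is in the question. If that is not guaranteed, sorting first is one option. A small sketch, where the shuffled frame unsorted is made up purely for illustration:

```python
import pandas as pd

# Hypothetical input: same rows as in the question, but shuffled
unsorted = pd.DataFrame({"col1": [6, 1, 10, 3], "col2": ["A", "A", "C", "B"]})

# Without sorting, the first/last values give an empty range here: range(6, 4)
naive_len = len(range(unsorted["col1"].iloc[0], unsorted["col1"].iloc[-1] + 1))

# Sort first so that iloc[0] / iloc[-1] really are the smallest / largest values
ordered = unsorted.sort_values("col1")
filled_sorted = (
    ordered.set_index("col1")
           .reindex(range(ordered["col1"].iloc[0], ordered["col1"].iloc[-1] + 1))
           .ffill()
           .reset_index()
)
print(naive_len)
print(filled_sorted)
```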
Or another way using Series.repeat (note that this gives only the repeated col2 values as a Series):
df.col2.repeat(df.col1.diff().shift(-1).fillna(1).astype(int)).reset_index(drop=True)
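To get the full two-column frame out of the repeat idea, col1 has to be rebuilt separately. Here is a rough sketch, assuming col1 is sorted and strictly increasing; the names counts, col1_full, col2_full and out are only for illustration. The last row is repeated once, which is why the trailing NaN is filled with 1.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col1": [1, 3, 6, 10], "col2": ["A", "B", "A", "C"]})

# Gap to the next col1 value = how many times each row should appear.
# diff() -> [NaN, 2, 3, 4]; shift(-1) -> [2, 3, 4, NaN]; fillna(1) -> [2, 3, 4, 1]
counts = df["col1"].diff().shift(-1).fillna(1).astype(int)

# Repeat each col2 value according to its gap, then renumber the index
col2_full = df["col2"].repeat(counts.to_numpy()).reset_index(drop=True)

# Rebuild the consecutive col1 values (assumes col1 is sorted ascending)
col1_full = pd.Series(range(df["col1"].iloc[0], df["col1"].iloc[-1] + 1))

out = pd.DataFrame({"col1": col1_full, "col2": col2_full})
print(out)
```

The astype(int) matters because diff() produces floats, and repeat expects integer counts.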