I want to create a new column in pandas that increments every 5 rows of a DataFrame containing specific data (column X), like below:
1
1
1
1
1
2
2
2
2
2
3
Another option:
df['new'] = df.index // 5
This floor-divides the index, assuming the default integer index (0, 1, 2, ...). In Python 2, plain / on integers already floors; in Python 3, / is true division and returns floats, so use // instead.
Edit:
df['new'] = df.index // 5 + 1
gives you values starting from 1 instead of 0.
An equivalent for Python 3, thanks to BusyBee:
df['new'] = (df.index / 5 + 1).astype(int)
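The same idea can be sketched without relying on the index labels at all. This is only an illustration: the column name 'X', the sample values and the non-default index are assumptions, chosen to show that numbering by row position with numpy.arange still gives blocks of 5 when the index does not start at 0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'X' stands in for the column from the question.
# The index deliberately starts at 100 to show that labels are not used.
df = pd.DataFrame({'X': list('abcdefghijk')}, index=range(100, 111))

# Row positions 0..n-1, floor-divided into blocks of 5, then shifted to start at 1.
df['new'] = np.arange(len(df)) // 5 + 1

print(df)
```

With 11 rows this should produce five 1s, five 2s and a single 3, as in the question.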
If you want to start at row x:
import pandas as pd
df = pd.DataFrame({'data': range(50)}, columns=['data'])
x = 23
df['two'] = None
df.loc[x:, 'two'] = df.index[x:] // 5 + 1
print(df)
If you want to start at x and then number from 1, you need to subtract x:
df.loc[x:, 'two'] = (df.index[x:] - x) // 5 + 1
but I'm not sure this is the best method for that anymore.
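One possible alternative for the "start at x and number from 1" case, offered as a sketch rather than the definitive way: compute each row's position relative to x and mask out the rows before x. The variable name offset is hypothetical, and the rows before x are assumed to be left as missing values (NaN), which makes the column float.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': range(50)})
x = 23  # first row position that should receive a group number

# Position of each row relative to x; rows before x get negative values.
offset = np.arange(len(df)) - x

# Number blocks of 5 from 1 starting at x; rows before x become NaN.
df['two'] = pd.Series(offset // 5 + 1, index=df.index).where(offset >= 0)

print(df.iloc[20:30])
```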
You can use .shift on the new column after you create it, but you can't shift df.index itself: Index.shift is only implemented for datetime-like indexes
(probably for a good reason!)
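A rough sketch of the .shift idea, assuming the counter is first built as a Series (the name counter is hypothetical) and then moved down by x rows, so the first x rows end up as NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': range(50)})
x = 23  # number of rows to leave empty at the top

# Build the 1, 1, 1, 1, 1, 2, ... counter as a Series aligned with df.
counter = pd.Series(np.arange(len(df)) // 5 + 1, index=df.index)

# Shift it down by x rows; the first x rows are filled with NaN.
df['two'] = counter.shift(x)

print(df.iloc[20:30])
```

Because of the NaN values the column comes out as float rather than int.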
You can use numpy.repeat (here via Index.repeat) together with loc to repeat the rows of multiple columns.
Note: the index values have to be unique.
import pandas as pd
df = pd.DataFrame({'A': list('agb'),
                   'B': [4, 5, 4],
                   'C': [7, 8, 9]})
print (df)
A B C
0 a 4 7
1 g 5 8
2 b 4 9
df = df.loc[df.index.repeat(5)].reset_index(drop=True)
print (df)
A B C
0 a 4 7
1 a 4 7
2 a 4 7
3 a 4 7
4 a 4 7
5 g 5 8
6 g 5 8
7 g 5 8
8 g 5 8
9 g 5 8
10 b 4 9
11 b 4 9
12 b 4 9
13 b 4 9
14 b 4 9
And if you need only one column (starting again from the original three-row df):
df = pd.DataFrame({'D': df.A.values.repeat(5)})
print (df)
D
0 a
1 a
2 a
3 a
4 a
5 g
6 g
7 g
8 g
9 g
10 b
11 b
12 b
13 b
14 b
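Applied to the original question, numpy.repeat can also build the counter column directly. This is a sketch under assumptions: the column name 'X' and the 11-row sample are made up for illustration, and n_groups is a hypothetical helper variable. The repeated array is trimmed to the DataFrame length so the last group may be shorter than 5.

```python
import numpy as np
import pandas as pd

# Hypothetical data with 11 rows, as in the question's example.
df = pd.DataFrame({'X': range(11)})

# Number of groups of 5 needed to cover every row (ceiling division).
n_groups = -(-len(df) // 5)

# Repeat 1, 2, 3, ... five times each and cut to the number of rows.
df['new'] = np.repeat(np.arange(1, n_groups + 1), 5)[:len(df)]

print(df)
```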