
pandas: add a new column that increments every several rows

I want to create a new column in pandas that increments every 5 rows containing specific data (column X), like below:

1
1
1
1
1
2
2
2
2
2
3
Busy Bee asked Sep 18 '25 21:09


2 Answers

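Another option:

df['new'] = df.index // 5

This floor-divides the index by 5, so rows 0 to 4 get 0, rows 5 to 9 get 1, and so on. Note that plain / only floors in Python 2; in Python 3 it is true division and returns floats, so // is the safer choice in both.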

Edit:

df['new'] = df.index // 5 + 1

This gives you values starting from 1 instead of 0.

A Python 3 alternative that casts the true-division result back to int (thanks to BusyBee):

df['new'] = (df.index / 5 + 1).astype(int)
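
The snippets above assume the default 0, 1, 2, ... index. If the index is something else (strings, dates, or a filtered frame), a position-based counter is one way around that. A minimal sketch, assuming NumPy is available; the column names X and new, the block_size variable, and the sample data are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default index, for illustration only.
df = pd.DataFrame({'X': range(12)}, index=list('abcdefghijkl'))

# Row positions 0..n-1, floor-divided by the block size, then shifted to start at 1.
block_size = 5
df['new'] = np.arange(len(df)) // block_size + 1

print(df['new'].tolist())
# [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3]
```

Because it counts positions rather than labels, this gives the same result whatever the index looks like.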

If you want to start at row x:

import pandas as pd

df = pd.DataFrame({'data': range(50)}, columns=['data'])
x = 23
# rows before x stay empty
df['two'] = None
# floor-divide the index from row x onwards
df.loc[x:, 'two'] = df.index[x:] // 5 + 1
print(df)

If you want to start at x and then number from 1, you need to subtract x:

df.loc[x:, 'two'] = (df.index[x:] - x) // 5 + 1

but I'm not sure this is the best method for that anymore.

You can use .shift on the column after you assign it, but you can't shift df.index itself (probably for a good reason!): Index.shift is only implemented for datetime-like indexes.
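
A variant of the start-at-row-x idea that does not rely on the index values and leaves the rows before x missing. This is only a sketch: it uses NumPy and a nullable integer column (Int64), neither of which appears in the answer above, and the names start, block_size, pos and counter are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': range(50)})
start = 23        # first row (by position) that should be numbered
block_size = 5

pos = np.arange(len(df))
counter = (pos - start) // block_size + 1

# Keep the counter from `start` onwards; rows before it become missing.
df['two'] = pd.Series(counter, index=df.index).where(pos >= start).astype('Int64')

print(df.iloc[20:30])
```

Rows 23 to 27 get 1, rows 28 to 32 get 2, and so on, while rows 0 to 22 stay empty.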

Stael answered Sep 20 '25 13:09


You can repeat the index (a numpy-style repeat) and select with loc to repeat every row across multiple columns:

Note: the index values have to be unique.

df = pd.DataFrame({'A':list('agb'),
                   'B':[4,5,4],
                   'C':[7,8,9]})

print (df)
   A  B  C
0  a  4  7
1  g  5  8
2  b  4  9

df = df.loc[df.index.repeat(5)].reset_index(drop=True)
print (df)
    A  B  C
0   a  4  7
1   a  4  7
2   a  4  7
3   a  4  7
4   a  4  7
5   g  5  8
6   g  5  8
7   g  5  8
8   g  5  8
9   g  5  8
10  b  4  9
11  b  4  9
12  b  4  9
13  b  4  9
14  b  4  9
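
If the index is not unique, label-based loc with repeated labels would pull in every row sharing a label. One possible workaround, sketched here with positional indexing (iloc with np.repeat, which is not part of the answer above), also adds the 1, 1, 1, 1, 1, 2, ... counter from the question. The names n, out and group are made up for illustration:

```python
import numpy as np
import pandas as pd

# Same data as above, but with a deliberately non-unique index.
df = pd.DataFrame({'A': list('agb'),
                   'B': [4, 5, 4],
                   'C': [7, 8, 9]}, index=[0, 0, 1])

n = 5
# Repeat row positions (not labels), so duplicate index values do not matter.
out = df.iloc[np.repeat(np.arange(len(df)), n)].reset_index(drop=True)

# Counter that increments every n rows, starting at 1.
out['group'] = np.arange(len(out)) // n + 1

print(out)
```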

And if you need only one column (starting again from the original 3-row df, not the repeated one):

df = pd.DataFrame({'D': df.A.values.repeat(5)})
print (df)
    D
0   a
1   a
2   a
3   a
4   a
5   g
6   g
7   g
8   g
9   g
10  b
11  b
12  b
13  b
14  b
jezrael answered Sep 20 '25 12:09