
Replicating rows in a pandas dataframe by column value and adding a new column with the repetition index

My question is similar to one asked here. I have a dataframe and I want to repeat each row k times, where k is the value in that row's n column. I also want to add a column that numbers the copies of each row from 0 to k-1. For example:

import pandas as pd

df = pd.DataFrame(data={
  'id': ['A', 'B', 'C'],
  'n' : [  1,   2,   3],
  'v' : [ 10,  13,   8]
})

what_i_want = pd.DataFrame(data={
  'id': ['A', 'B', 'B', 'C', 'C', 'C'],
  'n' : [ 1, 2, 2, 3, 3, 3],
  'v' : [ 10,  13, 13, 8, 8, 8],
  'repeat_id': [0, 0, 1, 0, 1, 2]
})

The command below does half of the job: it repeats the rows. I am looking for a pandas way of adding the repeat_id column.

df.loc[df.index.repeat(df.n)]
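
For reference, a minimal sketch of what that command returns on the sample frame above. The variable name `repeated` is only illustrative. The rows are duplicated and each copy keeps its original index label, but there is no repeat counter yet.

```python
import pandas as pd

df = pd.DataFrame(data={
    'id': ['A', 'B', 'C'],
    'n':  [1, 2, 3],
    'v':  [10, 13, 8],
})

# Repeat each index label as many times as the value in column n,
# then select those labels; every copy keeps its original label.
repeated = df.loc[df.index.repeat(df.n)]

print(repeated.index.tolist())  # [0, 1, 1, 2, 2, 2]
print(repeated)
```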
kampta asked May 11 '18 11:05



1 Answer

Use GroupBy.cumcount to number the repeats within each original row, and call copy() on the repeated frame to avoid a SettingWithCopyWarning.

Without copy(), if you modify values in df1 later, you will find that the modifications do not propagate back to the original data (df), and pandas emits a warning.

df1 = df.loc[df.index.repeat(df.n)].copy()
df1['repeat_id'] = df1.groupby(level=0).cumcount()
df1 = df1.reset_index(drop=True)
print (df1)
  id  n   v  repeat_id
0  A  1  10          0
1  B  2  13          0
2  B  2  13          1
3  C  3   8          0
4  C  3   8          1
5  C  3   8          2
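
The same steps can be wrapped in a small helper and checked against the expected frame from the question. This is only a sketch: the function name `repeat_rows_with_index` and its parameters are hypothetical, and it assumes the input frame has a unique index, because `groupby(level=0)` uses the repeated index labels to tell the original rows apart.

```python
import pandas as pd

def repeat_rows_with_index(df, count_col='n', index_col='repeat_id'):
    """Repeat each row df[count_col] times and number the copies from 0.

    Hypothetical helper; assumes df has a unique index.
    """
    # Duplicate the rows; each copy keeps its original index label.
    out = df.loc[df.index.repeat(df[count_col])].copy()
    # Copies of the same original row share a label, so cumcount
    # numbers them 0, 1, ..., k-1 within each group.
    out[index_col] = out.groupby(level=0).cumcount()
    # Replace the duplicated labels with a fresh 0..N-1 index.
    return out.reset_index(drop=True)

df = pd.DataFrame(data={
    'id': ['A', 'B', 'C'],
    'n':  [1, 2, 3],
    'v':  [10, 13, 8],
})

result = repeat_rows_with_index(df)
print(result['repeat_id'].tolist())  # [0, 0, 1, 0, 1, 2]
```

If the index of the input frame is not unique, calling `reset_index(drop=True)` on it before passing it to the helper should restore that assumption.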
jezrael answered Oct 08 '22 16:10