
How can I replicate rows in Pandas?

My pandas dataframe looks like this:

   Person  ID   ZipCode   Gender
0  12345   882  38182     Female
1  32917   271  88172     Male
2  18273   552  90291     Female

I want to replicate every row 3 times like:

   Person  ID   ZipCode   Gender
0  12345   882  38182     Female
0  12345   882  38182     Female
0  12345   882  38182     Female
1  32917   271  88172     Male
1  32917   271  88172     Male
1  32917   271  88172     Male
2  18273   552  90291     Female
2  18273   552  90291     Female
2  18273   552  90291     Female

And of course, reset the index so it is:

0
1
2
...

I tried solutions such as:

pd.concat([df[:5]]*3, ignore_index=True)

And:

df.reindex(np.repeat(df.index.values, df['ID']), method='ffill')

But neither of them worked.

DasVisual asked Jun 10 '18 22:06


5 Answers

Use np.repeat:

Version 1:

newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0))
newdf.columns = df.columns
print(newdf)

The above code will output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female

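np.repeat with axis=0 repeats each row of df.values 3 times, keeping the copies of a row next to each other.

Because the result is a plain NumPy array, the column names are then restored by assigning newdf.columns = df.columns. The new DataFrame gets a fresh 0..8 index, so no separate reset is needed.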

Version 2:

You could also assign the column names in the first line, like below:

newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf)

The above code will also output:

  Person   ID ZipCode  Gender
0  12345  882   38182  Female
1  12345  882   38182  Female
2  12345  882   38182  Female
3  32917  271   88172    Male
4  32917  271   88172    Male
5  32917  271   88172    Male
6  18273  552   90291  Female
7  18273  552   90291  Female
8  18273  552   90291  Female
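
One caveat worth checking with this approach: when the columns mix integers and strings, df.values is a single object-dtype array, so the numeric columns of the rebuilt frame come back as object. The following is a minimal sketch, assuming the sample data from the question, of casting the columns back to their original dtypes. The name `restored` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

# df.values mixes ints and strings, so it is one object-dtype array.
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf.dtypes)  # the integer columns are reported as object here

# Cast every column back to the dtype it had in the original frame.
restored = newdf.astype(df.dtypes.to_dict())
print(restored.dtypes)
```

If the dtypes matter downstream (arithmetic, merges, memory use), the index-based answers below avoid the round trip through NumPy altogether.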
U12-Forward answered Sep 22 '22 23:09


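Both versions below select rows by position, so they repeat the index labels and preserve the columns and dtypes, as the OP demonstrated. Chain .reset_index(drop=True) to renumber the index.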

iloc version 1

df.iloc[np.arange(len(df)).repeat(3)] 

iloc version 2

df.iloc[np.arange(len(df) * 3) // 3] 
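
As a quick check, here is a small sketch (using a two-column subset of the question's data as an assumption) showing that both expressions build the same array of row positions, followed by the index reset the question asks for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "Gender": ["Female", "Male", "Female"],
})

# Version 1: repeat each position three times -> [0 0 0 1 1 1 2 2 2]
positions_a = np.arange(len(df)).repeat(3)
# Version 2: integer division maps 0..8 onto the same positions.
positions_b = np.arange(len(df) * 3) // 3

print((positions_a == positions_b).all())

# Select by position, then renumber the index from 0.
repeated = df.iloc[positions_a].reset_index(drop=True)
print(repeated)
```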
piRSquared answered Sep 24 '22 23:09


Using concat:

pd.concat([df]*3).sort_index()
Out[129]: 
   Person   ID  ZipCode  Gender
0   12345  882    38182  Female
0   12345  882    38182  Female
0   12345  882    38182  Female
1   32917  271    88172    Male
1   32917  271    88172    Male
1   32917  271    88172    Male
2   18273  552    90291  Female
2   18273  552    90291  Female
2   18273  552    90291  Female
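
The sort_index() step is what groups the copies together. A short sketch, assuming a reduced version of the question's data, contrasts it with ignore_index=True alone (which keeps the tiled order 0, 1, 2, 0, 1, 2, ...) and adds the index reset the question asks for. The names `tiled` and `grouped` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "Gender": ["Female", "Male", "Female"],
})

# ignore_index=True renumbers the rows but leaves them in tiled order.
tiled = pd.concat([df] * 3, ignore_index=True)
print(tiled["Person"].tolist())

# Sorting on the original labels groups the copies; then renumber from 0.
grouped = pd.concat([df] * 3).sort_index().reset_index(drop=True)
print(grouped)
```

This relies on the original index being sortable and unique, which holds for the default 0..n-1 index in the question.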
BENY answered Sep 21 '22 23:09


You can do it like this.

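import numpy as np
import pandas as pd

def do_things(df, n_times):
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    # The original rows are kept, so each name ends up appearing n_times + 1 times.
    repeated = pd.DataFrame({'name': np.repeat(df['name'].values, n_times)})
    ndf = pd.concat([df, repeated])
    ndf = ndf.sort_values(by='name')
    ndf = ndf.reset_index(drop=True)
    return ndf

if __name__ == '__main__':
    df = pd.DataFrame({'name': ['Peter', 'Quill', 'Jackson']})
    n_times = 3
    print(do_things(df, n_times))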

And with explanation...

import pandas as pd
import numpy as np

n_times = 3
df = pd.DataFrame({'name' : ['Peter', 'Quill', 'Jackson']})
#       name
# 0    Peter
# 1    Quill
# 2  Jackson

#   Duplicating data (DataFrame.append was removed in pandas 2.0, so pd.concat is used).
df = pd.concat([df, pd.DataFrame({'name' : np.repeat(df['name'].values, n_times) })])
#       name
# 0    Peter
# 1    Quill
# 2  Jackson
# 0    Peter
# 1    Peter
# 2    Peter
# 3    Quill
# 4    Quill
# 5    Quill
# 6  Jackson
# 7  Jackson
# 8  Jackson

#   The DataFrame is sorted by 'name' column.
df = df.sort_values(by=['name'])
#       name
# 2  Jackson
# 6  Jackson
# 7  Jackson
# 8  Jackson
# 0    Peter
# 0    Peter
# 1    Peter
# 2    Peter
# 1    Quill
# 3    Quill
# 4    Quill
# 5    Quill

#   Resetting the index.
#   drop=True discards the old index; the default (drop=False) keeps it as an 'index' column, as below.
df = df.reset_index()
#     index     name
# 0       2  Jackson
# 1       6  Jackson
# 2       7  Jackson
# 3       8  Jackson
# 4       0    Peter
# 5       0    Peter
# 6       1    Peter
# 7       2    Peter
# 8       1    Quill
# 9       3    Quill
# 10      4    Quill
# 11      5    Quill
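
Note that the walkthrough above keeps the original rows next to the repeated ones, so each name appears four times for n_times = 3. If exactly n_times copies are wanted, one possible variant (a sketch, not the answer's own method) is Series.repeat on the single column, which also keeps the original row order instead of sorting by name.

```python
import pandas as pd

n_times = 3
df = pd.DataFrame({"name": ["Peter", "Quill", "Jackson"]})

# Series.repeat repeats each element consecutively n_times;
# reset_index(drop=True) renumbers the rows, to_frame() restores a DataFrame.
repeated = df["name"].repeat(n_times).reset_index(drop=True).to_frame()
print(repeated)
```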
IMCoins answered Sep 23 '22 23:09


You can try the following code:

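df = df.loc[df.index.repeat(3)].reset_index(drop=True)

df.index.repeat(3) returns an Index in which each label is repeated 3 times, and df.loc[...] selects the rows in exactly that order. .loc is used because these are index labels, not positions; .iloc gives the same result only when the frame has the default 0..n-1 index. reset_index(drop=True) then renumbers the rows without keeping the old index as an extra column.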
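
The same idea can be wrapped in a small helper. This is a minimal sketch: `repeat_rows` is a hypothetical name, it selects by label with .loc, and it assumes the index labels are unique. Since Index.repeat also accepts one count per row, the helper handles per-row repetition as well.

```python
import pandas as pd

def repeat_rows(df, n):
    """Hypothetical helper: repeat each row n times.

    n is either a single int or a sequence with one count per row.
    Assumes the index labels of df are unique.
    """
    # Repeat the labels, select rows in that order, then renumber from 0.
    return df.loc[df.index.repeat(n)].reset_index(drop=True)

# A non-default index, to show that label-based selection still works.
df = pd.DataFrame({"Person": [12345, 32917, 18273]}, index=["a", "b", "c"])

print(repeat_rows(df, 3))          # every row three times
print(repeat_rows(df, [1, 0, 2]))  # per-row counts: drop "b", double "c"
```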

mahesha sahoo answered Sep 25 '22 23:09