My pandas dataframe looks like this:
Person ID ZipCode Gender
0 12345 882 38182 Female
1 32917 271 88172 Male
2 18273 552 90291 Female
I want to replicate every row 3 times like:
Person ID ZipCode Gender
0 12345 882 38182 Female
0 12345 882 38182 Female
0 12345 882 38182 Female
1 32917 271 88172 Male
1 32917 271 88172 Male
1 32917 271 88172 Male
2 18273 552 90291 Female
2 18273 552 90291 Female
2 18273 552 90291 Female
And of course, reset the index so it is:
0
1
2
...
I tried solutions such as:
pd.concat([df[:5]]*3, ignore_index=True)
And:
df.reindex(np.repeat(df.index.values, df['ID']), method='ffill')
But none of them worked.
In R, the easiest way to repeat rows is with the REP() function. This function selects one or more observations from a data frame and creates one or more copies of them. Alternatively, you can use the SLICE() function from the dplyr package to repeat rows.
In Python, if you want to repeat the elements multiple times in the NumPy array then you can use the numpy. repeat() function. In Python, this method is available in the NumPy module and this function is used to return the numpy array of the repeated items along with axis such as 0 and 1.
Pandas Series: repeat() function The repeat() function is used to repeat elements of a Series. Returns a new Series where each element of the current Series is repeated consecutively a given number of times. The number of repetitions for each element. This should be a non-negative integer.
Pandas str. repeat() method is used to repeat string values in the same position of passed series itself. An array can also be passed in case to define the number of times each element should be repeated in series. For that case, length of array must be same as length of Series.
np.repeat
:Try using np.repeat
:
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0)) newdf.columns = df.columns print(newdf)
The above code will output:
Person ID ZipCode Gender 0 12345 882 38182 Female 1 12345 882 38182 Female 2 12345 882 38182 Female 3 32917 271 88172 Male 4 32917 271 88172 Male 5 32917 271 88172 Male 6 18273 552 90291 Female 7 18273 552 90291 Female 8 18273 552 90291 Female
np.repeat
repeats the values of df
, 3
times.
Then we add the columns with assigning new_df.columns = df.columns
.
You could also assign the column names in the first line, like below:
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns) print(newdf)
The above code will also output:
Person ID ZipCode Gender 0 12345 882 38182 Female 1 12345 882 38182 Female 2 12345 882 38182 Female 3 32917 271 88172 Male 4 32917 271 88172 Male 5 32917 271 88172 Male 6 18273 552 90291 Female 7 18273 552 90291 Female 8 18273 552 90291 Female
These will repeat the indices and preserve the columns as op demonstrated
iloc
version 1df.iloc[np.arange(len(df)).repeat(3)]
iloc
version 2df.iloc[np.arange(len(df) * 3) // 3]
Using concat
:
pd.concat([df]*3).sort_index()
Out[129]:
Person ID ZipCode Gender
0 12345 882 38182 Female
0 12345 882 38182 Female
0 12345 882 38182 Female
1 32917 271 88172 Male
1 32917 271 88172 Male
1 32917 271 88172 Male
2 18273 552 90291 Female
2 18273 552 90291 Female
2 18273 552 90291 Female
You can do it like this.
def do_things(df, n_times):
ndf = df.append(pd.DataFrame({'name' : np.repeat(df.name.values, n_times) }))
ndf = ndf.sort_values(by='name')
ndf = ndf.reset_index(drop=True)
return ndf
if __name__ == '__main__':
df = pd.DataFrame({'name' : ['Peter', 'Quill', 'Jackson']})
n_times = 3
print do_things(df, n_times)
And with explanation...
import pandas as pd
import numpy as np
n_times = 3
df = pd.DataFrame({'name' : ['Peter', 'Quill', 'Jackson']})
# name
# 0 Peter
# 1 Quill
# 2 Jackson
# Duplicating data.
df = df.append(pd.DataFrame({'name' : np.repeat(df.name.values, n_times) }))
# name
# 0 Peter
# 1 Quill
# 2 Jackson
# 0 Peter
# 1 Peter
# 2 Peter
# 3 Quill
# 4 Quill
# 5 Quill
# 6 Jackson
# 7 Jackson
# 8 Jackson
# The DataFrame is sorted by 'name' column.
df = df.sort_values(by=['name'])
# name
# 2 Jackson
# 6 Jackson
# 7 Jackson
# 8 Jackson
# 0 Peter
# 0 Peter
# 1 Peter
# 2 Peter
# 1 Quill
# 3 Quill
# 4 Quill
# 5 Quill
# Reseting the index.
# You can play with drop=True and drop=False, as parameter of `reset_index()`
df = df.reset_index()
# index name
# 0 2 Jackson
# 1 6 Jackson
# 2 7 Jackson
# 3 8 Jackson
# 4 0 Peter
# 5 0 Peter
# 6 1 Peter
# 7 2 Peter
# 8 1 Quill
# 9 3 Quill
# 10 4 Quill
# 11 5 Quill
You can try the following code:
df = df.iloc[df.index.repeat(3),:].reset_index()
df.index.repeat(3)
will create a list where each index value will be repeated 3 times and df.iloc[df.index.repeat(3),:]
will help generate a dataframe with the rows as exactly returned by this list.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With