This is my DataFrame that should be repeated 5 times:
>>> x = pd.DataFrame({'a':1,'b':2}, index = range(1))
>>> x
a b
0 1 2
I want to have the result like this:
>>> x.append(x).append(x).append(x)
a b
0 1 2
0 1 2
0 1 2
0 1 2
But there must be a smarter way than appending 4 times. Actually the DataFrame I’m working on should be repeated 50 times.
I haven't found anything practical, including things like np.repeat; it just doesn't work on a DataFrame.
Could anyone help?
The pandas Series.repeat() function returns a new Series in which each element of the current Series is repeated consecutively a given number of times. Its repeats argument gives the number of repetitions for each element and should be a non-negative integer (or an array of such integers, one per element).
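As a small illustration of the description above, here is a sketch of Series.repeat() with a scalar count and with a per-element array of counts. The Series contents and variable names are made up for the example.

```python
import pandas as pd

s = pd.Series(["a", "b", "c"])

# A single integer repeats every element that many times, consecutively.
twice = s.repeat(2)

# An array of counts (one per element) repeats each element individually;
# a count of 0 drops that element.
varied = s.repeat([1, 0, 2])

print(twice.tolist())        # ['a', 'a', 'b', 'b', 'c', 'c']
print(twice.index.tolist())  # [0, 0, 1, 1, 2, 2]  (original labels are repeated too)
print(varied.tolist())       # ['a', 'c', 'c']
```

Note that this repeats elements of a Series; it does not by itself repeat the rows of a DataFrame.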
In R, to create a data frame with a column of repeated values, you can use the rep function, which either repeats a whole sequence of values or repeats each value a given number of times.
DataFrame Looping (iteration) with a for statement. You can loop over a pandas dataframe, for each column row by row.
You can use NumPy to repeat the values and reconstruct the DataFrame.
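One way this might look, as a rough sketch rather than a definitive recipe: repeat the underlying array with np.repeat or np.tile and rebuild the frame with the original column names. The example frame and variable names are assumptions for illustration. Be aware that going through a NumPy array discards the original index and, for mixed-dtype frames, converts all columns to a common dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# np.repeat with axis=0 repeats each row consecutively: rows 0,0,0,1,1,1.
each_row = pd.DataFrame(np.repeat(df.to_numpy(), 3, axis=0), columns=df.columns)

# np.tile with reps=(3, 1) stacks the whole block three times: rows 0,1,0,1,0,1.
whole_frame = pd.DataFrame(np.tile(df.to_numpy(), (3, 1)), columns=df.columns)

print(each_row["A"].tolist())     # [1, 1, 1, 3, 3, 3]
print(whole_frame["A"].tolist())  # [1, 3, 1, 3, 1, 3]
```

Both results get a fresh 0..n-1 index, because the index is not carried through the array.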
The pandas Series.str.repeat() method repeats each string value in place, at the same position in the Series. An array can also be passed to define how many times each element should be repeated; in that case, the length of the array must be the same as the length of the Series.
You can use the concat function:
In [13]: pd.concat([x]*5)
Out[13]:
a b
0 1 2
0 1 2
0 1 2
0 1 2
0 1 2
If you only want to repeat the values and not the index, you can do:
In [14]: pd.concat([x]*5, ignore_index=True)
Out[14]:
a b
0 1 2
1 1 2
2 1 2
3 1 2
4 1 2
I think it's cleaner/faster to use iloc nowadays:
In [11]: np.full(3, 0)
Out[11]: array([0, 0, 0])
In [12]: x.iloc[np.full(3, 0)]
Out[12]:
a b
0 1 2
0 1 2
0 1 2
More generally, you can use tile or repeat with arange:
In [21]: df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
In [22]: df
Out[22]:
A B
0 1 2
1 3 4
In [23]: np.tile(np.arange(len(df)), 3)
Out[23]: array([0, 1, 0, 1, 0, 1])
In [24]: np.repeat(np.arange(len(df)), 3)
Out[24]: array([0, 0, 0, 1, 1, 1])
In [25]: df.iloc[np.tile(np.arange(len(df)), 3)]
Out[25]:
A B
0 1 2
1 3 4
0 1 2
1 3 4
0 1 2
1 3 4
In [26]: df.iloc[np.repeat(np.arange(len(df)), 3)]
Out[26]:
A B
0 1 2
0 1 2
0 1 2
1 3 4
1 3 4
1 3 4
Note: This will work with non-integer indexed DataFrames (and Series).
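To illustrate the note, here is a minimal sketch with a string-labelled index (the labels "p" and "q" are made up for the example). Because iloc selects by position, the index labels do not need to be integers, and the same pattern applies to a Series.

```python
import numpy as np
import pandas as pd

# A DataFrame whose index labels are strings rather than integers.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"], index=["p", "q"])

# Positional indices 0, 1, 0, 1 select the whole frame twice.
tiled = df.iloc[np.tile(np.arange(len(df)), 2)]
print(tiled.index.tolist())  # ['p', 'q', 'p', 'q']

# The same approach on a Series, repeating each element twice.
s = df["A"]
repeated = s.iloc[np.repeat(np.arange(len(s)), 2)]
print(repeated.tolist())        # [1, 1, 3, 3]
print(repeated.index.tolist())  # ['p', 'p', 'q', 'q']
```

If the duplicated labels are unwanted, calling reset_index(drop=True) on the result gives a fresh 0..n-1 index.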