How to repeat a pandas DataFrame?

This is my DataFrame, which should be repeated 5 times:

>>> x = pd.DataFrame({'a':1,'b':2}, index = range(1))
>>> x
   a  b
0  1  2

I want a result like this:

>>> x.append(x).append(x).append(x).append(x)
   a  b
0  1  2
0  1  2
0  1  2
0  1  2
0  1  2

But there must be a smarter way than appending 4 times. The DataFrame I'm actually working with needs to be repeated 50 times.

I haven't found anything practical; functions like np.repeat just don't work on a DataFrame.

Could anyone help?

asked May 27 '14 at 11:05 by lsheng




2 Answers

You can use the concat function:

In [13]: pd.concat([x]*5)
Out[13]: 
   a  b
0  1  2
0  1  2
0  1  2
0  1  2
0  1  2

If you want to repeat only the values and get a fresh 0..n-1 index instead of repeating the original index labels, pass ignore_index=True:

In [14]: pd.concat([x]*5, ignore_index=True)
Out[14]: 
   a  b
0  1  2
1  1  2
2  1  2
3  1  2
4  1  2
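A self-contained sketch of the same idea for the 50 repetitions mentioned in the question (the names repeated and repeated_reset are only illustrative). It also avoids DataFrame.append, which the question uses but which was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import pandas as pd

# The one-row frame from the question.
x = pd.DataFrame({'a': 1, 'b': 2}, index=range(1))

# [x] * 50 is a list holding 50 references to the same frame;
# pd.concat stacks them vertically in a single call.
repeated = pd.concat([x] * 50)

# ignore_index=True discards the original labels and builds a fresh RangeIndex.
repeated_reset = pd.concat([x] * 50, ignore_index=True)

print(repeated.shape)            # (50, 2)
print(repeated_reset.index[-1])  # 49
```

Building the list only copies references, so the actual data copying happens once inside concat rather than once per append.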

answered Oct 21 '22 at 22:10 by joris


I think it's cleaner/faster to use iloc nowadays:

In [11]: np.full(3, 0)
Out[11]: array([0, 0, 0])

In [12]: x.iloc[np.full(3, 0)]
Out[12]:
   a  b
0  1  2
0  1  2
0  1  2

More generally, you can use np.tile or np.repeat with np.arange:

In [21]: df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

In [22]: df
Out[22]:
   A  B
0  1  2
1  3  4

In [23]: np.tile(np.arange(len(df)), 3)
Out[23]: array([0, 1, 0, 1, 0, 1])

In [24]: np.repeat(np.arange(len(df)), 3)
Out[24]: array([0, 0, 0, 1, 1, 1])

In [25]: df.iloc[np.tile(np.arange(len(df)), 3)]
Out[25]:
   A  B
0  1  2
1  3  4
0  1  2
1  3  4
0  1  2
1  3  4

In [26]: df.iloc[np.repeat(np.arange(len(df)), 3)]
Out[26]:
   A  B
0  1  2
0  1  2
0  1  2
1  3  4
1  3  4
1  3  4

Note: this also works for DataFrames (and Series) with a non-integer index, since iloc selects rows by position rather than by label.
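Because iloc works on positions, the row labels never come into play. The sketch below assumes a made-up string index ("x", "y") purely for illustration, and applies the same positional indexer to a single column (a Series) as well.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string labels instead of the default RangeIndex.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"], index=["x", "y"])
n = 3  # number of repetitions

# np.tile repeats the whole block of positions: 0, 1, 0, 1, 0, 1
whole_block = df.iloc[np.tile(np.arange(len(df)), n)]

# np.repeat repeats each position consecutively: 0, 0, 0, 1, 1, 1
row_by_row = df.iloc[np.repeat(np.arange(len(df)), n)]

# The same positional indexer works on a Series.
s = df["A"]
s_tiled = s.iloc[np.tile(np.arange(len(s)), n)]

print(list(whole_block.index))  # ['x', 'y', 'x', 'y', 'x', 'y']
print(list(row_by_row.index))   # ['x', 'x', 'x', 'y', 'y', 'y']
print(s_tiled.tolist())         # [1, 3, 1, 3, 1, 3]
```

Use np.tile when the frame should be stacked as a whole block, and np.repeat when each row should appear n times in a row.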

answered Oct 21 '22 at 23:10 by Andy Hayden