Pandas: How do I repeat dataframe for each value in a series?

I have a dataframe (df) as such:

A B
1 a
2 b
3 c

And a series:

S = pd.Series(['x','y','z'])

I want to repeat the dataframe df for each value in the series. The expected result looks like this:

S A B
x 1 a
y 1 a
z 1 a
x 2 b
y 2 b
z 2 b
x 3 c
y 3 c
z 3 c

How do I achieve this kind of output? I'm thinking of merge or join, but merging gives me a memory error. I am dealing with a rather large dataframe and series. Thanks!

Praneetha asked May 10 '17 18:05



2 Answers

Using numpy, let's say you have a series and a df of different lengths:

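import numpy as np
import pandas as pd
s = pd.Series(['X', 'Y', 'Z', 'A'])  # added a character to s to make it length 4
s_n = len(s)
df_n = len(df)
# repeat each df row s_n times, tile s df_n times as the index, then turn that index into column 'S'
pd.DataFrame(np.repeat(df.values, s_n, axis=0), columns=df.columns,
             index=np.tile(s, df_n)).rename_axis('S').reset_index()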

    S   A   B
0   X   1   a
1   Y   1   a
2   Z   1   a
3   A   1   a
4   X   2   b
5   Y   2   b
6   Z   2   b
7   A   2   b
8   X   3   c
9   Y   3   c
10  Z   3   c
11  A   3   c
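
One thing worth noting about the snippet above: df.values on a frame with mixed column types gives an object array, so the rebuilt columns come out as object dtype. Here is a minimal sketch of a variant that keeps the original dtypes by repeating row positions instead of values. The names row_pos and result are only for illustration, and df and s are rebuilt here from the sample data in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question (rebuilt here so the snippet runs on its own)
df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
s = pd.Series(["x", "y", "z"])

# Repeat each row position len(s) times: 0,0,0,1,1,1,2,2,2
row_pos = np.repeat(np.arange(len(df)), len(s))

# Select rows by position so every column keeps its own dtype
result = df.iloc[row_pos].reset_index(drop=True)

# Tile the series values len(df) times and put them in the first column
result.insert(0, "S", np.tile(s.to_numpy(), len(df)))

print(result)
```

This still builds the full len(df) * len(s) result in memory, so it only helps with dtypes, not with the size of the output.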
Vaishali answered Nov 14 '22 22:11


UPDATE:

Here is a slightly modified version of @A-Za-z's solution, which might use a bit less memory but is slower:

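x = pd.DataFrame(index=range(len(df) * len(S)))

for col in df.columns:
    # use the underlying array so values are assigned by position, not aligned on the Series index
    x[col] = np.repeat(df[col].values, len(S))

x['S'] = np.tile(S, len(df))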
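
Since the question mentions merge: in pandas 1.2 or later the same repetition can be written as a cross join. This is only a sketch for comparison, assuming the sample df and S from the question (rebuilt below, with S given the name 'S' so it becomes a column). It still materializes the full result, so it is not a fix for the memory error on its own.

```python
import pandas as pd

# Sample data from the question; naming the series lets to_frame() produce column 'S'
df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
S = pd.Series(["x", "y", "z"], name="S")

# how="cross" (pandas >= 1.2) pairs every row of df with every value of S
result = df.merge(S.to_frame(), how="cross")

# Reorder columns to match the layout asked for in the question
result = result[["S", "A", "B"]]

print(result)
```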

Old incorrect answer (it repeats the whole frame and pairs each row with only one value of S):

In [94]: pd.concat([df.assign(S=S)] * len(S))
Out[94]:
   A  B  S
0  1  a  x
1  2  b  y
2  3  c  z
0  1  a  x
1  2  b  y
2  3  c  z
0  1  a  x
1  2  b  y
2  3  c  z
MaxU - stop WAR against UA answered Nov 14 '22 22:11