I have a dataframe (df) as such:
A B
1 a
2 b
3 c
And a series: S = pd.Series(['x','y','z'])
I want to repeat the dataframe df for each value in the series. The expected result looks like this:
result:
S A B
x 1 a
y 1 a
z 1 a
x 2 b
y 2 b
z 2 b
x 3 c
y 3 c
z 3 c
How do I achieve this kind of output? I'm thinking of merge or join, but merging gives me a memory error. I am dealing with a rather large dataframe and series. Thanks!
Pandas Series.repeat(): repeats the elements of a Series. It returns a new Series in which each element of the current Series is repeated consecutively a given number of times. The repeats argument gives the number of repetitions for each element and should be a non-negative integer (or an array of non-negative integers).
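As a small illustration of the behaviour described above, here is a sketch using the three-element series from the question. The variable names `repeated` and `per_element` are made up for this example.

```python
import pandas as pd

s = pd.Series(['x', 'y', 'z'])

# Each element is repeated consecutively; the original index labels repeat too.
repeated = s.repeat(2)
print(repeated.tolist())        # ['x', 'x', 'y', 'y', 'z', 'z']
print(repeated.index.tolist())  # [0, 0, 1, 1, 2, 2]

# An array gives a separate repetition count per element (0 drops the element).
per_element = s.repeat([1, 0, 2])
print(per_element.tolist())     # ['x', 'z', 'z']
```

Note that the index labels are repeated along with the values, which matters if the result is later assigned into another dataframe.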
The pandas str.repeat() method repeats each string value in place within the Series. An array can also be passed to define how many times each element should be repeated; in that case the length of the array must be the same as the length of the Series.
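For contrast with Series.repeat(), a minimal sketch of str.repeat() on the same series: the number of rows stays the same and only the strings grow. The names `doubled` and `varied` are illustrative.

```python
import pandas as pd

s = pd.Series(['x', 'y', 'z'])

# A single integer repeats every string the same number of times.
doubled = s.str.repeat(2)
print(doubled.tolist())  # ['xx', 'yy', 'zz']

# A sequence of the same length as the Series gives one count per element.
varied = s.str.repeat([1, 2, 3])
print(varied.tolist())   # ['x', 'yy', 'zzz']
```

This is why str.repeat() does not help with the question: it never adds rows.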
The iloc attribute enables purely integer-location based indexing, that is, selection by position over the given Series object.
Two pandas Series objects can also be multiplied with the multiplication operator “*”. With the mul() method, missing values in the data can be handled by replacing them with a default value through the fill_value parameter.
Using numpy. Let's say you have a series and a df of different lengths:
import numpy as np
import pandas as pd

s = pd.Series(['X', 'Y', 'Z', 'A'])  # added a character to s to make it length 4
s_n = len(s)
df_n = len(df)
# repeat each df row s_n times, tile s df_n times as the index, then move it into column 'S'
pd.DataFrame(np.repeat(df.values, s_n, axis=0), columns=df.columns, index=np.tile(s, df_n)).rename_axis('S').reset_index()
S A B
0 X 1 a
1 Y 1 a
2 Z 1 a
3 A 1 a
4 X 2 b
5 Y 2 b
6 Z 2 b
7 A 2 b
8 X 3 c
9 Y 3 c
10 Z 3 c
11 A 3 c
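The repeat-and-tile idea above can be wrapped into a self-contained sketch. This is a variant, not the answer's exact code: it repeats row positions and selects them with iloc instead of going through df.values, on the assumption that keeping the original column dtypes is desirable. The helper name `repeat_for_each` and its `name` parameter are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
S = pd.Series(['x', 'y', 'z', 'w'])


def repeat_for_each(df, values, name='S'):
    """Pair every row of df with every element of values (hypothetical helper)."""
    n = len(values)
    # Positions 0,0,0,0,1,1,1,1,... pick each df row n times in a row.
    row_positions = np.repeat(np.arange(len(df)), n)
    out = df.iloc[row_positions].reset_index(drop=True)
    # The values cycle through once per original row of df.
    out.insert(0, name, np.tile(np.asarray(values), len(df)))
    return out


result = repeat_for_each(df, S)
print(result)
```

With the data above, `result` has 12 rows (3 rows of df times 4 values of S) and the columns S, A, B, and column A keeps its integer dtype rather than becoming object.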
UPDATE:
Here is a slightly modified version of @A-Za-z's solution, which might use a bit less memory but is slower:
x = pd.DataFrame(index=range(len(df) * len(S)))
for col in df.columns:
    # use .values so the repeated column is assigned by position, not aligned on df's index
    x[col] = np.repeat(df[col].values, len(S))
x['S'] = np.tile(S.values, len(df))
Old incorrect answer (it pairs each row of df with only one value of S rather than with every value):
In [94]: pd.concat([df.assign(S=S)] * len(S))
Out[94]:
A B S
0 1 a x
1 2 b y
2 3 c z
0 1 a x
1 2 b y
2 3 c z
0 1 a x
1 2 b y
2 3 c z
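To see why the output above falls short of the expected result, the distinct (A, S) pairs can be counted. This is only a quick check on the question's sample data; the names `stacked` and `pairs` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
S = pd.Series(['x', 'y', 'z'])

# assign aligns S with df on the index, so row i only ever meets S[i].
stacked = pd.concat([df.assign(S=S)] * len(S))

# Collect the distinct (A, S) combinations that actually occur.
pairs = set(zip(stacked['A'], stacked['S']))
print(len(stacked), len(pairs))  # 9 3
```

The frame has the right number of rows (9), but only 3 distinct combinations instead of the 9 the question asks for, because concatenating copies just stacks the same three pairs again.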