
Replicating rows in a pandas data frame by a column value

Tags: python, pandas

I want to replicate rows in a pandas DataFrame. Each row should be repeated n times, where n is the value in that row's n column.

    import pandas as pd

    what_i_have = pd.DataFrame(data={
      'id': ['A', 'B', 'C'],
      'n' : [  1,   2,   3],
      'v' : [ 10,  13,   8]
    })

    what_i_want = pd.DataFrame(data={
      'id': ['A', 'B', 'B', 'C', 'C', 'C'],
      'v' : [ 10,  13,  13,   8,   8,   8]
    })

Is this possible?

Mersenne Prime asked Nov 06 '14 11:11



2 Answers

With df being the question's input frame (what_i_have), you can use Index.repeat to get repeated index values based on the n column, then select those rows from the DataFrame:

    df2 = df.loc[df.index.repeat(df.n)]

      id  n   v
    0  A  1  10
    1  B  2  13
    1  B  2  13
    2  C  3   8
    2  C  3   8
    2  C  3   8

Or you could use np.repeat to get the repeated indices and then use that to index into the frame:

    import numpy as np

    df2 = df.loc[np.repeat(df.index.values, df.n)]

      id  n   v
    0  A  1  10
    1  B  2  13
    1  B  2  13
    2  C  3   8
    2  C  3   8
    2  C  3   8

After which there's only a bit of cleaning up to do:

    df2 = df2.drop("n", axis=1).reset_index(drop=True)

      id   v
    0  A  10
    1  B  13
    2  B  13
    3  C   8
    4  C   8
    5  C   8
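Putting the selection and clean-up steps together on the question's data gives a minimal end-to-end sketch. The variable name result is only illustrative, and drop(columns="n") is used here as an equivalent spelling of drop("n", axis=1); the final check compares against the what_i_want frame from the question.

```python
import pandas as pd

# Input frame from the question.
what_i_have = pd.DataFrame(data={
    'id': ['A', 'B', 'C'],
    'n':  [1, 2, 3],
    'v':  [10, 13, 8],
})

# Repeat each index label n times, select those rows,
# then drop the helper column and renumber the index.
result = (
    what_i_have
    .loc[what_i_have.index.repeat(what_i_have['n'])]
    .drop(columns='n')
    .reset_index(drop=True)
)

# Desired output from the question.
what_i_want = pd.DataFrame(data={
    'id': ['A', 'B', 'B', 'C', 'C', 'C'],
    'v':  [10, 13, 13, 8, 8, 8],
})

# Raises AssertionError if the two frames differ.
pd.testing.assert_frame_equal(result, what_i_want)
```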

Note that if you might have duplicate indices to worry about, you could use .iloc instead:

    df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)

      id   v
    0  A  10
    1  B  13
    2  B  13
    3  C   8
    4  C   8
    5  C   8

which uses the positions, and not the index labels.
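To see why duplicate index labels matter, here is a small sketch on a hypothetical frame (dup is not from the question) whose two rows share the label 'x'. With .loc, every repeated label matches every row carrying that label, so the result has more rows than intended; with .iloc, row i is taken exactly n[i] times.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are not unique.
dup = pd.DataFrame(
    {'id': ['A', 'B'], 'n': [1, 2], 'v': [10, 13]},
    index=['x', 'x'],
)

# Label-based selection: each repeated label 'x' matches both rows.
by_label = dup.loc[dup.index.repeat(dup['n'])]

# Position-based selection: row i is taken exactly n[i] times.
by_position = dup.iloc[np.repeat(np.arange(len(dup)), dup['n'])]

print(len(by_label))     # more rows than the intended 3
print(len(by_position))  # 3, i.e. 1 + 2
```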

DSM answered Sep 19 '22 07:09


You could use set_index and repeat:

    In [1057]: df.set_index(['id'])['v'].repeat(df['n']).reset_index()
    Out[1057]:
      id   v
    0  A  10
    1  B  13
    2  B  13
    3  C   8
    4  C   8
    5  C   8

Details

    In [1058]: df
    Out[1058]:
      id  n   v
    0  A  1  10
    1  B  2  13
    2  C  3   8
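For readers who want to run this outside an IPython session, the same idea can be sketched as a standalone script. The intermediate names s and out are illustrative. Note that this form carries a single value column (v) along with id; frames with several value columns are probably easier to handle with the row-selection approach from the other answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'B', 'C'],
    'n':  [1, 2, 3],
    'v':  [10, 13, 8],
})

# Move 'id' into the index so it travels with each repeated value of 'v'.
s = df.set_index('id')['v']

# Series.repeat repeats element i of the Series df['n'][i] times
# (matched by position); reset_index turns 'id' back into a column.
out = s.repeat(df['n']).reset_index()

print(out)
```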

Zero answered Sep 20 '22 07:09