
Repeat rows in a pandas DataFrame based on column value

I have the following df:

    code . role    . persons
    123 .  Janitor . 3
    123 .  Analyst . 2
    321 .  Vallet  . 2
    321 .  Auditor . 5

The first row means that I have 3 persons with the role Janitor. I need one row for each person, so my df should look like this:

df:

    code . role    . persons
    123 .  Janitor . 3
    123 .  Janitor . 3
    123 .  Janitor . 3
    123 .  Analyst . 2
    123 .  Analyst . 2
    321 .  Vallet  . 2
    321 .  Vallet  . 2
    321 .  Auditor . 5
    321 .  Auditor . 5
    321 .  Auditor . 5
    321 .  Auditor . 5
    321 .  Auditor . 5

How could I do that using pandas?

aabujamra asked Nov 16 '17 18:11







2 Answers

reindex + repeat

    df.reindex(df.index.repeat(df.persons))
    Out[951]:
       code  .     role ..1  persons
    0   123  .  Janitor   .        3
    0   123  .  Janitor   .        3
    0   123  .  Janitor   .        3
    1   123  .  Analyst   .        2
    1   123  .  Analyst   .        2
    2   321  .   Vallet   .        2
    2   321  .   Vallet   .        2
    3   321  .  Auditor   .        5
    3   321  .  Auditor   .        5
    3   321  .  Auditor   .        5
    3   321  .  Auditor   .        5
    3   321  .  Auditor   .        5

PS: you can add .reset_index(drop=True) to get a new index.
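For anyone who wants to run this end to end, here is a minimal sketch of the reindex + repeat approach. It assumes the question's data is rebuilt by hand with plain `code`, `role` and `persons` columns; the variable names `repeated` and `result` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about how df was created).
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# Repeat each index label `persons` times, then reindex to duplicate the rows.
repeated = df.reindex(df.index.repeat(df.persons))

# Optionally replace the duplicated labels with a fresh 0..n-1 index.
result = repeated.reset_index(drop=True)
print(result)
```

Note that `reindex` needs the original index to be unique, which holds for the default RangeIndex used here.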

BENY answered Sep 23 '22 19:09



Wen's solution is really nice and intuitive. Here's an alternative, calling repeat on df.values.

    df
       code     role  persons
    0   123  Janitor        3
    1   123  Analyst        2
    2   321   Vallet        2
    3   321  Auditor        5

    pd.DataFrame(df.values.repeat(df.persons, axis=0), columns=df.columns)
        code     role persons
    0   123  Janitor       3
    1   123  Janitor       3
    2   123  Janitor       3
    3   123  Analyst       2
    4   123  Analyst       2
    5   321   Vallet       2
    6   321   Vallet       2
    7   321  Auditor       5
    8   321  Auditor       5
    9   321  Auditor       5
    10  321  Auditor       5
    11  321  Auditor       5
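Here is a runnable sketch of the `df.values` variant, again assuming the question's data is rebuilt by hand. The final `astype` step is my own addition, not part of the answer above: because `df.values` packs mixed columns into a single object array, the rebuilt columns generally lose their original dtypes, and casting back restores them.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about how df was created).
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# df.values is a NumPy array; repeat each row `persons` times along axis 0.
repeated_values = df.values.repeat(df.persons, axis=0)

# Rebuild a DataFrame; this already has a fresh 0..n-1 index.
result = pd.DataFrame(repeated_values, columns=df.columns)

# Added step (not in the original answer): cast columns back to their original dtypes.
result = result.astype(df.dtypes.to_dict())
print(result)
```

If you only need the values and do not care about dtypes, the `astype` line can be dropped.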
cs95 answered Sep 22 '22 19:09
