
Python Pandas replicate rows in dataframe

If the dataframe looks like:

Store,Dept,Date,Weekly_Sales,IsHoliday
1,1,2010-02-05,24924.5,FALSE
1,1,2010-02-12,46039.49,TRUE
1,1,2010-02-19,41595.55,FALSE
1,1,2010-02-26,19403.54,FALSE
1,1,2010-03-05,21827.9,FALSE
1,1,2010-03-12,21043.39,FALSE
1,1,2010-03-19,22136.64,FALSE
1,1,2010-03-26,26229.21,FALSE
1,1,2010-04-02,57258.43,FALSE

I want to duplicate the rows where IsHoliday equals TRUE. I can do:

is_hol = df['IsHoliday'] == True
df_try = df[is_hol]
df=df.append(df_try*10)

But is there a better way to do this? I need to duplicate the holiday rows 5 times, and with the approach above I would have to append 5 times.

asked Jun 04 '14 05:06 by dcc




6 Answers

You can put df_try inside a list and then do what you have in mind:

>>> df.append([df_try]*5,ignore_index=True)

    Store  Dept       Date  Weekly_Sales IsHoliday
0       1     1 2010-02-05      24924.50     False
1       1     1 2010-02-12      46039.49      True
2       1     1 2010-02-19      41595.55     False
3       1     1 2010-02-26      19403.54     False
4       1     1 2010-03-05      21827.90     False
5       1     1 2010-03-12      21043.39     False
6       1     1 2010-03-19      22136.64     False
7       1     1 2010-03-26      26229.21     False
8       1     1 2010-04-02      57258.43     False
9       1     1 2010-02-12      46039.49      True
10      1     1 2010-02-12      46039.49      True
11      1     1 2010-02-12      46039.49      True
12      1     1 2010-02-12      46039.49      True
13      1     1 2010-02-12      46039.49      True
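
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on recent versions the same idea would have to be written with pd.concat. A minimal sketch, assuming a shortened three-row version of the question's data (rebuilt here only so the block is self-contained):

```python
import pandas as pd

# Shortened sample modeled on the question's data (three rows for brevity).
df = pd.DataFrame({
    "Store": [1, 1, 1],
    "Dept": [1, 1, 1],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-19"],
    "Weekly_Sales": [24924.5, 46039.49, 41595.55],
    "IsHoliday": [False, True, False],
})

# Holiday rows only.
df_try = df[df["IsHoliday"]]

# Original frame followed by 5 extra copies of the holiday rows,
# with a fresh 0..n-1 index.
result = pd.concat([df] + [df_try] * 5, ignore_index=True)
print(result)
```

As with the append version, the copies land at the end of the frame rather than next to the original holiday row.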
answered Sep 29 '22 20:09 by Karl D.


Another way is to use the concat() function:

import pandas as pd

In [603]: df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))

In [604]: df
Out[604]: 
  col1  col2
0    a     0
1    b     1
2    c     2

In [605]: pd.concat([df]*3, ignore_index=True) # Ignores the index
Out[605]: 
  col1  col2
0    a     0
1    b     1
2    c     2
3    a     0
4    b     1
5    c     2
6    a     0
7    b     1
8    c     2

In [606]: pd.concat([df]*3)
Out[606]: 
  col1  col2
0    a     0
1    b     1
2    c     2
0    a     0
1    b     1
2    c     2
0    a     0
1    b     1
2    c     2
answered Sep 29 '22 20:09 by Surya


This is an old question, but since it still comes up at the top of my results in Google, here's another way.

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))

Say you want to replicate the rows where col1 == "b".

reps = [3 if val=="b" else 1 for val in df.col1]
df.loc[np.repeat(df.index.values, reps)]

You could replace the 3 if val=="b" else 1 in the list comprehension with a function that returns, say, 3 if val=="b" or 4 if val=="c", so it's pretty flexible.
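
Applied to the holiday data from the question, the same repeat idea might look like the sketch below. It assumes a shortened three-row sample and uses np.where to build the repeat counts instead of a list comprehension; the count 6 means the original row plus 5 duplicates.

```python
import numpy as np
import pandas as pd

# Shortened sample modeled on the question's data.
df = pd.DataFrame({
    "Date": ["2010-02-05", "2010-02-12", "2010-02-19"],
    "Weekly_Sales": [24924.5, 46039.49, 41595.55],
    "IsHoliday": [False, True, False],
})

# Per-row repeat counts: 6 for holiday rows (original + 5 copies), 1 otherwise.
reps = np.where(df["IsHoliday"], 6, 1)

# Index.repeat accepts an array of counts; .loc then selects each label
# as many times as it appears. reset_index gives a clean 0..n-1 index.
result = df.loc[df.index.repeat(reps)].reset_index(drop=True)
print(result)
```

Unlike the append or concat variants, the duplicated rows stay next to the original row, which may or may not matter for your use case.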

answered Sep 30 '22 20:09 by snooze_bear


Repeated appending and concatenating is usually slow in pandas, so I recommend building a new list of the rows and turning that into a dataframe (unless you are appending a single row or concatenating only a few dataframes).

import pandas as pd

df = pd.DataFrame([
[1,1,'2010-02-05',24924.5,False],
[1,1,'2010-02-12',46039.49,True],
[1,1,'2010-02-19',41595.55,False],
[1,1,'2010-02-26',19403.54,False],
[1,1,'2010-03-05',21827.9,False],
[1,1,'2010-03-12',21043.39,False],
[1,1,'2010-03-19',22136.64,False],
[1,1,'2010-03-26',26229.21,False],
[1,1,'2010-04-02',57258.43,False]
], columns=['Store','Dept','Date','Weekly_Sales','IsHoliday'])

temp_df = []
for row in df.itertuples(index=False):
    if row.IsHoliday:
        temp_df.extend([list(row)]*5)
    else:
        temp_df.append(list(row))

df = pd.DataFrame(temp_df, columns=df.columns)
answered Sep 28 '22 20:09 by grofte


You can do it in one line:

df.append([df[df['IsHoliday'] == True]] * 5, ignore_index=True)

or

df.append([df[df['IsHoliday']]] * 5, ignore_index=True)
answered Oct 01 '22 20:10 by Mykola Zotko


Another alternative to append() is to first replace the values of a column by a list of entries and then explode() (either using ignore_index=True or not, depending on what you want):

df['IsHoliday'] = df['IsHoliday'].apply(lambda x: 5*[x] if (x == True) else x)

df.explode('IsHoliday', ignore_index=True)

The nice thing about this one is that the list built in the apply() call can also hold modified values, so you can create copies of rows that differ in that column if you need that later.
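
A variation on this idea, sketched under some assumptions: instead of overwriting IsHoliday with lists, the lists go into a separate helper column (here called copy_id, a hypothetical name not in the original), so IsHoliday keeps its boolean dtype. The ignore_index argument of explode() needs pandas 1.1 or later.

```python
import pandas as pd

# Shortened sample modeled on the question's data.
df = pd.DataFrame({
    "Date": ["2010-02-05", "2010-02-12", "2010-02-19"],
    "Weekly_Sales": [24924.5, 46039.49, 41595.55],
    "IsHoliday": [False, True, False],
})

# Hypothetical helper column: one list entry per desired copy of the row.
# Holiday rows get 5 entries, all other rows get a single entry.
df["copy_id"] = df["IsHoliday"].apply(lambda h: list(range(5)) if h else [0])

# explode() emits one output row per list element.
result = df.explode("copy_id", ignore_index=True)

# The exploded column has object dtype; cast it back to int.
result["copy_id"] = result["copy_id"].astype(int)
print(result)
```

The helper column can be dropped afterwards with result.drop(columns="copy_id") if the copy number is not needed.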

answered Sep 29 '22 20:09 by buddemat