Python Pandas replicate rows in dataframe

Tags:

python

pandas

dataframe

If the dataframe looks like:

Store,Dept,Date,Weekly_Sales,IsHoliday
1,1,2010-02-05,24924.5,FALSE
1,1,2010-02-12,46039.49,TRUE
1,1,2010-02-19,41595.55,FALSE
1,1,2010-02-26,19403.54,FALSE
1,1,2010-03-05,21827.9,FALSE
1,1,2010-03-12,21043.39,FALSE
1,1,2010-03-19,22136.64,FALSE
1,1,2010-03-26,26229.21,FALSE
1,1,2010-04-02,57258.43,FALSE

I want to duplicate the rows where IsHoliday is TRUE. I can do:

is_hol = df['IsHoliday'] == True
df_try = df[is_hol]
df=df.append(df_try*10)

Is there a better way to do this? I need to duplicate the holiday rows 5 times, and with the approach above I would have to append 5 times.


asked Jun 04 '14 05:06

dcc

6 Answers

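You can put df_try inside a list and then do what you have in mind (note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this runs only on older versions):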

>>> df.append([df_try]*5,ignore_index=True)

    Store  Dept       Date  Weekly_Sales IsHoliday
0       1     1 2010-02-05      24924.50     False
1       1     1 2010-02-12      46039.49      True
2       1     1 2010-02-19      41595.55     False
3       1     1 2010-02-26      19403.54     False
4       1     1 2010-03-05      21827.90     False
5       1     1 2010-03-12      21043.39     False
6       1     1 2010-03-19      22136.64     False
7       1     1 2010-03-26      26229.21     False
8       1     1 2010-04-02      57258.43     False
9       1     1 2010-02-12      46039.49      True
10      1     1 2010-02-12      46039.49      True
11      1     1 2010-02-12      46039.49      True
12      1     1 2010-02-12      46039.49      True
13      1     1 2010-02-12      46039.49      True
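
On pandas 2.0 and later, where DataFrame.append no longer exists, the same idea can be written with pd.concat. This is only a sketch under assumptions: it uses a shortened three-row version of the question's data, and the name result is made up for illustration.

```python
import pandas as pd

# Shortened version of the question's data (assumption: three rows are enough to illustrate)
df = pd.DataFrame(
    {
        "Store": [1, 1, 1],
        "Dept": [1, 1, 1],
        "Date": ["2010-02-05", "2010-02-12", "2010-02-19"],
        "Weekly_Sales": [24924.50, 46039.49, 41595.55],
        "IsHoliday": [False, True, False],
    }
)

# Rows to replicate
df_try = df[df["IsHoliday"]]

# Original frame followed by 5 extra copies of the holiday rows
result = pd.concat([df] + [df_try] * 5, ignore_index=True)
print(result)
```

As with the append version, the extra copies land at the end of the frame, and ignore_index=True renumbers the rows from 0.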


answered Sep 29 '22 20:09

Karl D.

Another way is to use the concat() function, shown here replicating a whole dataframe:

import pandas as pd

In [603]: df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))

In [604]: df
Out[604]: 
  col1  col2
0    a     0
1    b     1
2    c     2

In [605]: pd.concat([df]*3, ignore_index=True) # Ignores the index
Out[605]: 
  col1  col2
0    a     0
1    b     1
2    c     2
3    a     0
4    b     1
5    c     2
6    a     0
7    b     1
8    c     2

In [606]: pd.concat([df]*3)
Out[606]: 
  col1  col2
0    a     0
1    b     1
2    c     2
0    a     0
1    b     1
2    c     2
0    a     0
1    b     1
2    c     2

answered Sep 29 '22 20:09

Surya

This is an old question, but since it still comes up at the top of my results in Google, here's another way.

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))

Say you want to replicate the rows where col1 == "b".

reps = [3 if val=="b" else 1 for val in df.col1]
df.loc[np.repeat(df.index.values, reps)]

You could replace the 3 if val=="b" else 1 in the list comprehension with another function that returns, say, 3 if val=="b" or 4 if val=="c", and so on, so it's pretty flexible.
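
One possible way to sketch that more flexible version is to look up the repeat counts in a dict. The names rep_counts and out are hypothetical, and values missing from the dict are assumed to be kept once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": list("abc"), "col2": range(3)}, index=range(3))

# Hypothetical mapping: how many times each col1 value should appear
rep_counts = {"b": 3, "c": 4}

# Values not in the mapping become NaN, so fall back to a single copy
reps = df["col1"].map(rep_counts).fillna(1).astype(int)

# Repeat each index label by its count, then select those rows
out = df.loc[np.repeat(df.index.to_numpy(), reps.to_numpy())].reset_index(drop=True)
print(out)
```

Unlike appending, this keeps the copies next to the original row instead of at the end of the frame.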

answered Sep 30 '22 20:09

snooze_bear

Appending and concatenating are usually slow in Pandas, so I recommend building a new list of the rows and turning that into a dataframe (unless you are only appending a single row or concatenating a few dataframes).

import pandas as pd

df = pd.DataFrame([
[1,1,'2010-02-05',24924.5,False],
[1,1,'2010-02-12',46039.49,True],
[1,1,'2010-02-19',41595.55,False],
[1,1,'2010-02-26',19403.54,False],
[1,1,'2010-03-05',21827.9,False],
[1,1,'2010-03-12',21043.39,False],
[1,1,'2010-03-19',22136.64,False],
[1,1,'2010-03-26',26229.21,False],
[1,1,'2010-04-02',57258.43,False]
], columns=['Store','Dept','Date','Weekly_Sales','IsHoliday'])

temp_df = []
for row in df.itertuples(index=False):
    if row.IsHoliday:
        temp_df.extend([list(row)]*5)
    else:
        temp_df.append(list(row))

df = pd.DataFrame(temp_df, columns=df.columns)

answered Sep 28 '22 20:09

grofte

You can do it in one line:

df.append([df[df['IsHoliday'] == True]] * 5, ignore_index=True)

or, if IsHoliday is a boolean column:

df.append([df[df['IsHoliday']]] * 5, ignore_index=True)

answered Oct 01 '22 20:10

Mykola Zotko

Another alternative to append() is to first replace the values in a column with a list of entries and then call explode() (with or without ignore_index=True, depending on whether you want a fresh index):

df['IsHoliday'] = df['IsHoliday'].apply(lambda x: 5*[x] if (x == True) else x)

df = df.explode('IsHoliday', ignore_index=True)

The nice thing about this one is that you can use the list in the apply() call to build copies of rows with modified values in that column, in case you want to do that later anyway.
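
If you would rather not overwrite IsHoliday (a column holding both lists and booleans ends up with object dtype), a variation is to explode a throwaway helper column instead. This is just a sketch: the column name _copy and the three-row sample are assumptions, and ignore_index in explode() needs pandas 1.1 or newer.

```python
import pandas as pd

# Shortened sample based on the question's data (assumption)
df = pd.DataFrame(
    {
        "Date": ["2010-02-05", "2010-02-12", "2010-02-19"],
        "Weekly_Sales": [24924.50, 46039.49, 41595.55],
        "IsHoliday": [False, True, False],
    }
)

# Hypothetical helper: a list with one entry per desired copy of the row
copies = df["IsHoliday"].map(lambda h: list(range(5)) if h else [0])

out = (
    df.assign(_copy=copies)               # attach the helper column
      .explode("_copy", ignore_index=True)  # one output row per list entry
      .drop(columns="_copy")              # discard the helper again
)
print(out)
```

Here each holiday row appears 5 times in total and IsHoliday keeps its boolean dtype.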

answered Sep 29 '22 20:09

buddemat
