pandas dataframe create a new dataframe by duplicating n times rows of the previous dataframe and change date

Tags:

I have a dataframe with around around 9k rows and 57 cols, this is 'df'.

I need to have a new dataframe: 'df_final' - for each row of 'df' i have to replicate each row 'x' times and increase the day in each row one by one, also 'x' times. While i can do this for a couple of iterations, when i do it for the full length of 'df' ' len(df)' the loop it takes so long (>3 hours) that i actually had to cancel it. I have never seen the end of it. Here's the current code:

df.shape
output: (9454, 57)

df_int = df[0:0]
df_final = df_int[0:0]
range_df = len(df)
for x in range(0,2):
    df_int = df.iloc[0+x:x+1]
    if abs(df_int.iat[-1,3]) > 0:
        df_int = pd.concat([df_int]*abs(df_int.iat[-1,3]), ignore_index=True)
        for i in range(1, abs(df_int.iat[-1,3])):
            df_int['Consumption Date'][i] = df_int['Consumption Date'][i-1] + datetime.timedelta(days = 1)
            i += 1
       df_final = df_final.append(df_int, ignore_index=True)
    x += 1

The result of the loops for the first two rows of ' df' are below.

First two rows of df: enter image description here

Desired result: enter image description here

Is there another way to get to the desired output. It seems pandas do not deal very well with loops. In VBA excel the same loop takes around 3/4 minutes...i am trying to change a process which is currently in excel to python, however, if there's no way to make this work i guess i will stick to the old ways...

378

asked Sep 23 '17 15:09

dapaz

2 Answers

Use repeat and cumcount

In [2972]: dff = df.loc[df.index.repeat(3)]

In [2973]: dff
Out[2973]:
        date   name
0 2017-05-03    bob
0 2017-05-03    bob
0 2017-05-03    bob
1 2017-06-13  sally
1 2017-06-13  sally
1 2017-06-13  sally

In [2974]: dff.loc[:, 'date'] += pd.to_timedelta(dff.groupby(level=0).cumcount(), 'D')

In [2975]: dff
Out[2975]:
        date   name
0 2017-05-03    bob
0 2017-05-04    bob
0 2017-05-05    bob
1 2017-06-13  sally
1 2017-06-14  sally
1 2017-06-15  sally

Details

In [2976]: df
Out[2976]:
        date   name
0 2017-05-03    bob
1 2017-06-13  sally

In [2977]: dff.groupby(level=0).cumcount()
Out[2977]:
0    0
0    1
0    2
1    0
1    1
1    2
dtype: int64

186

answered Sep 27 '22 23:09

Zero

Let's use this toy DataFrame:

df = pd.DataFrame({
    'date': pd.to_datetime(['2017-05-03', '2017-06-13']),
    'name': ['bob', 'sally'],
})

It looks like this:

        date   name
0 2017-05-03    bob
1 2017-06-13  sally

Then:

x = 3 # repeat count
ind = np.repeat(np.arange(len(df)), x) # 0,0,0,1,1,1
df_final = df.iloc[ind].copy()

That gives you the repeats:

        date   name
0 2017-05-03    bob
0 2017-05-03    bob
0 2017-05-03    bob
1 2017-06-13  sally
1 2017-06-13  sally
1 2017-06-13  sally

Now you just need to increment the dates:

inc = np.tile(np.arange(x), len(df)) # 0,1,2,0,1,2
df_final.date += pd.to_timedelta(inc, 'D')

And you get:

        date   name
0 2017-05-03    bob
0 2017-05-04    bob
0 2017-05-05    bob
1 2017-06-13  sally
1 2017-06-14  sally
1 2017-06-15  sally

answered Sep 27 '22 22:09

John Zwinck

Related questions
                            
                                How to rotate an element in Holoviews
                            
                                Are the async/await keywords in python 3.5 inspired by async/await in C#? [closed]
                            
                                Replace a list of numbers with flat sub-ranges
                            
                                How to save OpenCV image with contour
                            
                                Using Chardet to find encoding of very large file
                            
                                Line hover text in Plotly
                            
                                Getting the three smallest values per row and returning the correspondent column names
                            
                                Why did I have problems with alembic migrations
                            
                                plt.subplot axis sharing not working
                            
                                Pandas Multi-Index DataFrame to Numpy Ndarray
                            
                                Numpy Advanced Indexing: same index used multiple times in += [duplicate]
                            
                                How to stop sound in pygame?
                            
                                running aws athena query via pyathena
                            
                                How to draw an arc on a tkinter canvas?
                            
                                Array manipulation in python
                            
                                Count distinct strings in rolling window using pandas
                            
                                Updating pandas to version 0.19 in Azure ML Studio
                            
                                How to host a private python package manager in Azure or AWS
                            
                                Extracting text out of images
                            
                                Globally getting context in Wagtail site

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

pandas dataframe create a new dataframe by duplicating n times rows of the previous dataframe and change date

Tags:

python

loops

pandas

dataframe

rows

dapaz

People also ask

2 Answers

Zero

John Zwinck

Recent Activity

Donate For Us