python - Duplicate rows x number of times based on a value in a column

Tags: python, pandas

I have a pandas dataframe of bookings at a hotel. Each row is a booking, like this:

Name             Arrival       Departure     RoomNights
Trent Cotchin    29/10/2017    2/11/2017     4
Dustin Martin    1/11/2017     4/11/2017     3
Alex Rance       2/11/2017     3/11/2017     1

I want to use Python to convert this so that each row becomes a single room night, with one row per night of each booking. The output would look like this:

Name             Arrival       Departure     RoomNights   RoomNight Date
Trent Cotchin    29/10/2017    2/11/2017     4            29/10/2017
Trent Cotchin    29/10/2017    2/11/2017     4            30/10/2017
Trent Cotchin    29/10/2017    2/11/2017     4            31/10/2017
Trent Cotchin    29/10/2017    2/11/2017     4            1/11/2017
Dustin Martin    1/11/2017     4/11/2017     3            1/11/2017
Dustin Martin    1/11/2017     4/11/2017     3            2/11/2017
Dustin Martin    1/11/2017     4/11/2017     3            3/11/2017
Alex Rance       2/11/2017     3/11/2017     1            2/11/2017

This would let me easily sum the total number of room nights for any given day or month.

Ben Sharkey, asked Oct 10 '17 05:10



1 Answer

Use:

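import pandas as pd

#convert columns to datetime (the dates are day/month/year, so parse day first)
df['Arrival'] = pd.to_datetime(df['Arrival'], dayfirst=True)
df['Departure'] = pd.to_datetime(df['Departure'], dayfirst=True)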

#repeat each row RoomNights times (the repeated rows keep their original index label)
df = df.loc[df.index.repeat(df['RoomNights'])]
#group by the original index label and build consecutive dates starting at Arrival
df['RoomNight Date'] = (df.groupby(level=0)['Arrival']
                          .transform(lambda x: pd.date_range(start=x.iat[0], periods=len(x))))
#replace the repeated labels with a unique default index
df = df.reset_index(drop=True)
print(df)
            Name    Arrival  Departure  RoomNights RoomNight Date
0  Trent Cotchin 2017-10-29 2017-11-02           4     2017-10-29
1  Trent Cotchin 2017-10-29 2017-11-02           4     2017-10-30
2  Trent Cotchin 2017-10-29 2017-11-02           4     2017-10-31
3  Trent Cotchin 2017-10-29 2017-11-02           4     2017-11-01
4  Dustin Martin 2017-11-01 2017-11-04           3     2017-11-01
5  Dustin Martin 2017-11-01 2017-11-04           3     2017-11-02
6  Dustin Martin 2017-11-01 2017-11-04           3     2017-11-03
7     Alex Rance 2017-11-02 2017-11-03           1     2017-11-02
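
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample table from the question and swaps the `transform` step for a different technique: `cumcount()` plus a day offset. The variable names `bookings`, `expanded` and `night_offset` are only illustrative and do not come from the original code, and the dates are assumed to be day/month/year.

```python
import pandas as pd

# Sample data copied from the question (dates assumed to be day/month/year).
bookings = pd.DataFrame({
    "Name": ["Trent Cotchin", "Dustin Martin", "Alex Rance"],
    "Arrival": ["29/10/2017", "1/11/2017", "2/11/2017"],
    "Departure": ["2/11/2017", "4/11/2017", "3/11/2017"],
    "RoomNights": [4, 3, 1],
})
for col in ["Arrival", "Departure"]:
    bookings[col] = pd.to_datetime(bookings[col], dayfirst=True)

# Repeat each booking once per room night; the copies keep the original index label.
expanded = bookings.loc[bookings.index.repeat(bookings["RoomNights"])].copy()

# cumcount() numbers the copies of each booking 0, 1, 2, ...;
# adding that many days to Arrival gives the date of each night.
night_offset = expanded.groupby(level=0).cumcount()
expanded["RoomNight Date"] = expanded["Arrival"] + pd.to_timedelta(night_offset, unit="D")

# Replace the repeated labels with a unique default index.
expanded = expanded.reset_index(drop=True)
print(expanded)
```

Because the offset is plain date arithmetic, this variant avoids calling a Python lambda once per booking, which may matter on larger tables.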
jezrael, answered Sep 28 '22 02:09
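
Once the data has one row per room night, the per-day and per-month totals the question asks about could be computed roughly as follows. This is only a sketch: the `nights` frame is typed in by hand from the output shown in the answer, and the names `nights`, `per_day` and `per_month` are illustrative.

```python
import pandas as pd

# One row per room night, copied from the expanded output in the answer.
nights = pd.DataFrame({
    "Name": ["Trent Cotchin"] * 4 + ["Dustin Martin"] * 3 + ["Alex Rance"],
    "RoomNight Date": pd.to_datetime([
        "2017-10-29", "2017-10-30", "2017-10-31", "2017-11-01",
        "2017-11-01", "2017-11-02", "2017-11-03", "2017-11-02",
    ]),
})

# Room nights per calendar day: count the rows that share a date.
per_day = nights.groupby("RoomNight Date").size()

# Room nights per month: group on the monthly period of each date.
per_month = nights.groupby(nights["RoomNight Date"].dt.to_period("M")).size()

print(per_day)
print(per_month)
```

Grouping on `dt.to_period("M")` keeps the year and month together, so bookings from the same month in different years are not merged.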