 

Expanding pandas data frame with date range in columns

I have a pandas dataframe with dates and strings similar to this:

Start        End           Note    Item
2016-10-22   2016-11-05    Z       A
2017-02-11   2017-02-25    W       B

I need to expand/transform it into the frame below, filling in the weekly dates (W-SAT) between the Start and End columns and forward-filling the data in Note and Item:

Start        Note    Item
2016-10-22   Z       A
2016-10-29   Z       A
2016-11-05   Z       A
2017-02-11   W       B
2017-02-18   W       B
2017-02-25   W       B

What's the best way to do this with pandas? Some sort of multi-index apply?
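For reference, whichever approach is used, the weekly dates can come from `pd.date_range` with the `W-SAT` frequency alias (weeks anchored on Saturday). A minimal check on the first row's range, assuming the Start and End values are parseable dates:

```python
import pandas as pd

# 'W-SAT' yields weekly timestamps that fall on Saturdays; both endpoints
# appear here because 2016-10-22 and 2016-11-05 are themselves Saturdays
weeks = pd.date_range('2016-10-22', '2016-11-05', freq='W-SAT')
print(list(weeks.strftime('%Y-%m-%d')))
# ['2016-10-22', '2016-10-29', '2016-11-05']
```

If a Start value falls on another weekday, `date_range` begins at the following Saturday instead.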

asked Feb 10 '17 05:02 by claybot




2 Answers

You can iterate over each row, build a new DataFrame of weekly dates for it, and then concatenate them all together:

import pandas as pd

# One small frame of Saturday-anchored weekly dates per row, stacked at the end
pd.concat([pd.DataFrame({'Start': pd.date_range(row.Start, row.End, freq='W-SAT'),
                         'Note': row.Note,
                         'Item': row.Item}, columns=['Start', 'Note', 'Item'])
           for _, row in df.iterrows()], ignore_index=True)

       Start Note Item
0 2016-10-22    Z    A
1 2016-10-29    Z    A
2 2016-11-05    Z    A
3 2017-02-11    W    B
4 2017-02-18    W    B
5 2017-02-25    W    B
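The same per-row expansion can also be sketched without building and concatenating one DataFrame per row, assuming pandas 0.25 or later (for `DataFrame.explode`) and that Start and End are already datetime columns. Each row's Start is replaced by a list of its weekly dates, and `explode` turns that list into one row per date. It still loops in a list comprehension, so this is a variation, not a fully vectorized solution:

```python
import pandas as pd

df = pd.DataFrame({
    'Start': pd.to_datetime(['2016-10-22', '2017-02-11']),
    'End': pd.to_datetime(['2016-11-05', '2017-02-25']),
    'Note': ['Z', 'W'],
    'Item': ['A', 'B'],
})

# Replace each Start with the list of Saturdays between Start and End
expanded = df.assign(
    Start=[list(pd.date_range(s, e, freq='W-SAT'))
           for s, e in zip(df['Start'], df['End'])]
)

# One output row per list element; Note and Item are repeated alongside
result = (expanded.explode('Start')
          .drop(columns='End')
          .reset_index(drop=True))
result['Start'] = pd.to_datetime(result['Start'])  # explode leaves object dtype
print(result)
```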
answered Sep 24 '22 04:09 by Ted Petrou



You don't need to iterate over the rows yourself.

df_start_end = df.melt(id_vars=['Note','Item'],value_name='date')

df = (df_start_end.groupby('Note')[['Item', 'date']]
        .apply(lambda x: x.set_index('date').resample('W-SAT').ffill())
        .reset_index())
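For a version with no Python-level loop and no reliance on Note being unique (grouping by Note would merge two rows that share a Note), one option is to count the weeks in each row, repeat each row that many times with `Index.repeat`, and add a week offset from `groupby(...).cumcount()`. A minimal sketch, assuming Start and End are datetime columns, every Start already falls on a Saturday (as in the example), and `df` has a unique index such as the default RangeIndex:

```python
import pandas as pd

df = pd.DataFrame({
    'Start': pd.to_datetime(['2016-10-22', '2017-02-11']),
    'End': pd.to_datetime(['2016-11-05', '2017-02-25']),
    'Note': ['Z', 'W'],
    'Item': ['A', 'B'],
})

# How many weekly dates each row spans, counting both Start and End
n_weeks = (df['End'] - df['Start']).dt.days // 7 + 1

# Repeat each row n_weeks times; the repeated index labels mark the source row
out = df.loc[df.index.repeat(n_weeks)]

# 0, 1, 2, ... within each source row, converted to whole-week offsets
week_no = out.groupby(level=0).cumcount().to_numpy()
out = out.assign(Start=out['Start'] + pd.to_timedelta(week_no * 7, unit='D'))

result = out.drop(columns='End').reset_index(drop=True)
print(result)
```

If Start values are not Saturdays, the dates produced here are Start plus whole weeks, so they would need rolling forward to Saturday first.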
answered Sep 24 '22 04:09 by Gen
