Pandas filling missing dates and values within group

Tags: python, pandas, dataframe

I have a DataFrame that looks like the following:

import pandas as pd

x = pd.DataFrame({'user': ['a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 2, 1]})

What I would like to do is find the minimum and maximum dates in the dt column and expand the frame so that every user has a row for every date in that range, filling in 0 for val on the new rows. The desired output is:

            dt user  val
0   2016-01-01    a    1
1   2016-01-02    a   33
2   2016-01-03    a    0
3   2016-01-04    a    0
4   2016-01-05    a    0
5   2016-01-06    a    0
6   2016-01-01    b    0
7   2016-01-02    b    0
8   2016-01-03    b    0
9   2016-01-04    b    0
10  2016-01-05    b    2
11  2016-01-06    b    1

I've tried the solutions mentioned here and here, but they aren't what I'm after. Any pointers are much appreciated.


asked Jul 07 '17 19:07

broccoli

2 Answers

Initial DataFrame:

           dt user  val
0  2016-01-01    a    1
1  2016-01-02    a   33
2  2016-01-05    b    2
3  2016-01-06    b    1

First, convert the dates to datetime:

x['dt'] = pd.to_datetime(x['dt'])
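The conversion matters because resample() (and asfreq()) only work on a DatetimeIndex, TimedeltaIndex or PeriodIndex. A minimal sketch, rebuilding the question's frame, showing that resample() rejects an index of date strings and accepts the converted one:

```python
import pandas as pd

x = pd.DataFrame({'user': ['a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 2, 1]})

# With string dates the index is a plain Index, so resample() raises TypeError
try:
    x.set_index('dt').resample('D')
    string_index_ok = True
except TypeError as err:
    string_index_ok = False
    print('resample on strings failed:', err)

x['dt'] = pd.to_datetime(x['dt'])

# After conversion, set_index('dt') yields a DatetimeIndex and resample() works
converted_index = x.set_index('dt').index
print(type(converted_index).__name__)  # DatetimeIndex

# Upsampling to daily frequency adds the missing days 2016-01-03 and 2016-01-04
daily = x.set_index('dt').resample('D').asfreq()
print(len(daily))  # 6
```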

Then, generate the dates and unique users:

dates = x.set_index('dt').resample('D').asfreq().index
>> DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
                  '2016-01-05', '2016-01-06'],
                 dtype='datetime64[ns]', name='dt', freq='D')

users = x['user'].unique()
>> array(['a', 'b'], dtype=object)
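Resampling the whole frame only to read off its index works here, but the same daily range can also be built directly with pd.date_range from the column's minimum and maximum. This is an alternative, not part of the original answer. One reason to prefer it: asfreq() after resample() reindexes the frame, which should fail as soon as the same date appears more than once in the index (for example, two users with a value on the same day), whereas date_range only looks at the two endpoints.

```python
import pandas as pd

x = pd.DataFrame({'user': ['a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 2, 1]})
x['dt'] = pd.to_datetime(x['dt'])

# Every calendar day from the earliest to the latest date, inclusive
dates = pd.date_range(x['dt'].min(), x['dt'].max(), freq='D', name='dt')

# Users in order of first appearance
users = x['user'].unique()

print(dates)
print(users)
```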

The dates and users let you build a MultiIndex containing every (date, user) pair:

idx = pd.MultiIndex.from_product((dates, users), names=['dt', 'user'])
>> MultiIndex(levels=[[2016-01-01 00:00:00, 2016-01-02 00:00:00, 2016-01-03 00:00:00, 2016-01-04 00:00:00, 2016-01-05 00:00:00, 2016-01-06 00:00:00], ['a', 'b']],
              labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
              names=['dt', 'user'])
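from_product builds the Cartesian product of its inputs with the first level varying slowest, so the result has len(dates) * len(users) entries in date-major order. A quick check (the printed form of a MultiIndex differs between pandas versions; newer ones list the tuples instead of levels= and labels=):

```python
import pandas as pd

dates = pd.date_range('2016-01-01', '2016-01-06', freq='D', name='dt')
users = ['a', 'b']

idx = pd.MultiIndex.from_product((dates, users), names=['dt', 'user'])

# One entry per (date, user) pair: 6 dates x 2 users
print(len(idx))  # 12

# The first level varies slowest, so both users appear for a date before the next date
for dt, user in idx[:4]:
    print(dt.date(), user)
```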

You can then use that MultiIndex to reindex your DataFrame, filling the missing combinations with 0:

x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index()
Out:
            dt user  val
0   2016-01-01    a    1
1   2016-01-01    b    0
2   2016-01-02    a   33
3   2016-01-02    b    0
4   2016-01-03    a    0
5   2016-01-03    b    0
6   2016-01-04    a    0
7   2016-01-04    b    0
8   2016-01-05    a    0
9   2016-01-05    b    2
10  2016-01-06    a    0
11  2016-01-06    b    1

The result can then be sorted by user. Adding dt as a second sort key keeps the dates in order within each user, since the default sort is not guaranteed to be stable:

x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index().sort_values(by=['user', 'dt'])
Out:
            dt user  val
0   2016-01-01    a    1
2   2016-01-02    a   33
4   2016-01-03    a    0
6   2016-01-04    a    0
8   2016-01-05    a    0
10  2016-01-06    a    0
1   2016-01-01    b    0
3   2016-01-02    b    0
5   2016-01-03    b    0
7   2016-01-04    b    0
9   2016-01-05    b    2
11  2016-01-06    b    1
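The same steps can be wrapped in a reusable function. This is a sketch, not @ayhan's code: the name complete_dates and its parameters are hypothetical, the daily range comes from pd.date_range instead of resample (so several users may share a date), and the user level is put first in from_product so the rows come out grouped by user without a separate sort. It assumes each (user, date) pair appears at most once, because reindex cannot work on duplicate index labels. Since the fill value 0 is an integer, val keeps its integer dtype; reindexing without fill_value would insert NaN and upcast val to float.

```python
import pandas as pd


def complete_dates(df, date_col='dt', group_col='user', fill=0):
    """Hypothetical helper: give every group a row for every day between the
    overall earliest and latest date, filling the new rows with `fill`.

    Assumes each (group, date) pair occurs at most once in `df`.
    """
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col])

    # Every calendar day in the overall range, inclusive
    dates = pd.date_range(df[date_col].min(), df[date_col].max(), freq='D')
    groups = df[group_col].unique()

    # Group level first: the product is group-major, so no sort is needed later
    idx = pd.MultiIndex.from_product((groups, dates), names=[group_col, date_col])

    out = (df.set_index([group_col, date_col])
             .reindex(idx, fill_value=fill)
             .reset_index())

    # Put the date column first to match the layout in the question
    other_cols = [c for c in out.columns if c not in (date_col, group_col)]
    return out[[date_col, group_col] + other_cols]


x = pd.DataFrame({'user': ['a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 2, 1]})
print(complete_dates(x))
```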

answered Sep 18 '22 23:09

ayhan

As @ayhan suggests, first convert the dates to datetime:

x.dt = pd.to_datetime(x.dt)

Here is a one-liner that uses mostly @ayhan's ideas but incorporates stack/unstack and fill_value:

x.set_index(
    ['dt', 'user']
).unstack(
    fill_value=0
).asfreq(
    'D', fill_value=0
).stack().sort_index(level=1).reset_index()

            dt user  val
0   2016-01-01    a    1
1   2016-01-02    a   33
2   2016-01-03    a    0
3   2016-01-04    a    0
4   2016-01-05    a    0
5   2016-01-06    a    0
6   2016-01-01    b    0
7   2016-01-02    b    0
8   2016-01-03    b    0
9   2016-01-04    b    0
10  2016-01-05    b    2
11  2016-01-06    b    1
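The one-liner may be easier to follow split into steps. The sketch below runs the same calls on the question's data and keeps the intermediate wide frame: unstack turns each user into a column (filling absent (date, user) combinations among the observed dates with 0), asfreq('D') inserts the missing calendar days, and stack turns the frame back into long format. On pandas 2.1 and later, stack() may emit a FutureWarning about its new implementation; with no missing values left, the result here should be the same either way.

```python
import pandas as pd

x = pd.DataFrame({'user': ['a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 2, 1]})
x['dt'] = pd.to_datetime(x['dt'])

# Step 1: wide format, one column per user and one row per observed date;
# (date, user) combinations that were never observed become 0
wide = x.set_index(['dt', 'user']).unstack(fill_value=0)

# Step 2: insert the missing calendar days (needs a DatetimeIndex), also as 0
wide = wide.asfreq('D', fill_value=0)

# Step 3: back to long format, sorted by user (index level 1), then by date
long_df = wide.stack().sort_index(level=1).reset_index()

print(wide)
print(long_df)
```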

answered Sep 19 '22 23:09

piRSquared
