Say that I have the following data. Like how many times my kids opened the fridge for each hour from 1 PM to 3 PM.
| ----- | ----- | ----- |
| Name | Hour | Open |
| ----- | ----- | ----- |
| Bob | 1 | 4 |
| ----- | ----- | ----- |
| Bob | 3 | 2 |
| ----- | ----- | ----- |
| Jane | 1 | 1 |
| ----- | ----- | ----- |
| Jane | 2 | 7 |
| ----- | ----- | ----- |
If I call this with pandas, how do I fill the missing hours so I could have the following dataframe?
| ----- | ----- | ----- |
| Name | Hour | Open |
| ----- | ----- | ----- |
| Bob | 1 | 4 |
| ----- | ----- | ----- |
| Bob | 2 | None | <<-- New row with Null or 0 for 'Open' column.
| ----- | ----- | ----- |
| Bob | 3 | 2 |
| ----- | ----- | ----- |
| Jane | 1 | 1 |
| ----- | ----- | ----- |
| Jane | 2 | 7 |
| ----- | ----- | ----- |
| Jane | 3 | None | <<-- New row with Null or 0 for 'Open' column.
| ----- | ----- | ----- |
Obviously, I kinda need it to be automatic so I could use it for some real data. So I can't just insert a row. The index or value sorting is not important.
Idea is use DataFrame.reindex by all possible combinations created by MultiIndex.from_product:
mux = pd.MultiIndex.from_product([df['Name'].unique(),
range(1, df['Hour'].max() + 1)], names=['Name','Hour'])
df1 = (df.set_index(['Name','Hour'])
.reindex(mux)
.reset_index())
print (df1)
Name Hour Open
0 Bob 1 4.0
1 Bob 2 NaN
2 Bob 3 2.0
3 Jane 1 1.0
4 Jane 2 7.0
5 Jane 3 NaN
If use pandas 0.24+ is possible use Nullable Integer Data Type:
df1 = (df.set_index(['Name','Hour'])
.reindex(mux).astype('Int64')
.reset_index())
print (df1)
Name Hour Open
0 Bob 1 4
1 Bob 2 NaN
2 Bob 3 2
3 Jane 1 1
4 Jane 2 7
5 Jane 3 NaN
And for replace non exist values to 0 add fill_value parameter:
df1 = (df.set_index(['Name','Hour'])
.reindex(mux, fill_value=0)
.reset_index())
print (df1)
Name Hour Open
0 Bob 1 4
1 Bob 2 0
2 Bob 3 2
3 Jane 1 1
4 Jane 2 7
5 Jane 3 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With