How do you create rows for every categories in a column?

Question

Say that I have the following data. Like how many times my kids opened the fridge for each hour from 1 PM to 3 PM.

| ----- | ----- | ----- |
|  Name |  Hour |  Open |
| ----- | ----- | ----- |
|  Bob  |     1 |     4 |
| ----- | ----- | ----- |
|  Bob  |     3 |     2 |
| ----- | ----- | ----- |
|  Jane |     1 |     1 |
| ----- | ----- | ----- |
|  Jane |     2 |     7 |
| ----- | ----- | ----- |

If I call this with pandas, how do I fill the missing hours so I could have the following dataframe?

| ----- | ----- | ----- |
|  Name |  Hour |  Open |
| ----- | ----- | ----- |
|  Bob  |     1 |     4 |
| ----- | ----- | ----- |
|  Bob  |     2 |  None |    <<-- New row with Null or 0 for 'Open' column.
| ----- | ----- | ----- |
|  Bob  |     3 |     2 |
| ----- | ----- | ----- |
|  Jane |     1 |     1 |
| ----- | ----- | ----- |
|  Jane |     2 |     7 |
| ----- | ----- | ----- |
|  Jane |     3 |  None |    <<-- New row with Null or 0 for 'Open' column.
| ----- | ----- | ----- |

Obviously, I kinda need it to be automatic so I could use it for some real data. So I can't just insert a row. The index or value sorting is not important.

jezrael · Accepted Answer

Idea is use DataFrame.reindex by all possible combinations created by MultiIndex.from_product:

mux = pd.MultiIndex.from_product([df['Name'].unique(), 
                                  range(1, df['Hour'].max() + 1)], names=['Name','Hour'])
df1 = (df.set_index(['Name','Hour'])
         .reindex(mux)
         .reset_index())
print (df1)
   Name  Hour  Open
0   Bob     1   4.0
1   Bob     2   NaN
2   Bob     3   2.0
3  Jane     1   1.0
4  Jane     2   7.0
5  Jane     3   NaN

If use pandas 0.24+ is possible use Nullable Integer Data Type:

df1 = (df.set_index(['Name','Hour'])
         .reindex(mux).astype('Int64')
         .reset_index())
print (df1)
   Name  Hour  Open
0   Bob     1     4
1   Bob     2   NaN
2   Bob     3     2
3  Jane     1     1
4  Jane     2     7
5  Jane     3   NaN

And for replace non exist values to 0 add fill_value parameter:

df1 = (df.set_index(['Name','Hour'])
         .reindex(mux, fill_value=0)
         .reset_index())
print (df1)
   Name  Hour  Open
0   Bob     1     4
1   Bob     2     0
2   Bob     3     2
3  Jane     1     1
4  Jane     2     7
5  Jane     3     0

How do you create rows for every categories in a column?

Tags:

python

pandas

dataframe

user8397275

1 Answers

jezrael

Recent Activity

Donate For Us

How do you create rows for every categories in a column?

Tags:

python

pandas

dataframe

user8397275

1 Answers

jezrael

Related questions

Recent Activity

Donate For Us