I try to reindex a dataframe with two or more indices on one level only, but using .reindex()
with the level
argument, as I have seen in this answer seems to do nothing instead of the expected behavior.
Here is the code I try to make work:
import pandas as pd
dtest = pd.DataFrame([['Martin', 'room_1', 3],
['Martin', 'room_2', 2],
['Georges', 'room_2', 4],
['Georges', 'room_1', 12]],
columns=['name', 'room', 'time_spent'])
dtest.set_index(['name', 'room',], inplace=True)
display(dtest)
print(dtest.reindex(
['room_1', 'room_2', 'room_3'], level=1, fill_value=0))
which outputs
time_spent
name room
Martin room_1 3
room_2 2
Georges room_1 12
room_2 4
whereas the output I am expecting is
time_spent
name room
Martin room_1 3
room_2 2
room_3 0
Georges room_1 12
room_2 4
room_3 0
reindex()
literally didn't do anything.
Did I miss an important detail on the use of reindex
or is something broken?
I'm using the last version of pandas with Python 3.6.7 (v3.6.7:6ec5cf24b7, Oct 20 2018, 03:02:14).
Edit after accepting the answer
I have accepted @anky_91's answer who actually provides two different solutions. As my actual use case involves more levels in the multi-index, let me underline the behaviour of his two solutions with more levels.
The starting dataframe is now given by
import pandas as pd
dtest2 = pd.DataFrame([['2020-01-05', 'Martin', 'room_1', 3],
['2020-01-05', 'Martin', 'room_2', 2],
['2020-01-06', 'Georges', 'room_2', 4],
['2020-01-06', 'Georges', 'room_1', 12]],
columns=['date', 'name', 'room', 'time_spent'])
dtest2.set_index(['date', 'name', 'room',], inplace=True)
print(dtest2)
which outputs
time_spent
date name room
2020-01-05 Martin room_1 3
room_2 2
2020-01-06 Georges room_2 4
room_1 12
The first solution adapted to this case is
mux = pd.MultiIndex.from_product((dtest2.index.get_level_values(0).unique(),
dtest2.index.get_level_values(1).unique(),
['room_1', 'room_2', 'room_3']
))
final_first_solution = dtest2.reindex(mux,fill_value=0)
print(final_first_solution)
which outputs
time_spent
2020-01-05 Martin room_1 3
room_2 2
room_3 0
Georges room_1 0
room_2 0
room_3 0
2020-01-06 Martin room_1 0
room_2 0
room_3 0
Georges room_1 12
room_2 4
room_3 0
This is not my desired output as Martin needs to only appear on 2020-01-05 and Georges only appears on 2020-01-06. In other words, I only want to expand the index on the last level room
.
However, I managed to adapt the second solution to get my desired output:
final_second_solution = dtest2.unstack((0, 1)).reindex(['room_1', 'room_2', 'room_3'], fill_value=0)\
.stack((-2, -1)).swaplevel(i=-1, j=0).swaplevel(i=1, j=0).sort_index()
print(final_second_solution)
correctly gives me
time_spent
date name room
2020-01-05 Martin room_1 3.0
room_2 2.0
room_3 0.0
2020-01-06 Georges room_1 12.0
room_2 4.0
room_3 0.0
hence why I accepted the answer. Thanks!
Edit 2 for a different use case
After testing more, the above solution fails when you are not adding a new room, but instead want to complete all rooms appearing in the dataframe. But the solution is simpler, as in this case unstack
already creates the missing lines so we just have to use fillna
instead of reindex
. The following code
import pandas as pd
dtest3 = pd.DataFrame([['2020-01-05', 'Martin', 'room_1', 3],
['2020-01-06', 'Georges', 'room_2', 4]],
columns=['date', 'name', 'room', 'time_spent'])
dtest3.set_index(['date', 'name', 'room',], inplace=True)
print(dtest3)
final_third_solution = dtest3.unstack((0, 1)).fillna(0).stack((-2, -1)).reorder_levels([1,2,0]).sort_index()
print(final_third_solution)
hence gives the desired outcome
time_spent
date name room
2020-01-05 Martin room_1 3.0
room_2 0.0
2020-01-06 Georges room_1 0.0
room_2 4.0
(I have also included the suggestion in the comments of using reorder_levels
instead of using multiple swaplevel
).
Using .fillna(0).reindex(..., fill_value=0)
should encompass the two cases where you want to complete and add new elements.
You would need to create a multi index for this using pd.MultiIndex.from_product()
and get_level_values
:
mux=(pd.MultiIndex.from_product((dtest.index.get_level_values(0).unique()
,['room_1', 'room_2', 'room_3'])))
final=dtest.reindex(mux,fill_value=0)
time_spent
Martin room_1 3
room_2 2
room_3 0
Georges room_1 12
room_2 4
room_3 0
If you want to just add another index with the existing indexes , use Index.union
to the from_product()
arg:
mux=(pd.MultiIndex.from_product((dtest.index.get_level_values(0).unique()
,dtest.index.get_level_values(1).unique().union(['room3'],sort=False))))
final=dtest.reindex(mux,fill_value=0)
time_spent
Martin room_1 3
room_2 2
room_3 0
Georges room_1 12
room_2 4
room_3 0
Adding another metod with stack()
and unstack()
with swaplevel
:
final=dtest.unstack(0).reindex(['room_1', 'room_2', 'room_3']
,fill_value=0).stack().swaplevel().sort_index()
time_spent
name room
Georges room_1 12
room_2 4
room_3 0
Martin room_1 3
room_2 2
room_3 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With