I have a Pandas DataFrame with a MultiIndex. The MultiIndex has values in the range (0,0) to (1000,1000), and the column has two fields p and q.
However, the DataFrame is sparse. That is, if there was no measurement corresponding to a particular index (say (3,2)), there won't be any row for it (3,2). I'd like to make it not sparse, by filling in these rows with p=0 and q=0. Continuing the example, if I do df.loc[3].loc[2], I want it to return p=0 q=0, not No Such Record (as it currently does).
Clarification: By "sparse", I mean it only in the sense I used it, that zero values are omitted. I'm not referring to anything in Pandas or Numpy internals.
Consider this df
data = {
(1, 0): dict(p=1, q=1),
(3, 2): dict(p=1, q=1),
(5, 4): dict(p=1, q=1),
(7, 6): dict(p=1, q=1),
}
df = pd.DataFrame(data).T
df
p q
1 0 1 1
3 2 1 1
5 4 1 1
7 6 1 1
Use reindex with fill_value=0 from a constructed pd.MultiIndex.from_product
mux = pd.MultiIndex.from_product([range(8), range(8)])
df.reindex(mux, fill_value=0)
p q
0 0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
1 0 1 1
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
2 0 0 0
1 0 0
2 0 0
3 0 0
response to comment
You can get min, max of index levels like this
def mn_mx(idx):
return idx.min(), idx.max()
mn0, mx0 = mn_mx(df.index.levels[0])
mn1, mx1 = mn_mx(df.index.levels[1])
mux = pd.MultiIndex.from_product([range(mn0, mx0 + 1), range(mn1, mx1 + 1)])
df.reindex(mux, fill_value=0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With