I have a multi-index dataframe (MultiIndex on the rows) and I want to insert a row at a certain row number. This question has been asked before (How to insert a row in a Pandas multiindex dataframe?), but the problem I want to solve is slightly different.
In fact, I solved it, but my solution is so ugly that I want to ask whether there is a better way of doing this.
Since there is no insert_row function in pandas, I thought I would insert a row by first copying my data frame up to a certain row into a new frame using a slice, then appending the row I want to insert, and then appending the rest of the original data frame. In fact, this approach works. Here is an example.
First I create an empty data frame with
pipeinfo_index = pd.MultiIndex.from_tuples([tuple([None, None])], names=["Label", "Side"])
pipeinfoDF = pd.DataFrame(index=pipeinfo_index, columns=[])
pipeinfoDF.drop(np.nan, level='Label', inplace=True)
for i in range(4):
    label = "label{}".format(i)
    pipeinfoDF.ix[(label, "A"),"COL1"] = i
    pipeinfoDF.ix[(label, "B"),"COL1"] = 2*i+1
which looks like
             COL1
Label  Side
label0 A        0
       B        1
label1 A        1
       B        3
label2 A        2
       B        5
label3 A        3
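For readers on a current pandas release, where `.ix` is no longer available (it was removed in pandas 1.0), the same example frame can be built in one go. This is only a sketch of an equivalent construction, not the code used in the question; the variable names `rows` and `values` are introduced here for illustration.

```python
import pandas as pd

# Build the (Label, Side) index up front instead of growing the frame row by row.
rows = [("label{}".format(i), side) for i in range(4) for side in ("A", "B")]
index = pd.MultiIndex.from_tuples(rows, names=["Label", "Side"])

# Same values as the loop in the question: side A gets i, side B gets 2*i+1.
values = [i if side == "A" else 2 * i + 1
          for i in range(4) for side in ("A", "B")]

pipeinfoDF = pd.DataFrame({"COL1": values}, index=index)
print(pipeinfoDF)
```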
I want to add a row between label1 and label2 such that I get a new dataframe like
             COL1
Label  Side
label0 A        0
       B        1
label1 A        1
       B        3
oker5  A       10
       B       30
label2 A        2
       B        5
label3 A        3
In my first approach I do this following my strategy described above
# copy first half to temporary data frame
part1= pipeinfoDF.ix[:"label1"].copy()
# copy second half to temporary data frame
part2= pipeinfoDF.ix["label2":].copy()
# append new row to insert to first part
part1.ix[("oker5", "B"),"COL1"] = 10
part1.ix[("oker5", "A"),"COL1"] = 30
# append second half of row to new data frame
for label, side in part2.index:
    print("adding {} {}".format(label, side))
    part1.ix[(label, side),:] = part2.ix[(label, side),:]
# copy the new data frame to the initial data frame
pipeinfoDF = part1.copy()
In fact, this works; if I print out pipeinfoDF I get exactly the data frame shown above. However, here is the problem: if I want to do this again (because I want to add more than one row to this initial dataframe), I get an error message. Even a further slice of the data frame results in an error. For instance, doing
part3= pipeinfoDF2.loc[:"oker5"].copy()
results in an error:
Traceback (most recent call last):
  File "C:/Apps/JetBrains/PyCharm Community Edition 4.5.4/jre/jre/bin/DataEelco/.PyCharm/config/scratches/scratch", line 96, in <module>
    part3= pipeinfoDF2.loc[:"oker5"].copy()
  File "C:\Apps\Anaconda\Anaconda2\envs\py34\lib\site-packages\pandas\core\indexing.py", line 1189, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "C:\Apps\Anaconda\Anaconda2\envs\py34\lib\site-packages\pandas\core\indexing.py", line 1304, in _getitem_axis
    return self._get_slice_axis(key, axis=axis)
  File "C:\Apps\Anaconda\Anaconda2\envs\py34\lib\site-packages\pandas\core\indexing.py", line 1211, in _get_slice_axis
    slice_obj.step)
  File "C:\Apps\Anaconda\Anaconda2\envs\py34\lib\site-packages\pandas\core\index.py", line 2340, in slice_indexer
    start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
  File "C:\Apps\Anaconda\Anaconda2\envs\py34\lib\site-packages\pandas\core\index.py", line 4990, in slice_locs
    return super(MultiIndex, self).slice_locs(start, end, step, kind=kind)
  File "C:\Apps\Anaconda\Anaconda2\envs\py34\lib\site-packages\pandas\core\index.py", line 2490, in slice_locs
    end_slice = self.get_slice_bound(end, 'right', kind)
  File "C:\Apps\Anaconda\Anaconda2\envs\py34\lib\site-packages\pandas\core\index.py", line 4961, in get_slice_bound
    return self._partial_tup_index(label, side=side)
  File "C:\Apps\Anaconda\Anaconda2\envs\py34\lib\site-packages\pandas\core\index.py", line 4996, in _partial_tup_index
    (len(tup), self.lexsort_depth))
KeyError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'
I found that this error is related to the fact that the rows I add to the new data frame have labels that already exist in its index levels.
If I do
print(part1)
I get
             COL1
Label  Side
label0 A        0
       B        1
label1 A        1
       B        3
But if I do
print(part1.index)
I see
MultiIndex(levels=[['label0', 'label1', 'label2', 'label3'], ['A', 'B']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=['Label', 'Side'])
In other words, if I slice the rows up to label1 I indeed get a dataframe which looks like this slice, but the MultiIndex levels still contain the later labels too. Apparently, if I try to add those labels again later by copying the rows of part2, this messes everything up (and I cannot slice anymore).
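The leftover labels can be reproduced and cleaned up on a current pandas release. The sketch below assumes a pandas version that provides `MultiIndex.remove_unused_levels()`, which did not exist in the release used in the question; the frame `df` and the names `before` and `after` are made up for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["label0", "label1", "label2", "label3"], ["A", "B"]],
    names=["Label", "Side"],
)
df = pd.DataFrame({"COL1": range(8)}, index=index)

part1 = df.loc[:"label1"].copy()

# The slice has only four rows, but the first level still lists every label.
before = list(part1.index.levels[0])
print(before)

# remove_unused_levels() returns a new MultiIndex restricted to the labels in use.
part1.index = part1.index.remove_unused_levels()
after = list(part1.index.levels[0])
print(after)
```

Whether the stale levels are the real cause of the KeyError is a separate matter; this only shows how to obtain the clean index that the hand-written routine produces.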
The solution I found works, but is ugly. Instead of slicing the initial data frame, I construct part1 and part2 starting from an empty data frame using the following routine
def get_slice(dataframe, start_from_label=None, end_at_label=None):
    # create an empty data frame and initialise it with rows copied from the dataframe,
    # starting from start_from_label and ending at end_at_label
    mi = pd.MultiIndex.from_tuples([tuple([None, None])], names=dataframe.index.names)
    df_new = pd.DataFrame(index=mi, columns=[])
    df_new.drop(np.nan, level='Label', inplace=True)
    insert = False
    for label, df in dataframe.groupby(level=0):
        side_list = df.index.get_level_values('Side')
        if start_from_label is None or label == start_from_label:
            insert = True
        if insert:
            for side in side_list:
                for col in dataframe.columns:
                    df_new.ix[(label, side),col] = dataframe.ix[(label, side),col]
        if end_at_label is not None and label == end_at_label:
            break
    return df_new
In the main code, at the place where I create the slice I now do
# part1= pipeinfoDF.ix[:"label1"].copy()
part1 = get_slice(pipeinfoDF, end_at_label="label1")
# part2= pipeinfoDF.ix["label2":].copy()
part2 = get_slice(pipeinfoDF, start_from_label="label2")
All the rest of the code remains the same. The difference is that part1 and part2 now have a clean MultiIndex. If I print the index of part1 I get
MultiIndex(levels=[['label0', 'label1'], ['A', 'B']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=['Label', 'Side'])
If I now slice the new dataframe again, this works fine:
part3= pipeinfoDF2.loc[:"oker5"].copy()
print(part3)
I get
             COL1
Label  Side
label0 A        0
       B        1
label1 A        1
       B        3
oker5  B       30
       A       10
As you can see, the next slice on the resulting data frame does not raise the KeyError any more, which means I can keep adding rows to this dataframe.
My question is now the following: how can I use the first slicing method and still be able to slice the resulting data frame again? The solution I found basically implements slicing by creating an empty data frame and copying the rows one by one, but I would say it should be possible in a better way. Hopefully somebody can give me some advice.
Eelco
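One way to avoid copying rows one by one is to split the frame by integer position and glue the pieces together with `pd.concat`. The following is a minimal sketch on a current pandas release, not the code from the question; `insert_rows` and `new_rows` are hypothetical names, and the position 4 is hard-coded for this example.

```python
import pandas as pd


def insert_rows(df, new_rows, position):
    """Return a new frame with new_rows placed before integer row `position`."""
    return pd.concat([df.iloc[:position], new_rows, df.iloc[position:]])


names = ["Label", "Side"]
index = pd.MultiIndex.from_product(
    [["label0", "label1", "label2", "label3"], ["A", "B"]], names=names
)
pipeinfoDF = pd.DataFrame({"COL1": [0, 1, 1, 3, 2, 5, 3, 7]}, index=index)

new_index = pd.MultiIndex.from_tuples([("oker5", "A"), ("oker5", "B")], names=names)
new_rows = pd.DataFrame({"COL1": [10, 30]}, index=new_index)

# label0 and label1 occupy rows 0-3, so position 4 is just before label2.
pipeinfoDF = insert_rows(pipeinfoDF, new_rows, 4)

# Positional slicing works on the result, so the insertion can be repeated.
print(pipeinfoDF.iloc[:6])
```

Because the split is positional, it does not depend on the labels being sorted. Slicing the result by label is a different matter, since the inserted label breaks the sort order of the first level.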
Edit on my comment below
Both answers are correct. In fact, my real situation is a bit more complicated. In the example, all the Side labels are ordered A-B, but in reality this can change, and I also want the freedom to impose a different order. If you take that into account, the first answer with unstack/stack does not work: after the unstack, I cannot impose a different order (A/B or B/A) anymore. Therefore, I must use the option with dropna.
For example, my dataframe is now slightly different:
for i in range(4):
    label = "label{}".format(i)
    if i!=2:
        l1 = "A"
        l2 = "B"
    else:
        l1 = "B"
        l2 = "A"
    pipeinfoDF.ix[(label, l1),"COL1"] = i
    pipeinfoDF.ix[(label, l2),"COL1"] = 2*i+1
    pipeinfoDF.ix[(label, l1),"COL2"] = 10*i
    pipeinfoDF.ix[(label, l2),"COL2"] = 10*(2*i+1)
which looks like
             COL1  COL2
Label  Side
label0 A        0     0
       B        1    10
label1 A        1    10
       B        3    30
label2 B        2    20
       A        5    50
label3 A        3    30
       B        7    70
Now, the unstack method does not allow inserting a row with, for instance, B first and then A, because after unstack/stack the order is always A/B. This is not what I want.
So the other solution is actually what I need. There is one slight issue, perhaps it can be solved too:-)
My method is now:
# create the lines to add
newdata = pd.DataFrame(index=pipeinfo_index, columns=[])
newdata.ix[('oker8', "B"), "COL1"] = 10
newdata.ix[('oker8', "A"), "COL1"] = 30
newdata.ix[('oker8', "B"), "COL2"] = 108
newdata.ix[('oker8', "A"), "COL2"] = 300
newdata2 = pd.DataFrame(index=pipeinfo_index, columns=[])
newdata2.ix[('oker9', "A"), "COL1"] = 20
newdata2.ix[('oker9', "B"), "COL1"] = 50
newdata2.ix[('oker9', "A"), "COL2"] = 2023
newdata2.ix[('oker9', "B"), "COL2"] = 5320
#get the indices to add the row
inx1=np.where(pipeinfoDF.reset_index().Label.values=='label1'); inx1=inx1[0][0]+2
inx2=np.where(pipeinfoDF.reset_index().Label.values=='label2'); inx2=inx2[0][0]
#insert the first data row
pipeinfoDF = pd.concat([pipeinfoDF.ix[:inx1], newdata, pipeinfoDF.ix[inx2:]])
pipeinfoDF.drop(np.nan, level='Label', inplace=True)
which correctly gives
             COL1  COL2
Label  Side
label0 A        0     0
       B        1    10
label1 A        1    10
       B        3    30
oker8  B       10   108
       A       30   300
label2 B        2    20
       A        5    50
label3 A        3    30
       B        7    70
and a second row can recursively be added as
inx1=np.where(pipeinfoDF.reset_index().Label.values=='label2'); inx1=inx1[0][0]+2
inx2=np.where(pipeinfoDF.reset_index().Label.values=='label3'); inx2=inx2[0][0]
pipeinfoDF = pd.concat([pipeinfoDF.ix[:inx1], newdata2, pipeinfoDF.ix[inx2:]])
pipeinfoDF.drop(np.nan, level='Label', inplace=True)
print(pipeinfoDF)
giving
             COL1  COL2
Label  Side
label0 A        0     0
       B        1    10
label1 A        1    10
       B        3    30
oker8  B       10   108
       A       30   300
label2 B        2    20
       A        5    50
oker9  A       20  2023
       B       50  5320
label3 A        3    30
       B        7    70
So this works. The only thing I don't understand is that slicing using labels instead of indices is not possible anymore once you have started slicing with indices. So for instance if you do
print(pipeinfoDF.ix[:'label2'])
you get the KeyError again, whereas at the start of the script, before the first slice with the index is made, the slicing with the label works just fine: you correctly would get
             COL1  COL2
Label  Side
label0 A        0     0
       B        1    10
label1 A        1    10
       B        3    30
label2 B        2    20
       A        5    50
I am not sure if this is going to give me problems later on. But perhaps there is a way to fix this as well? Well, so far: thanks a lot for both answers!
Eelco
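A likely explanation, judging from the error message, is that label-based slicing needs the first index level to be sorted. The original labels label0 to label3 were in sorted order, and inserting oker8 between label1 and label2 breaks that order, whichever slicing style was used to do it. The sketch below uses a current pandas release and a small made-up frame `df` to show the check, plus two ways around it.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("label0", "A"), ("label0", "B"),
     ("oker8", "B"), ("oker8", "A"),
     ("label2", "B"), ("label2", "A")],
    names=["Label", "Side"],
)
df = pd.DataFrame({"COL1": [0, 1, 10, 30, 2, 5]}, index=index)

# The rows are in a hand-picked order, so the index is not sorted.
print(df.index.is_monotonic_increasing)

# Positional slicing does not depend on the ordering of the labels.
print(df.iloc[:4])

# A sorted copy can be sliced by label again, at the cost of the custom order.
sorted_df = df.sort_index()
print(sorted_df.loc[:"label2"])
```

If the custom row order matters, as it does here, positional slicing on the unsorted frame is probably the safer choice, with the sorted copy kept only for label lookups.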
If you just want to insert a new set of data at a certain position, try the following. With drop you receive a new object, so there will be no KeyError issue anymore.
# create new dataframe based on pipeinfo_index
newdata = pd.DataFrame(index=pipeinfo_index, columns=[])
newdata.ix[('oaker', "A"), "COL1"] = 10
newdata.ix[('oaker', "B"), "COL1"] = 30
idx = getindexof('label1')
pipeinfoDF = pd.concat([pipeinfoDF.ix[:idx], newdata]) #, pipeinfoDF.ix[idx:]])
# drop the NaN and receive a new object
pipeinfoDF.drop(np.nan, level='Label', inplace=True)
pipeinfoDF.index
MultiIndex(levels=[[u'label0', u'label1', u'oaker'], [u'A', u'B']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=[u'Label', u'Side'])
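The snippet above calls getindexof without defining it. One possible implementation, written for a current pandas release, is sketched below. It is an assumption about what the helper was meant to do (return the position just past the last row of the given label), and it takes the frame as an extra argument, which the original call does not.

```python
import numpy as np
import pandas as pd


def getindexof(df, label, level="Label"):
    """Hypothetical helper: position just past the last row carrying `label`."""
    positions = np.flatnonzero(df.index.get_level_values(level) == label)
    if positions.size == 0:
        raise KeyError(label)
    return int(positions[-1]) + 1


index = pd.MultiIndex.from_product(
    [["label0", "label1", "label2"], ["A", "B"]], names=["Label", "Side"]
)
df = pd.DataFrame({"COL1": range(6)}, index=index)

idx = getindexof(df, "label1")
print(idx)

# Everything up to and including label1, selected by position.
print(df.iloc[:idx])
```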