I'm seeing some weird pandas behavior here. I have a DataFrame that looks like this:
df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
index=[('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')])
In [14]: df
Out[14]:
Col 1 Col 2 Col 3
(1, a) NaN NaN NaN
(2, a) NaN NaN NaN
(1, b) NaN NaN NaN
(2, b) NaN NaN NaN
I can set the value of an arbitrary element:
In [15]: df['Col 2'].loc[('1', 'b')] = 6
In [16]: df
Out[16]:
Col 1 Col 2 Col 3
(1, a) NaN NaN NaN
(2, a) NaN NaN NaN
(1, b) NaN 6 NaN
(2, b) NaN NaN NaN
But when I try to read back the element I just set, using the same syntax, I get:
In [17]: df['Col 2'].loc[('1', 'b')]
KeyError: 'the label [1] is not in the [index]'
Can someone tell me what I'm doing wrong or why this behavior occurs? Am I simply not allowed to set the index as a multi-element tuple?
Edit
Apparently, wrapping the tuple index in a list works.
In [38]: df['Col 2'].loc[[('1', 'b')]]
Out[38]:
(1, b) 6
Name: Col 2, dtype: object
I'm still getting some weird behavior in my actual use case, though, so it'd be nice to know whether this usage is discouraged.
The tuple in your selection brackets is treated as a sequence of the elements you want to retrieve. It's as if you had passed ['1', 'b'] as the argument. Hence the KeyError message: pandas tries to find the key '1' and, of course, doesn't find it.
That's why it works when you add additional brackets, as now the argument becomes a sequence of one element - your tuple.
It's best to avoid the ambiguity between list and tuple arguments in selection altogether. The behavior can also differ depending on whether the index is a simple index or a MultiIndex.
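To make the simple-index versus MultiIndex distinction concrete, here is a minimal sketch that builds both kinds of index from the same tuples. The names `labels`, `flat`, `multi` and `s`, and the values 10 to 40, are made up for illustration; `tupleize_cols=False` is passed only to force a flat index explicitly, whatever the pandas version would do by default.

```python
import pandas as pd

labels = [('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]

# tupleize_cols=False keeps every tuple as one opaque label: a flat index.
flat = pd.Index(labels, tupleize_cols=False)

# from_tuples splits every tuple into one value per level: a MultiIndex.
multi = pd.MultiIndex.from_tuples(labels)

print(type(flat).__name__, flat.nlevels)    # one level whose labels are tuples
print(type(multi).__name__, multi.nlevels)  # two levels

# On a MultiIndex, a tuple key unambiguously means "one value per level".
s = pd.Series([10, 20, 30, 40], index=multi)
value = s.loc[('1', 'b')]
print(value)
```

Checking `nlevels` (or `isinstance(idx, pd.MultiIndex)`) is a quick way to find out which case you are in before relying on tuple keys.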
In any case, since you ask for recommendations, mine is to avoid building simple indexes made of tuples. pandas works better and is more powerful if you build an actual MultiIndex instead:
df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
index=pd.MultiIndex.from_tuples([('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]))
df['Col 2'].loc[('1', 'b')] = 6
df['Col 2'].loc[('1', 'b')]
Out[13]: 6
df
Out[14]:
Col 1 Col 2 Col 3
1 a NaN NaN NaN
2 a NaN NaN NaN
1 b NaN 6 NaN
2 b NaN NaN NaN
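If the frame already exists with a flat index of tuples, it can probably be converted in place rather than rebuilt. The sketch below is one way to do that, not the only one; it also uses a single `df.loc[row, col]` call instead of the chained `df['Col 2'].loc[...]` form, since chained assignment may not write through to `df` in newer pandas versions with copy-on-write. The variable names `value` and `b_rows` are illustrative.

```python
import pandas as pd

labels = [('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]

# Start from a frame whose index is a flat index of tuples.
df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=pd.Index(labels, tupleize_cols=False))

# Replace the flat index with a real two-level MultiIndex.
df.index = pd.MultiIndex.from_tuples(list(df.index))

# Set and read one cell with a single .loc call: (row key, column key).
df.loc[('1', 'b'), 'Col 2'] = 6
value = df.loc[('1', 'b'), 'Col 2']
print(value)

# A MultiIndex also allows selecting on one level only,
# here every row whose second level is 'b'.
b_rows = df.xs('b', level=1)
print(b_rows)
```

The cross-section at the end is the kind of operation a flat index of tuples cannot offer directly, which is the main practical argument for the MultiIndex.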