Pandas DataFrame with tuple of strings as index

I'm seeing some odd pandas behavior. I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=[('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')])

In [14]: df
Out[14]:
       Col 1 Col 2 Col 3
(1, a)   NaN   NaN   NaN
(2, a)   NaN   NaN   NaN
(1, b)   NaN   NaN   NaN
(2, b)   NaN   NaN   NaN

I can set the value of an arbitrary element:

In [15]: df['Col 2'].loc[('1', 'b')] = 6

In [16]: df
Out[16]:
       Col 1 Col 2 Col 3
(1, a)   NaN   NaN   NaN
(2, a)   NaN   NaN   NaN
(1, b)   NaN     6   NaN
(2, b)   NaN   NaN   NaN

But when I try to read back the element I just set, using the same syntax, I get a KeyError:

In [17]: df['Col 2'].loc[('1', 'b')]
KeyError: 'the label [1] is not in the [index]'

Can someone tell me what I'm doing wrong or why this behavior occurs? Am I simply not allowed to set the index as a multi-element tuple?

Edit

Apparently, wrapping the tuple index in a list works:

In [38]: df['Col 2'].loc[[('1', 'b')]]
Out[38]:
(1, b)    6
Name: Col 2, dtype: object

I'm still getting some odd behavior in my actual use case, though, so it would be nice to know whether this usage is discouraged.

asked Oct 21 '16 22:10 by lanery


1 Answer

The tuple in the selection brackets is treated as a sequence of the labels you want to retrieve, as if you had passed ['1', 'b'] as the argument. Hence the KeyError: pandas looks for the key '1' and does not find it, because the index holds only tuples.

That's why it works when you add the extra brackets: the argument becomes a sequence of one element, your tuple.
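
The ambiguity starts at the Python level, before pandas is involved: obj[a, b] and obj[(a, b)] pass the same tuple to __getitem__, so an indexer cannot tell "one tuple label" from "several separate items". A minimal sketch of this, using a hypothetical ShowKey class that is not part of pandas and simply returns the key it receives:

```python
class ShowKey:
    """Hypothetical helper (not a pandas class): echoes the key passed to []."""

    def __getitem__(self, key):
        return key


probe = ShowKey()

bare = probe['1', 'b']         # two comma-separated items
explicit = probe[('1', 'b')]   # one explicit tuple
wrapped = probe[[('1', 'b')]]  # a list holding one tuple

# Python delivers the same object for the first two forms,
# so only the list-wrapped form is unambiguous to the receiver.
print(bare == explicit)   # True
print(wrapped)            # [('1', 'b')]
```

How pandas then interprets the tuple it receives depends on the indexer and the index type, which is why the list-wrapped form behaves differently.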

It is best to avoid the ambiguity between list and tuple arguments in selection. The behavior can also differ depending on whether the index is a simple Index or a MultiIndex.

As for recommendations: try not to build simple indexes made of tuples. pandas works better, and is more powerful, if you build an actual MultiIndex instead:

df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=pd.MultiIndex.from_tuples([('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]))

# a single .loc call (row key, column) avoids chained assignment
df.loc[('1', 'b'), 'Col 2'] = 6

df['Col 2'].loc[('1', 'b')]
Out[13]: 6

df
Out[14]: 
    Col 1 Col 2 Col 3
1 a   NaN   NaN   NaN
2 a   NaN   NaN   NaN
1 b   NaN     6   NaN
2 b   NaN   NaN   NaN
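
If a DataFrame already has a flat index of tuples, it can be converted in place rather than rebuilt. The following is a sketch only: the sample values and the level names 'num' and 'letter' are made up for illustration, and tupleize_cols=False is used just to force a flat index of tuples for the demonstration.

```python
import pandas as pd

# Flat (non-Multi) index whose labels happen to be tuples.
flat_index = pd.Index([('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')],
                      tupleize_cols=False)
df = pd.DataFrame({'Col 2': [1, 2, 3, 4]}, index=flat_index)
was_flat = not isinstance(df.index, pd.MultiIndex)

# Convert the tuples into a real two-level MultiIndex.
# The level names are illustrative assumptions.
df.index = pd.MultiIndex.from_tuples(df.index, names=['num', 'letter'])

# A tuple now unambiguously addresses one row.
value = df.loc[('1', 'b'), 'Col 2']

# Selecting on a single level is something a flat tuple index cannot do.
subset = df.xs('b', level='letter')
```

After the conversion, the tuple key selects a single row, and xs picks out every row whose 'letter' level equals 'b'.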

answered Sep 23 '22 05:09 by Zeugma