
Why is it ok to have an index with lists as values but not ok for columns?

Consider the numpy.array i

import numpy as np
import pandas as pd

i = np.empty((1,), dtype=object)
i[0] = [1, 2]

i

array([list([1, 2])], dtype=object)

Example 1: index

df = pd.DataFrame([1], index=i)
df

        0
[1, 2]  1

Example 2: columns
But

df = pd.DataFrame([1], columns=i)

Leads to this when I display it

df
TypeError: unhashable type: 'list'

However, df.T works!?


Question
Why is it necessary for index values to be hashable in a column context but not in an index context? And why only when it's displayed?

piRSquared asked Jun 29 '17 23:06


1 Answer

This is because of how pandas internally builds the string representation of the DataFrame object. Essentially, the difference between column labels and index labels here is that each column determines the format of its part of the string representation (as the column could hold floats, ints, etc.).

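The error thus happens because pandas stores a separate formatter object for each column in a dictionary, and this object is retrieved using the column name. Dictionary keys must be hashable and a list is not, so the lookup raises the TypeError. Because the lookup happens only while the string representation is being built, the error shows up only on display. Specifically, the line that triggers the error is https://github.com/pandas-dev/pandas/blob/d1accd032b648c9affd6dce1f81feb9c99422483/pandas/io/formats/format.py#L420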
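
To illustrate the mechanism without relying on pandas internals, here is a minimal pure-Python sketch. The names `formatters` and `get_formatter` are hypothetical stand-ins for a per-column formatter lookup, not pandas' actual code; the only thing it is meant to show is that looking up a list in a dictionary fails, while a tuple with the same contents does not.

```python
# Hypothetical stand-in for a per-column formatter registry (NOT pandas' real code).
# Keys are column labels, values would be formatting functions.
formatters = {}

def get_formatter(label):
    # dict.get has to hash `label` to find it, so the label must be hashable.
    return formatters.get(label, None)

# A tuple is hashable, so the lookup succeeds and simply finds nothing.
tuple_result = get_formatter((1, 2))

# A list is not hashable, so the same lookup raises TypeError.
try:
    get_formatter([1, 2])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(tuple_result)
print(error_message)
```

The index labels never go through a lookup like this in the sketch's model, which is consistent with Example 1 displaying fine while Example 2 fails.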

jhansen answered Oct 06 '22 03:10