Categorical dtype changes after using melt

Tags:

python

pandas

In answering this question, I found that after using melt on a pandas DataFrame, a column that previously had an ordered Categorical dtype becomes object dtype. Is this intended behaviour?

Note: I'm not looking for a solution; I'm just wondering whether there is a reason for this behaviour or whether it is unintended.

Example:

Using the following dataframe df:

  Cat  L_1  L_2  L_3
0   A    1    2    3
1   B    4    5    6
2   C    7    8    9

df['Cat'] = pd.Categorical(df['Cat'], categories = ['C','A','B'], ordered=True)

# As you can see `Cat` is a category
>>> df.dtypes
Cat    category
L_1       int64
L_2       int64
L_3       int64
dtype: object
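To show what an ordered categorical actually provides (and therefore what is lost if the dtype becomes object), here is a small sketch assuming a reasonably recent pandas. The DataFrame constructor call is my reconstruction of the frame printed above; with an ordered dtype, comparisons and sorting follow the declared category order rather than alphabetical order.

```python
import pandas as pd

# Rebuild the example frame printed above (constructor call is a reconstruction).
df = pd.DataFrame(
    {"Cat": ["A", "B", "C"], "L_1": [1, 4, 7], "L_2": [2, 5, 8], "L_3": [3, 6, 9]}
)
df["Cat"] = pd.Categorical(df["Cat"], categories=["C", "A", "B"], ordered=True)

# An ordered categorical compares and sorts by the declared category order
# (C < A < B), not alphabetically.
print(df["Cat"].min())                        # C
print(df.sort_values("Cat")["Cat"].tolist())  # ['C', 'A', 'B']
```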

melted = df.melt('Cat')

>>> melted
  Cat variable  value
0   A      L_1      1
1   B      L_1      4
2   C      L_1      7
3   A      L_2      2
4   B      L_2      5
5   C      L_2      8
6   A      L_3      3
7   B      L_3      6
8   C      L_3      9

Now, if I look at Cat, it has become object dtype:

>>> melted.dtypes
Cat         object
variable    object
value        int64
dtype: object
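A self-contained version of this reproduction, assuming only that pandas is installed (again, the DataFrame constructor is a reconstruction of the printed frame). What it prints depends on the installed pandas version: on a version with the behaviour described here, Cat shows up as object; on a version where melt keeps extension dtypes, the last line prints True.

```python
import pandas as pd

df = pd.DataFrame(
    {"Cat": ["A", "B", "C"], "L_1": [1, 4, 7], "L_2": [2, 5, 8], "L_3": [3, 6, 9]}
)
df["Cat"] = pd.Categorical(df["Cat"], categories=["C", "A", "B"], ordered=True)

melted = df.melt("Cat")

print(pd.__version__)
print(melted.dtypes)
# True only if melt kept the exact CategoricalDtype
# (same categories, same order, ordered=True).
print(melted["Cat"].dtype == df["Cat"].dtype)
```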

Is this intended?

sacuL asked Oct 28 '22 03:10




1 Answer

In the source code of pandas 0.22.0 (my old version):

for col in id_vars:
    mdata[col] = np.tile(frame.pop(col).values, K)
mcolumns = id_vars + var_name + [value_name]

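np.tile works on the underlying values, which it converts to a plain NumPy array, so the result has object dtype and the categorical information is lost.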
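A small sketch of the np.tile step in isolation, assuming a current NumPy and pandas (the variable names are illustrative). For a categorical Series, .values is a pandas Categorical rather than an ndarray, and np.tile converts its input to a NumPy array before repeating it, which for a categorical means an object array of the labels.

```python
import numpy as np
import pandas as pd

cat_col = pd.Series(
    pd.Categorical(["A", "B", "C"], categories=["C", "A", "B"], ordered=True)
)

# For a categorical Series, .values is a pandas Categorical, not an ndarray.
print(type(cat_col.values).__name__)  # Categorical

# np.tile first turns its input into a NumPy array; a Categorical has no
# NumPy equivalent, so the labels come back as a plain object array and the
# categories/ordering are gone.
tiled = np.tile(cat_col.values, 3)
print(tiled.dtype)  # object
```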

It has been fixed in 0.23.4 (after I updated my pandas):

df.melt('Cat')
Out[6]: 
  Cat variable  value
0   A      L_1      1
1   B      L_1      4
2   C      L_1      7
3   A      L_2      2
4   B      L_2      5
5   C      L_2      8
6   A      L_3      3
7   B      L_3      6
8   C      L_3      9
df.melt('Cat').dtypes
Out[7]: 
Cat         category
variable      object
value          int64
dtype: object

More info on how it was fixed:

for col in id_vars:
    id_data = frame.pop(col)
    # is_extension_type is True for a categorical column, so the column is
    # repeated with concat instead of np.tile and keeps its dtype
    if is_extension_type(id_data):
        id_data = concat([id_data] * K, ignore_index=True)
    else:
        id_data = np.tile(id_data.values, K)
    mdata[col] = id_data
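The same idea can be written as a standalone helper. This is a sketch, not pandas' actual code: repeat_id_column is a hypothetical name, and it detects an extension dtype with isinstance(s.dtype, np.dtype) instead of the internal is_extension_type used in the excerpt above.

```python
import numpy as np
import pandas as pd


def repeat_id_column(s: pd.Series, k: int) -> pd.Series:
    """Hypothetical helper: repeat an id column k times, the way melt does.

    NumPy dtypes go through np.tile; extension dtypes (such as category)
    are repeated with pd.concat so the dtype survives.
    """
    if isinstance(s.dtype, np.dtype):
        # Plain NumPy column: tiling the raw array keeps its dtype.
        return pd.Series(np.tile(s.to_numpy(), k), name=s.name)
    # Extension dtype (e.g. CategoricalDtype): np.tile would coerce to
    # object, so stack copies of the Series instead.
    return pd.concat([s] * k, ignore_index=True)


cat = pd.Series(
    pd.Categorical(["A", "B", "C"], categories=["C", "A", "B"], ordered=True),
    name="Cat",
)
repeated = repeat_id_column(cat, 3)
print(repeated.dtype)        # category
print(repeated.cat.ordered)  # True

ints = repeat_id_column(pd.Series([1, 2, 3], name="n"), 2)
print(ints.dtype)            # int64
```

Using concat only for extension dtypes keeps the fast np.tile path for ordinary NumPy columns, where tiling cannot lose any dtype information.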
BENY answered Nov 15 '22 07:11