In answering this question, I found that after using melt on a pandas dataframe, a column that was previously an ordered Categorical dtype becomes an object. Is this intended behaviour?
Note: I'm not looking for a workaround, just wondering whether there is a reason for this behaviour or whether it is unintended.
Example:
Using the following dataframe df:
Cat L_1 L_2 L_3
0 A 1 2 3
1 B 4 5 6
2 C 7 8 9
df['Cat'] = pd.Categorical(df['Cat'], categories = ['C','A','B'], ordered=True)
# As you can see `Cat` is a category
>>> df.dtypes
Cat category
L_1 int64
L_2 int64
L_3 int64
dtype: object
melted = df.melt('Cat')
>>> melted
Cat variable value
0 A L_1 1
1 B L_1 4
2 C L_1 7
3 A L_2 2
4 B L_2 5
5 C L_2 8
6 A L_3 3
7 B L_3 6
8 C L_3 9
Now, if I look at Cat, it has become an object:
>>> melted.dtypes
Cat object
variable object
value int64
dtype: object
Is this intended?
We can also do the reverse of the melt operation, which is called pivoting. In pivoting (reverse melting), we convert a column with multiple values into several columns of their own. The pivot() method on the DataFrame takes two main arguments, index and columns, plus an optional values argument that selects the column used to fill the new cells.
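A minimal sketch of that round trip, assuming a plain (non-categorical) frame shaped like the one in the question: melt goes from wide to long, and pivot with index, columns and values goes back. pivot leaves the id column as the index and names the column axis after "variable", so the last steps reset both to recover the original layout.

```python
import pandas as pd

# Wide frame shaped like the question's example (plain object id column here)
wide = pd.DataFrame({
    "Cat": ["A", "B", "C"],
    "L_1": [1, 4, 7],
    "L_2": [2, 5, 8],
    "L_3": [3, 6, 9],
})

# Wide -> long: "Cat" is the identifier, L_1..L_3 are unpivoted into rows
long = wide.melt(id_vars="Cat")

# Long -> wide: index and columns are the main arguments,
# values picks the column that fills the new cells
back = long.pivot(index="Cat", columns="variable", values="value")

# pivot keeps "Cat" as the index and names the column axis "variable";
# clear that name and move "Cat" back into a regular column
back.columns.name = None
back = back.reset_index()
print(back)
```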
The pandas melt() function changes a DataFrame from wide to long format. It is useful to massage a DataFrame into a format where one or more columns are identifier variables, while all other columns, considered measured variables, are unpivoted to the row axis, leaving just two non-identifier columns, variable and value.
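A small sketch of the parameters involved, using a frame shaped like the question's; value_vars, var_name and value_name are melt arguments, while the names "level" and "score" are only illustrative choices. Only L_1 and L_2 are unpivoted here, so the result has 3 × 2 = 6 rows.

```python
import pandas as pd

wide = pd.DataFrame({
    "Cat": ["A", "B", "C"],
    "L_1": [1, 4, 7],
    "L_2": [2, 5, 8],
    "L_3": [3, 6, 9],
})

# "Cat" is the identifier variable; only L_1 and L_2 are unpivoted.
# var_name / value_name rename the two non-identifier output columns.
long = wide.melt(
    id_vars=["Cat"],
    value_vars=["L_1", "L_2"],
    var_name="level",
    value_name="score",
)
print(long)
```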
There are many ways to convert numerical data into categorical data; one is pandas' built-in cut function, which bins continuous numerical values into discrete intervals and returns categorical data.
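As a sketch, with bin edges and label names chosen only for illustration: passing labels to pd.cut produces a categorical Series whose categories follow the bin order, which is the same kind of ordered Categorical the question is about.

```python
import pandas as pd

values = pd.Series([1, 4, 7, 2, 5, 8])

# Bin the continuous values into three labelled intervals.
# Bins are right-inclusive by default: (0, 3], (3, 6], (6, 9]
binned = pd.cut(values, bins=[0, 3, 6, 9], labels=["low", "mid", "high"])

print(binned.dtype)        # category
print(binned.cat.ordered)  # True: labels follow the bin order
```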
In the 0.22.0 source code (my old version):
for col in id_vars:
    mdata[col] = np.tile(frame.pop(col).values, K)
mcolumns = id_vars + var_name + [value_name]
Because np.tile works on the raw values, the id column comes back as a plain NumPy array of dtype object, so the categorical dtype is lost.
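The difference can be reproduced without melt at all. In this sketch, K is a local name standing for the number of unpivoted value columns (3 in the question): np.tile hands back a NumPy object array, while repeating the Series with pd.concat keeps the ordered categorical dtype.

```python
import numpy as np
import pandas as pd

cat = pd.Series(pd.Categorical(["A", "B", "C"], categories=["C", "A", "B"], ordered=True))
K = 3  # number of value columns being unpivoted (L_1, L_2, L_3)

# Old code path: np.tile converts the Categorical into a plain NumPy array
tiled = np.tile(cat.values, K)
print(tiled.dtype)  # object

# Newer code path: concatenating the Series keeps the categorical dtype
repeated = pd.concat([cat] * K, ignore_index=True)
print(repeated.dtype)        # category
print(repeated.cat.ordered)  # True
```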
It has been fixed in 0.23.4 (after I updated my pandas):
df.melt('Cat')
Out[6]:
Cat variable value
0 A L_1 1
1 B L_1 4
2 C L_1 7
3 A L_2 2
4 B L_2 5
5 C L_2 8
6 A L_3 3
7 B L_3 6
8 C L_3 9
df.melt('Cat').dtypes
Out[7]:
Cat category
variable object
value int64
dtype: object
More info on how it was fixed:
for col in id_vars:
    id_data = frame.pop(col)
    if is_extension_type(id_data):  # True for a categorical column, so it is repeated with concat instead of np.tile
        id_data = concat([id_data] * K, ignore_index=True)
    else:
        id_data = np.tile(id_data.values, K)
    mdata[col] = id_data
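A minimal sketch of the same branching, written as a hypothetical helper (repeat_id_column is not a pandas function). Instead of is_extension_type, it checks whether the dtype is a plain NumPy dtype, which is one way to detect an extension dtype such as category; treat it as an illustration of the idea rather than the exact pandas implementation. The last lines check melt itself, which should keep the category dtype on versions that include the fix.

```python
import numpy as np
import pandas as pd


def repeat_id_column(id_data: pd.Series, k: int):
    """Hypothetical helper mirroring the fixed branch in melt: repeat an id column k times."""
    if not isinstance(id_data.dtype, np.dtype):
        # Extension dtype (e.g. category): concat keeps the dtype
        return pd.concat([id_data] * k, ignore_index=True)
    # Plain NumPy dtype: tiling the raw values is enough
    return np.tile(id_data.to_numpy(), k)


df = pd.DataFrame({"Cat": ["A", "B", "C"], "L_1": [1, 4, 7], "L_2": [2, 5, 8], "L_3": [3, 6, 9]})
df["Cat"] = pd.Categorical(df["Cat"], categories=["C", "A", "B"], ordered=True)

cat_rep = repeat_id_column(df["Cat"], 3)
int_rep = repeat_id_column(df["L_1"], 3)
print(cat_rep.dtype)  # category
print(int_rep.dtype)  # int64

# melt itself, on a version that includes the fix
melted = df.melt("Cat")
print(melted["Cat"].dtype)  # category
```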