
pandas cut: how to convert categorical labels to strings (otherwise cannot export to Excel)?

I use pandas.cut() to discretise a continuous variable into a range, and then group by the result.

After a lot of swearing because I couldn't figure out what was wrong, I have learnt that if I rely on the default labels of cut() instead of supplying custom ones, the output cannot be exported to Excel. If I try this:

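import pandas as pd
import numpy as np

# add_worksheet() is an XlsxWriter method, so request that engine explicitly
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
wk = writer.book.add_worksheet('Test')

df = pd.DataFrame(np.random.randint(1, 10, (10000, 5)), columns=['a', 'b', 'c', 'd', 'e'])
# no labels given, so each bin is labelled with a pandas Interval object
df['range'] = pd.cut(df['a'], [-np.inf, 3, 8, np.inf])
grouped = df.groupby('range').sum()
grouped.to_excel(writer, sheet_name='Export')
writer.close()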

I get:

raise TypeError("Unsupported type %s in write()" % type(token))
TypeError: Unsupported type <class 'pandas._libs.interval.Interval'> in write()
which took me a while to decipher.
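
A minimal sketch of where the Interval objects in that message come from, assuming a small made-up Series (the values 1, 4 and 9 are only for illustration). Without a labels argument, pd.cut returns a categorical column whose values are pandas Interval objects rather than strings:

```python
import numpy as np
import pandas as pd

# Illustrative data: one value per bin.
s = pd.Series([1, 4, 9])

# No labels argument, so the bins themselves become the category values.
binned = pd.cut(s, [-np.inf, 3, 8, np.inf])

print(binned.dtype)           # categorical dtype
print(type(binned.iloc[0]))   # each element is a pandas Interval
```

This is the type the Excel writer in the traceback does not know how to write.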

If instead I do assign labels:

df['range'] = pd.cut(df['a'], [-np.inf, 3, 8, np.inf], labels=['<3', '3-8', '>8'])

then it all runs fine. Any suggestions on how to handle this without assigning custom labels? In the initial phase of my work I tend not to assign labels, because I don't yet know how many bins I want. It's a trial-and-error approach, and assigning labels at each attempt would be time-consuming.

I am not sure if this can count as a bug, but at the very least it seems like a poorly documented annoyance!

Pythonista anonymous asked Oct 16 '17 16:10



1 Answer

Use astype(str) to turn the Interval labels into plain strings before grouping:

import pandas as pd
import numpy as np

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
wk = writer.book.add_worksheet('Test')

df = pd.DataFrame(np.random.randint(1, 10, (10000, 5)), columns=['a', 'b', 'c', 'd', 'e'])
# astype(str) converts each Interval label to text such as '(-inf, 3.0]'
df['range'] = pd.cut(df['a'], [-np.inf, 3, 8, np.inf]).astype(str)
grouped = df.groupby('range').sum()
grouped.to_excel(writer, sheet_name='Export')
writer.close()

Output in Excel (the exact sums vary from run to run because the data is random):

range            a      b      c      d      e
(-inf, 3.0]   6798  17277  16979  17266  16949
(3.0, 8.0]   33150  28051  27551  27692  27719
(8.0, inf]    9513   5153   5318   5106   5412
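
One possible variation, sketched under assumptions: keep the Interval categories while grouping and turn only the index of the result into strings just before export. Calling astype(str) before groupby makes the groups sort as text, which happens to work for these three bins but would put '(10.0, 20.0]' ahead of '(3.0, 10.0]'. The seed, the smaller row count and the name export are illustrative choices, and the to_excel call is left commented out because it needs an Excel engine installed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
df = pd.DataFrame(rng.integers(1, 10, (1000, 5)), columns=list("abcde"))

# Keep the Interval categories while working, so the bins stay in numeric order.
df["range"] = pd.cut(df["a"], [-np.inf, 3, 8, np.inf])
grouped = df.groupby("range", observed=False).sum()

# Convert only the index labels to strings, on a copy, right before export.
export = grouped.copy()
export.index = pd.Index([str(iv) for iv in grouped.index], name=grouped.index.name)

# export.to_excel("test.xlsx", sheet_name="Export")  # requires openpyxl or xlsxwriter
print(export)
```

The grouped frame keeps its Interval index for any further analysis, and only the copy handed to Excel carries string labels.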

Scott Boston answered Oct 05 '22 07:10