I use pandas.cut() to discretise a continuous variable into a range, and then group by the result.
After a lot of swearing because I couldn't figure out what was wrong, I have learnt that, if I don't supply custom labels to the cut() function, but rely on the default, then the output cannot be exported to excel. If I try this:
import pandas as pd
import numpy as np
writer = pd.ExcelWriter('test.xlsx')
wk = writer.book.add_worksheet('Test')
df= df= pd.DataFrame(np.random.randint(1,10,(10000,5)), columns=['a','b','c','d','e'])
df['range'] = pd.cut( df['a'],[-np.inf,3,8,np.inf] )
grouped=df.groupby('range').sum()
grouped.to_excel(writer, 'Export')
writer.close()
I get:
raise TypeError("Unsupported type %s in write()" % type(token))
TypeError: Unsupported type <class 'pandas._libs.interval.Interval'> in write()
which it took me a while to decypher.
If instead I do assign labels:
df['range'] = pd.cut( df['a'],[-np.inf,3,8,np.inf], labels =['<3','3-8','>8'] )
then it all runs fine. Any suggestions on how to handle this without assigning custom labels? In the initial phase of my work I tend not to assign labels, because I still don't know how many bins I want - it's a trial and error approach, and assigning labels at each attempt would be time-consuming.
I am not sure if this can count as a bug, but at the very least it seems like a poorly documented annoyance!
astype() method is used to cast a pandas object to a specified dtype. astype() function also provides the capability to convert any suitable existing column to categorical type. DataFrame. astype() function comes very handy when we want to case a particular column data type to another data type.
cut() method in Python. Pandas cut() function is used to separate the array elements into different bins . The cut function is mainly used to perform statistical analysis on scalar data.
Using DataFrame.You can convert the column “Fee” to a string by simply using DataFrame. apply(str) , for example df["Fee"]=df["Fee"]. apply(str) .
To delete rows and columns from DataFrames, Pandas uses the “drop” function. To delete a column, or multiple columns, use the name of the column(s), and specify the “axis” as 1. Alternatively, as in the example below, the 'columns' parameter has been added in Pandas which cuts out the need for 'axis'.
Use astype(str)
:
writer = pd.ExcelWriter('test.xlsx')
wk = writer.book.add_worksheet('Test')
df= df= pd.DataFrame(np.random.randint(1,10,(10000,5)), columns=['a','b','c','d','e'])
df['range'] = pd.cut( df['a'],[-np.inf,3,8,np.inf] ).astype(str)
grouped=df.groupby('range').sum()
grouped.to_excel(writer, 'Export')
writer.close()
Output in excel:
range a b c d e
(-inf, 3.0] 6798 17277 16979 17266 16949
(3.0, 8.0] 33150 28051 27551 27692 27719
(8.0, inf] 9513 5153 5318 5106 5412
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With