This code worked until I upgraded from Python 2.x to 3.x. I have a df with columns such as ipk1, ipk2 and ipk3, each holding float numbers from 0 to 4.0, and I would like to bin them into strings.
The data looks something like this:
   ipk1  ipk2  ipk3  ipk4  ipk5 jk
0  3.25  3.31  3.31  3.31  3.34  P
1  3.37  3.33  3.36  3.33  3.41  P
2  3.41  3.47  3.59  3.55  3.60  P
3  3.23  3.10  3.05  2.98  2.97  L
4  3.24  3.40  3.22  3.23  3.25  L
On Python 2.x this code works, but after upgrading to Python 3 it no longer does. Is there another way to bin the values into strings? I also tried using a while loop, but that did not help either.
train1.loc[train1['ipk1'] > 3.6, 'ipk1'] = 'A',
train1.loc[(train1['ipk1']>3.2) & (train1['ipk1']<=3.6),'ipk1']='B',
train1.loc[(train1['ipk1']>2.8) & (train1['ipk1']<=3.2),'ipk1']='C',
train1.loc[(train1['ipk1']>2.4) & (train1['ipk1']<=2.8),'ipk1']='D',
train1.loc[(train1['ipk1']>2.0) & (train1['ipk1']<=2.4),'ipk1']='E',
train1.loc[(train1['ipk1']>1.6) & (train1['ipk1']<=2.0),'ipk1']='F',
train1.loc[(train1['ipk1']>1.2) & (train1['ipk1']<=1.6),'ipk1']='G',
train1.loc[train1['ipk1'] <= 1.2, 'ipk1'] = 'H'
The error I receive:
TypeError: '>' not supported between instances of 'str' and 'float'
My expected output:
  ipk1  ipk2  ipk3  ipk4  ipk5 jk
0    B  3.31  3.31  3.31  3.34  P
1    B  3.33  3.36  3.33  3.41  P
2    B  3.47  3.59  3.55  3.60  P
3    B  3.10  3.05  2.98  2.97  L
4    B  3.40  3.22  3.23  3.25  L
In pandas, binning by distance is done with the cut() function. For example, to group the values of a column into three groups (small, medium and big), we first need to work out the interval that each group falls within.
To delete rows and columns from a DataFrame, pandas provides the “drop” function. To delete one or more columns, pass the name of the column(s) and set “axis” to 1. Alternatively, the “columns” parameter removes the need for “axis”.
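As a rough illustration of the two spellings described above, here is a minimal sketch. The small DataFrame is made up for the example and only reuses column names from the question.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate drop().
df = pd.DataFrame({"ipk1": [3.25, 3.37], "ipk2": [3.31, 3.33], "jk": ["P", "L"]})

# Drop a column by name with axis=1 ...
without_jk = df.drop("jk", axis=1)

# ... or with the columns keyword, which needs no axis argument.
also_without_jk = df.drop(columns=["jk"])

# Drop a row by its index label (axis=0 is the default).
without_first_row = df.drop(0)

print(without_jk)
print(without_first_row)
```

Note that drop returns a new DataFrame, so df itself is unchanged unless the result is assigned back.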
The pandas documentation describes qcut as a “Quantile-based discretization function.” This basically means that qcut tries to divide the underlying data into equal-sized bins, that is, bins holding roughly the same number of values. The function defines the bins using percentiles based on the distribution of the data, not the actual numeric edges of the bins.
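To make the difference between cut and qcut concrete, here is a small sketch. The values in scores are invented for illustration and include one outlier on purpose.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
scores = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0])

# cut: two equal-width intervals over the value range, so the outlier
# leaves almost every value in the lower bin.
by_width = pd.cut(scores, bins=2, labels=["low", "high"])

# qcut: two quantile-based bins, so each holds about half the values.
by_quantile = pd.qcut(scores, q=2, labels=["low", "high"])

width_counts = by_width.value_counts().to_dict()
quantile_counts = by_quantile.value_counts().to_dict()
print(width_counts)
print(quantile_counts)
```

For the question above, where the grade boundaries are fixed numbers, cut with explicit edges is the closer fit; qcut would move the boundaries depending on the data.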
This is a good use case for pandas.cut:
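import numpy as np
import pandas as pd
# Intervals are right-closed by default, e.g. (3.2, 3.6] maps to 'B'.
bins = [-np.inf, 1.2, 1.6, 2.0, 2.4, 2.8, 3.2, 3.6, np.inf]
labels = ['H', 'G', 'F', 'E', 'D', 'C', 'B', 'A']
df['ipk1'] = pd.cut(df['ipk1'], bins=bins, labels=labels)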
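For completeness, here is a self-contained sketch of the same idea applied to the sample rows from the question. The chained .loc assignments most likely fail because, once one of them writes strings into ipk1, the next comparison mixes str and float, which Python 3 refuses to order; pd.cut avoids this by binning the original floats in a single pass. Looping over ipk1 to ipk3 and converting the result with astype(str) are assumptions based on the question's wording, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (ipk4 and ipk5 omitted for brevity).
train1 = pd.DataFrame({
    "ipk1": [3.25, 3.37, 3.41, 3.23, 3.24],
    "ipk2": [3.31, 3.33, 3.47, 3.10, 3.40],
    "ipk3": [3.31, 3.36, 3.59, 3.05, 3.22],
    "jk": ["P", "P", "P", "L", "L"],
})

bins = [-np.inf, 1.2, 1.6, 2.0, 2.4, 2.8, 3.2, 3.6, np.inf]
labels = ["H", "G", "F", "E", "D", "C", "B", "A"]

# Bin each column from its original float values. With the default
# right=True every interval is (low, high], matching the > / <= tests.
for col in ["ipk1", "ipk2", "ipk3"]:
    # astype(str) turns the categorical result into plain strings;
    # drop it to keep an ordered categorical column instead.
    train1[col] = pd.cut(train1[col], bins=bins, labels=labels).astype(str)

print(train1)
```

To bin only ipk1, as in the expected output shown in the question, replace the list in the loop with ["ipk1"].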