Here is the snippet:
import pandas as pd

test = pd.DataFrame({'days': [0, 31, 45]})
test['range'] = pd.cut(test.days, [0, 30, 60])
Output:
   days     range
0     0       NaN
1    31  (30, 60]
2    45  (30, 60]
I am surprised that 0 is not in (0, 30]. What should I do to categorize 0 as (0, 30]?
Use pd.cut when you need to segment and sort data values into bins. This function is also useful for going from a continuous variable to a categorical variable. For example, pd.cut could convert ages to groups of age ranges. It supports binning into an equal number of bins, or into a pre-specified array of bin edges.
Use pd.cut() for binning data based on the range of possible values. Use pd.qcut() for binning data based on the actual distribution of values.
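To make the difference between the two concrete, here is a minimal sketch on a small, made-up skewed sample (the `values` series is an assumption for illustration, not data from the question). pd.cut splits the value range into equal-width intervals, while pd.qcut places the edges at quantiles so each bin holds roughly the same number of rows.

```python
import pandas as pd

# Hypothetical skewed sample: four small values and one large one.
values = pd.Series([1, 2, 3, 4, 100])

# pd.cut with an integer: two equal-width bins spanning min..max of the data.
by_range = pd.cut(values, 2)

# pd.qcut with an integer: two bins whose edges sit at the quantiles.
by_quantile = pd.qcut(values, 2)

# .cat.codes gives the integer position of each row's bin.
print(by_range.cat.codes.tolist())
print(by_quantile.cat.codes.tolist())
```

With this sample, the equal-width split leaves the single large value alone in the upper bin, whereas the quantile split balances the row counts between the two bins.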
Pass include_lowest=True so that the first interval includes its left edge:
test['range'] = pd.cut(test.days, [0,30,60], include_lowest=True)
print (test)
   days           range
0     0  (-0.001, 30.0]
1    31    (30.0, 60.0]
2    45    (30.0, 60.0]
See the difference:
test = pd.DataFrame({'days': [0,20,30,31,45,60]})

test['range1'] = pd.cut(test.days, [0,30,60], include_lowest=True)
#30 value is in [30, 60) group
test['range2'] = pd.cut(test.days, [0,30,60], right=False)
#30 value is in (0, 30] group
test['range3'] = pd.cut(test.days, [0,30,60])
print (test)
   days          range1    range2    range3
0     0  (-0.001, 30.0]   [0, 30)       NaN
1    20  (-0.001, 30.0]   [0, 30)   (0, 30]
2    30  (-0.001, 30.0]  [30, 60)   (0, 30]
3    31    (30.0, 60.0]  [30, 60)  (30, 60]
4    45    (30.0, 60.0]  [30, 60)  (30, 60]
5    60    (30.0, 60.0]       NaN  (30, 60]
Or use numpy.searchsorted, but the array of bin edges has to be sorted in ascending order:
import numpy as np

arr = np.array([0,30,60])

test['range1'] = arr.searchsorted(test.days)
test['range2'] = arr.searchsorted(test.days, side='right') - 1
print (test)
   days  range1  range2
0     0       0       0
1    20       1       0
2    30       1       1
3    31       2       1
4    45       2       1
5    60       2       2
See the pd.cut documentation.
Include the parameter right=False:
test = pd.DataFrame({'days': [0,31,45]})
test['range'] = pd.cut(test.days, [0,30,60], right=False)
test
   days     range
0     0   [0, 30)
1    31  [30, 60)
2    45  [30, 60)
You can pass labels to pd.cut() as well. The following example contains student grades in the range 0 to 10. We're adding a new column called 'grade_cat' to categorize the grades.
bins gives the interval edges: with the default right=True, [0, 4, 6, 10] produces the intervals (0, 4], (4, 6] and (6, 10]. The corresponding labels are "poor", "normal" and "excellent". Note that a grade of exactly 0 falls outside (0, 4] and becomes NaN unless include_lowest=True is passed.
bins = [0, 4, 6, 10]
labels = ["poor","normal","excellent"]
student['grade_cat'] = pd.cut(student['grade'], bins=bins, labels=labels)
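The `student` DataFrame is not shown in the answer, so the following is only a sketch with made-up grades (the data is an assumption). It also passes include_lowest=True, which the snippet above does not, so that a grade of exactly 0 lands in the first bin instead of becoming NaN.

```python
import pandas as pd

# Hypothetical data: the original answer does not show the `student` DataFrame.
student = pd.DataFrame({'grade': [0, 3, 4, 5, 6, 9, 10]})

bins = [0, 4, 6, 10]
labels = ["poor", "normal", "excellent"]

# Intervals are [0, 4], (4, 6], (6, 10]; include_lowest=True closes the
# left edge of the first interval so that grade 0 is labelled too.
student['grade_cat'] = pd.cut(
    student['grade'], bins=bins, labels=labels, include_lowest=True
)
print(student)
```

Because the bins are closed on the right by default, a grade of 4 is "poor" and a grade of 6 is "normal"; pass right=False if the boundaries should belong to the higher category instead.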
A sample of how pd.cut works when given a number of bins instead of explicit edges:
s = pd.Series([168,180,174,190,170,185,179,181,175,169,182,177,180,171])
pd.cut(s, 3)
# To add labels to the bins
pd.cut(s, 3, labels=["Small","Medium","Large"])
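When only a bin count is given, it can help to see which edges pandas actually chose. A small sketch using the retbins=True parameter of pd.cut, which returns the computed edges alongside the binned values (the variable names `binned`, `edges` and `counts` are illustrative):

```python
import pandas as pd

s = pd.Series([168, 180, 174, 190, 170, 185, 179, 181, 175, 169, 182, 177, 180, 171])

# retbins=True returns the binned Series together with the array of edges.
binned, edges = pd.cut(s, 3, retbins=True)

# Three bins need four edges. The lowest edge sits slightly below the
# minimum of the data so that the minimum value is included in the first bin.
print(edges)

# Number of rows that fall into each bin, in bin order.
counts = binned.cat.codes.value_counts().sort_index().tolist()
print(counts)
```

This is why the integer form never yields NaN for the smallest value, unlike explicit edges such as [0, 30, 60] without include_lowest=True.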
pd.cut can also be used directly on a range.