Following up on a previous question of mine, which @Andrie answered, I have a question about the usage of the cut function and its labels.
I'd like to get summary statistics based on the range of the number of times a user logs in.
Here is my data:
# Get random numbers
NumLogin <- round(runif(100,1,50))
# Set the login range
LoginRange <- cut(NumLogin,
c(0,1,3,5,10,15,20,Inf),
labels=c('1','2','3-5','6-10','11-15','16-20','20+')
)
Now I have my LoginRange, but I'm unsure how the cut function actually works. I want to find users who have logged in 1 time, 2 times, 3-5 times, etc., while only including a user if they are in that range. Is the cut function including 3 twice (in the 2 bucket and the 3-5 bucket)? If I look at my example, I can see a user who logged in 3 times, but they are cut as '2'. I've looked at the documentation and every R book I own, but no luck. What am I doing wrong?
Also, as a usage question: should I attach LoginRange to my data frame? If so, what's the best way to do so? Something like this?
DF <- data.frame(NumLogin, LoginRange)
Thanks
The intervals defined by the cut() function are (by default) closed on the right. To see what that means, try this:
cut(1:2, breaks=c(0,1,2))
# [1] (0,1] (1,2]
As you can see, the integer 1 gets included in the range (0,1], not in the range (1,2]. It doesn't get double-counted, and for any input value falling outside of the bins you define, cut() will return a value of NA.
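To make the right-closed behaviour concrete, here is a minimal sketch using the breaks from the question on a small hand-picked vector (the vector x and the variable names are assumptions for illustration only). It also shows the right = FALSE argument of cut(), which flips the intervals to be closed on the left instead.

```r
# Illustrative input: includes boundary values and values outside the breaks
x <- c(0, 1, 2, 3, 4, 25)
breaks <- c(0, 1, 3, 5, 10, 15, 20)

# Default (right = TRUE): intervals are (a, b], so 3 lands in (1,3], not (3,5].
# 0 is not in (0,1] and 25 is above the last break, so both become NA.
right_closed <- cut(x, breaks = breaks)
print(as.character(right_closed))
# NA "(0,1]" "(1,3]" "(1,3]" "(3,5]" NA

# right = FALSE: intervals are [a, b), so 3 now lands in [3,5)
left_closed <- cut(x, breaks = breaks, right = FALSE)
print(as.character(left_closed))
# "[0,1)" "[1,3)" "[1,3)" "[3,5)" "[3,5)" NA
```

Either way each value is assigned to at most one bin; the choice only decides which neighbour a boundary value belongs to.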
When dealing with integer-valued data, I tend to set break points between the integers, just to avoid tripping myself up. In fact, doing this with your data (as shown below) reveals that the 2nd and 3rd bins were actually incorrectly named, which illustrates the point quite nicely!
LoginRange <- cut(NumLogin,
c(0.5, 1.5, 3.5, 5.5, 10.5, 15.5, 20.5, Inf),
# c(0,1,3,5,10,15,20,Inf) + 0.5,
labels=c('1','2-3','4-5','6-10','11-15','16-20','20+')
)
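One way to sanity-check the labels, and to get per-range summaries once the factor sits in a data frame, is sketched below. The set.seed() call, the helper names half_breaks, bin_labels and check, and the choice of mean as the summary are all assumptions added for illustration; the data.frame() call is the one proposed in the question.

```r
half_breaks <- c(0.5, 1.5, 3.5, 5.5, 10.5, 15.5, 20.5, Inf)
bin_labels  <- c('1', '2-3', '4-5', '6-10', '11-15', '16-20', '20+')

# Sanity check on a known vector: smallest and largest integer in each bin
check <- cut(1:25, breaks = half_breaks, labels = bin_labels)
print(sapply(split(1:25, check), range))

# Same binning on the question's data (seed fixed only for reproducibility)
set.seed(42)
NumLogin   <- round(runif(100, 1, 50))
LoginRange <- cut(NumLogin, breaks = half_breaks, labels = bin_labels)

# Keeping the factor next to the counts makes grouped summaries straightforward
DF <- data.frame(NumLogin, LoginRange)
print(table(DF$LoginRange))                               # users per range
print(aggregate(NumLogin ~ LoginRange, data = DF, FUN = mean))  # mean logins per range
```

If the minimum and maximum printed for each bin match its label, the labels and breaks agree.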