I am trying to use the cut function to create age intervals. Unfortunately, I receive NAs for values that match the lower end of the first break.
For example:
AGE <- sample(18:50, 100, replace = TRUE)
AGE_GROUPS <- cut(AGE, breaks = c(18, 27, 36, 45))
DF <- data.frame(AGE, AGE_GROUPS)
For all values where AGE is exactly 18 or greater than 45, I receive NA in the AGE_GROUPS variable. How can I make sure that the lowest bracket in AGE_GROUPS includes 18, and that the highest bracket includes all values >= 45?
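A deterministic version of the example makes the pattern easier to see. This sketch uses a fixed, hypothetical vector of ages instead of sample(), so the edge cases are guaranteed to appear:

```r
# Hypothetical ages chosen to hit the edges of the question's breaks
ages <- c(18, 19, 27, 45, 46, 50)

groups <- cut(ages, breaks = c(18, 27, 36, 45))

# With the default right = TRUE the intervals are (18,27], (27,36], (36,45]:
# 18 sits on the open left edge, and 46 and 50 lie beyond the last break,
# so all three come out as NA
print(data.frame(ages, groups))
print(levels(groups))
```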
The cut() function in R divides the range of a numeric vector into intervals and codes each value by the interval it falls into, returning a factor.
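Here is a minimal sketch of the basic call, using made-up values. By default each interval is open on the left and closed on the right, and the optional labels argument replaces the generated interval names:

```r
x <- c(1, 5, 6, 10)

# cut() returns a factor; each level is one interval
f <- cut(x, breaks = c(0, 5, 10))
print(f)          # 1 and 5 fall in (0,5], 6 and 10 in (5,10]
print(table(f))   # number of values per interval

# labels= replaces the automatically generated interval names
g <- cut(x, breaks = c(0, 5, 10), labels = c("low", "high"))
print(g)
```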
In R, missing values are represented by the symbol NA (not available). Undefined results such as 0/0 are represented by the symbol NaN (not a number); dividing a nonzero number by zero gives Inf instead. Unlike SAS, R uses the same symbol, NA, for missing character and numeric data.
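The distinction can be checked directly at the console. The vectors below are illustrative. The last lines connect this back to cut(), which marks a value outside every break as NA rather than NaN:

```r
print(0 / 0)   # NaN: the result is undefined
print(1 / 0)   # Inf: a nonzero number divided by zero is infinite, not NaN

v <- c(1, NA, 0 / 0)
print(is.na(v))   # FALSE TRUE TRUE (is.na() is also TRUE for NaN)
print(is.nan(v))  # FALSE FALSE TRUE (is.nan() only matches NaN)

# The same NA symbol marks a missing value in a character vector
s <- c("a", NA)
print(is.na(s))   # FALSE TRUE

# cut() returns NA for values that fall outside all intervals
out_of_range <- cut(c(10, 20), breaks = c(18, 27))
print(sum(is.na(out_of_range)))  # 1: the value 10 lies outside (18,27]
```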
The breaks argument is not just the intermediate cut points; it includes the endpoints too. You can make sure every value gets a group with
breaks = c(-Inf, 18, 27, 36, 45, Inf)
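This quick check uses hypothetical ages on both sides of the original range. Note that with the default right = TRUE, an age of exactly 18 lands in (-Inf,18] rather than in the bracket that starts at 18:

```r
ages <- c(16, 18, 19, 45, 46, 50)

groups <- cut(ages, breaks = c(-Inf, 18, 27, 36, 45, Inf))

# Every value now falls into some interval, so there are no NAs.
# 16 and 18 share the first interval (-Inf,18]; 46 and 50 share the last one.
print(data.frame(ages, groups))
print(anyNA(groups))  # FALSE
```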
A little more conservatively, you could use
breaks = c(0, 18, 27, 36, 45, 120)
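which can be useful for catching outlier coding errors, since any value outside 0 to 120 still comes out as NA. You may also want include.lowest = TRUE, which keeps a value equal to the lowest break (or the highest break, when right = FALSE) instead of dropping it. See ?cut for examples.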
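The sketch below puts these options together. It compares the conservative finite endpoints with two ways of getting the brackets the question describes. The ages are hypothetical, with 150 standing in for a coding error. Which of the last two forms fits better depends on whether 45 itself should belong to the 36 to 45 bracket or to the top one.

```r
# Hypothetical ages; 150 stands in for a data-entry error
ages <- c(0, 18, 19, 27, 45, 46, 150)

# Conservative finite endpoints: 150 falls outside 0-120 and becomes NA,
# which flags it for checking. include.lowest = TRUE keeps an age of exactly 0.
conservative <- cut(ages, breaks = c(0, 18, 27, 36, 45, 120), include.lowest = TRUE)
print(data.frame(ages, conservative))

# Start the first bracket at 18 and close it on the left with include.lowest,
# and let the last bracket run to Inf: [18,27], (27,36], (36,45], (45,Inf]
closed_right <- cut(ages, breaks = c(18, 27, 36, 45, Inf), include.lowest = TRUE)
print(levels(closed_right))

# With right = FALSE the intervals become [18,27), [27,36), [36,45), [45,Inf),
# so 18 is in the first bracket and every value >= 45 is in the last one
closed_left <- cut(ages, breaks = c(18, 27, 36, 45, Inf), right = FALSE)
print(data.frame(ages, closed_left))
```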