
Cut function returns NA for intervals

Tags: r, cut

I am trying to use the cut function to create age intervals. Unfortunately, I receive NAs for values that match the lower end of the first break.

For example:

AGE <- sample(18:50, 100, replace = TRUE)
AGE_GROUPS <- cut(AGE, breaks = c(18, 27, 36, 45))
DF <- data.frame(AGE, AGE_GROUPS)

For all the values where AGE is 18 or above 45, I receive NA in the AGE_GROUPS variable. How can I make sure that the lowest bracket in AGE_GROUPS includes 18, and that the highest bracket includes all values >= 45?

Tea Tree asked Dec 13 '17 20:12



1 Answer

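breaks must contain the endpoints as well as the intermediate cut points. Values outside the outermost breaks become NA, and because intervals are closed on the right by default, a value equal to the lowest break (18 here) is excluded too. You can make sure you get everything with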

breaks = c(-Inf, 18, 27, 36, 45, Inf)
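As a quick check of the infinite-endpoint version, here is a minimal sketch on a small hand-picked vector. The values and the names `ages` and `groups` are illustrative and not taken from the question. Note that with the default `right = TRUE`, the value 18 lands in the first interval (everything up to and including 18) rather than alongside 19.

```r
# Illustrative ages chosen to hit the boundaries; not the asker's data
ages <- c(18, 19, 27, 45, 46, 50)

# Original breaks: 18 and everything above 45 fall outside the
# right-closed intervals and come back as NA
is.na(cut(ages, breaks = c(18, 27, 36, 45)))

# Infinite endpoints: every value lands in some interval
groups <- cut(ages, breaks = c(-Inf, 18, 27, 36, 45, Inf))
table(groups, useNA = "ifany")
```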

A little more conservatively, you could use

breaks = c(0, 18, 27, 36, 45, 120)

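which can be useful for catching outlier coding errors: any age outside 0 to 120 becomes NA instead of being silently binned. You may also want include.lowest = TRUE, which closes the first interval on the left so that a value equal to the lowest break is included. See ?cut for examples.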
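If the goal is brackets that start exactly at 18 plus an open-ended top bracket for 45 and over, one possible variant is to make the intervals left-closed with `right = FALSE` and end the breaks with `Inf`. This is a sketch based on one reading of the question, not part of the answer above, and the sample values are illustrative.

```r
# Illustrative ages around each boundary; not the asker's data
ages <- c(18, 26, 27, 44, 45, 50)

# right = FALSE makes each interval closed on the left and open on the right,
# so 18 opens the first bracket and 45 opens the last one
groups <- cut(ages, breaks = c(18, 27, 36, 45, Inf), right = FALSE)

data.frame(ages, groups)
```

With these breaks, ages below 18 would still become NA, which may or may not be desirable depending on the data.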

Gregor Thomas answered Sep 20 '22 07:09