I have a data set in which I need to code the values of certain numeric variables into 3 classes.
My data set is similar to this one, but has 60 more variables:
anim <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
wt <- c(181,179,180.5,201,201.5,245,246.4,189.3,301,354,369,205,199,394,231.3)
data <- data.frame(anim,wt)
> data
anim wt
1 1 181.0
2 2 179.0
3 3 180.5
4 4 201.0
5 5 201.5
6 6 245.0
7 7 246.4
8 8 189.3
9 9 301.0
10 10 354.0
11 11 369.0
12 12 205.0
13 13 199.0
14 14 394.0
15 15 231.3
I need to code the values of the variable "wt" into 3 classes: (wt >= 179 & wt < 200) = 1; (wt >= 200 & wt < 300) = 2; (wt >= 300) = 3
which should give me this:
> data2
anim wt SWT
1 1 181.0 1
2 2 179.0 1
3 3 180.5 1
4 4 201.0 2
5 5 201.5 2
6 6 245.0 2
7 7 246.4 2
8 8 189.3 1
9 9 301.0 3
10 10 354.0 3
11 11 369.0 3
12 12 205.0 2
13 13 199.0 1
14 14 394.0 3
15 15 231.3 2
To create a categorical variable from an existing column, you can wrap an ifelse() call in factor(): ifelse() assigns one value where a condition is true and another value where it is false, and factor() turns the result into a categorical variable.
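As a minimal sketch of the factor() plus ifelse() idea, assuming a two-class split at 300 and the illustrative labels "light" and "heavy" (neither of which comes from the question):

```r
# Three sample weights taken from the question's data
wt <- c(181, 201, 301)

# ifelse() picks a label per element; factor() fixes the level order.
# The 300 cutoff and the label names are assumptions for illustration.
heavy <- factor(ifelse(wt >= 300, "heavy", "light"),
                levels = c("light", "heavy"))

print(heavy)
```

Setting levels explicitly keeps the ordering stable even if one class happens to be absent from the data.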
You can use the cut() function in R to create a categorical variable from a continuous one. Note that breaks specifies the values to split the continuous variable on and labels specifies the label to give to the values of the new categorical variable.
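A short sketch of breaks and labels together, using the weights from the question. The label names "low", "mid" and "high" are assumptions chosen for illustration, and right = FALSE is supplied so that each interval includes its lower bound, as the question's rules require:

```r
wt <- c(181, 179, 180.5, 201, 201.5, 245, 246.4, 189.3,
        301, 354, 369, 205, 199, 394, 231.3)

# breaks: the cut points; labels: one name per resulting interval.
# right = FALSE gives [179, 200), [200, 300), [300, Inf).
swt <- cut(wt,
           breaks = c(179, 200, 300, Inf),
           labels = c("low", "mid", "high"),  # hypothetical labels
           right  = FALSE)

# Count how many animals fall into each class
print(table(swt))
```

There must always be one fewer label than there are break points.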
Variables can be assigned with the leftward (<-), rightward (->) or equals (=) operator. Their values can be printed with print() or cat(); cat() concatenates multiple items into one continuous line of output.
You can convert multiple numeric variables to factors with lapply(). lapply() belongs to the apply family of functions, which call a function on each element of a list or each column of a data frame in place of an explicit loop. Categorical variables should be stored as factors so that modeling functions treat them as categories rather than numbers.
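Since the question mentions many more variables, here is a sketch of applying the same classification to several columns with lapply(). The second column wt2, the helper name classify, the "S" prefix, and the assumption that every variable shares the same breaks are all hypothetical:

```r
# Small stand-in data frame; wt2 is an invented second variable
data <- data.frame(anim = 1:5,
                   wt   = c(181, 201, 301, 199, 245),
                   wt2  = c(250, 180, 320, 300, 200))

# Columns to classify (hypothetical selection)
vars <- c("wt", "wt2")

# Helper returning integer class codes 1, 2, 3
classify <- function(x) {
  cut(x, breaks = c(179, 200, 300, Inf), labels = FALSE, right = FALSE)
}

# lapply() runs classify() on each selected column;
# the results are stored in new columns Swt and Swt2
data[paste0("S", vars)] <- lapply(data[vars], classify)

print(data)
```

If the variables need different breaks, the helper could take the breaks as a second argument instead.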
The cut method as outlined by @Greg is probably what you want here. One thing to note is that cut returns a factor by default, which you can suppress by supplying labels = FALSE to return the integer values:
cut(data$wt, c(178, 200, 300, Inf), right = FALSE, labels = FALSE)
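A quick boundary check, as a sketch: cut() closes intervals on the right by default, so a weight of exactly 200 or 300 lands in the lower class unless right = FALSE is supplied. The two test weights are chosen for illustration and do not occur in the question's data:

```r
edge <- c(200, 300)  # values sitting exactly on the breaks
brks <- c(178, 200, 300, Inf)

# Default right = TRUE: intervals are (178, 200], (200, 300], (300, Inf]
default_codes <- cut(edge, brks, labels = FALSE)

# right = FALSE: intervals are [178, 200), [200, 300), [300, Inf)
left_closed_codes <- cut(edge, brks, labels = FALSE, right = FALSE)

print(default_codes)
print(left_closed_codes)
```

Only the second call matches the rules wt >= 200 and wt >= 300 from the question.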
Alternatively, if your cutting does not lend itself to natural breaks, you can use ifelse(). You can "nest" the ifelse statements similar to Excel. I use "with" to cut down on the typing needed:
# Anything outside the first two ranges (including wt < 179) falls into class 3
data$group2 <- with(data, ifelse(wt >= 179 & wt < 200, 1,
                          ifelse(wt >= 200 & wt < 300, 2, 3)))
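In the nested ifelse() above, the final "else" catches every remaining value, including weights below 179. A variant sketch that tests the third class explicitly and returns NA for anything out of range; the sample weights, including the out-of-range 150, are made up for illustration:

```r
wt <- c(150, 179, 199.9, 200, 300, 394)

# Each class has its own condition; values matching none become NA
group <- ifelse(wt >= 179 & wt < 200, 1L,
         ifelse(wt >= 200 & wt < 300, 2L,
         ifelse(wt >= 300, 3L, NA_integer_)))

print(group)
```

Whether out-of-range values should be NA or assigned to a class depends on the data, so treat this as one possible choice.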
You can try cut:
anim <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
wt <-c(181,179,180.5,201,201.5,245,246.4,
189.3,301,354,369,205,199,394,231.3)
data <- data.frame(anim,wt)
EDIT: fixed group - right = FALSE, got rid of split example.
group = cut(data$wt, c(178, 200, 300, Inf), right=FALSE)
data$swt = as.numeric(group)
data
anim wt swt
1 1 181.0 1
2 2 179.0 1
3 3 180.5 1
4 4 201.0 2
5 5 201.5 2
6 6 245.0 2
7 7 246.4 2
8 8 189.3 1
9 9 301.0 3
10 10 354.0 3
11 11 369.0 3
12 12 205.0 2
13 13 199.0 1
14 14 394.0 3
15 15 231.3 2
>