I am trying to categorize a numeric variable (age) into groups defined by intervals so that it is no longer continuous. I have this code: <pre class="prettyprint"><code>data$agegrp(data$age >= 40 & data$age <= 49) <- 3
data$agegrp(data$age >= 30 & data$age <= 39) <- 2
data$agegrp(data$age >= 20 & data$age <= 29) <- 1
</code></pre> The code above is not working under the survival package. It gives me: <pre class="prettyprint"><code>invalid function in complex assignment
</code></pre> Can you point out where the error is? <code>data</code> is the data frame I am using.
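On the error message itself: R reads `data$agegrp(...)` as a function call, so `data$agegrp(...) <- 3` is treated as a call to a replacement function that does not exist. This is probably unrelated to the survival package. Subsetting uses square brackets instead. A minimal sketch of the bracket form, assuming a small made-up data frame (the values below are illustrative, not the asker's data):

```r
# Hypothetical stand-in for the asker's data frame; only an `age` column is assumed.
data <- data.frame(age = c(23, 35, 47, 29, 40))

# Create the column first so that ages matching no range stay NA.
data$agegrp <- NA_integer_

# Square brackets select the rows to assign to;
# parentheses would be parsed as a function call.
data$agegrp[data$age >= 40 & data$age <= 49] <- 3L
data$agegrp[data$age >= 30 & data$age <= 39] <- 2L
data$agegrp[data$age >= 20 & data$age <= 29] <- 1L

data
```

With this form, each line assigns only to the rows where the logical condition is `TRUE`.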

I would use <code>findInterval()</code> here. First, make up some sample data: <pre class="prettyprint"><code>set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))
ages
# [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 31 43 34 41 49 31 43
</code></pre> Use <code>findInterval()</code> to categorize your "ages" vector: <pre class="prettyprint"><code>findInterval(ages, c(20, 30, 40))
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3
</code></pre> Alternatively, as recommended in the comments, <code>cut()</code> is also useful here: <pre class="prettyprint"><code>cut(ages, breaks = c(20, 30, 40, 50), right = FALSE)
cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)
</code></pre>
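To tie this back to the question, the same calls can be applied to a data frame column. The following is a sketch only: the data frame and the column names `agegrp` and `agegrp_label` are assumptions for illustration.

```r
# Hypothetical data frame; the `age` column name follows the question.
data <- data.frame(age = c(27, 31, 37, 47, 26, 40, 49))

breaks <- c(20, 30, 40, 50)

# Integer codes: code i means breaks[i] <= age < breaks[i + 1].
data$agegrp <- findInterval(data$age, breaks)

# Factor with the same left-closed intervals as level labels, e.g. "[20,30)".
data$agegrp_label <- cut(data$age, breaks = breaks, right = FALSE)

table(data$agegrp_label)
```

One difference to keep in mind: for values outside the breaks, `findInterval()` still returns a number (0 below the first break, the number of breaks at or above the last one), whereas `cut()` returns `NA`.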

We can use <code>dplyr</code>: <pre class="prettyprint"><code>library(dplyr)

data <- data %>%
  mutate(agegroup = case_when(age >= 40 & age <= 49 ~ '3',
                              age >= 30 & age <= 39 ~ '2',
                              age >= 20 & age <= 29 ~ '1'))
</code></pre> Compared to the other approaches, the <code>case_when()</code> conditions spell out each age range explicitly, which can make the code easier to write and interpret.
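One thing to watch with `case_when()`: rows that match none of the conditions get `NA`. That includes ages outside 20 to 49 and, with these particular conditions, non-integer ages such as 29.5. A small sketch that makes the fallback explicit, assuming a made-up data frame:

```r
library(dplyr)

# Hypothetical data frame; 55 matches none of the conditions.
data <- data.frame(age = c(23, 35, 47, 55))

data <- data %>%
  mutate(agegroup = case_when(
    age >= 40 & age <= 49 ~ "3",
    age >= 30 & age <= 39 ~ "2",
    age >= 20 & age <= 29 ~ "1",
    TRUE ~ NA_character_  # explicit fallback for ages matching no range
  ))

data
```

The final `TRUE ~ NA_character_` line does not change the result. It only documents what happens to unmatched rows.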

Categorize numeric variable into groups/bins/breaks

asked Oct 19 '12 17:10

leian

2 Answers

answered Sep 25 '22 14:09

A5C1D2H2I1M1N2O1R2T1

answered Sep 22 '22 14:09

TYL


Tags: r, binning, categorization, bins
