
Distinguishing the levels of a factor variable in R

Tags: r, dataset

Let's say my dataset contains three columns: id (identification), case (character), and value (numeric). This is my dataset:

tdata <- data.frame(
  id    = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
  case  = c("a","b","c","c","a","b","c","c","a","b","c","c","a","b","c","c"),
  value = c(1,34,56,23,546,34,67,23,65,23,65,23,87,34,321,56)
)

tdata
   id case value
1   1    a     1
2   1    b    34
3   1    c    56
4   1    c    23
5   2    a   546
6   2    b    34
7   2    c    67
8   2    c    23
9   3    a    65
10  3    b    23
11  3    c    65
12  3    c    23
13  4    a    87
14  4    b    34
15  4    c   321
16  4    c    56

Notice that each id has two c's. How can I rename them c1 and c2? (I need to distinguish between them for further analysis.)

user9292 asked Nov 26 '14 16:11




2 Answers

How about:

within(tdata, case <- ave(as.character(case), id, FUN=make.unique))
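For readers unsure what this one-liner returns: `make.unique()` keeps the first occurrence of a value and appends `.1`, `.2`, ... to later ones, so within each id the cases come out as `c` and `c.1` rather than `c1` and `c2`. The sketch below checks that on the question's data and adds a hypothetical helper, `number_dups()` (not part of the original answer), that numbers every repeated case starting from 1. Also note that `within()` returns a modified copy, so the result has to be assigned.

```r
# Same data as in the question, written with rep() for brevity
tdata <- data.frame(
  id    = rep(1:4, each = 4),
  case  = rep(c("a", "b", "c", "c"), times = 4),
  value = c(1, 34, 56, 23, 546, 34, 67, 23, 65, 23, 65, 23, 87, 34, 321, 56)
)

# The answer's approach: apply make.unique() to the cases of each id
res <- within(tdata, case <- ave(as.character(case), id, FUN = make.unique))
head(res$case, 4)
# "a" "b" "c" "c.1"

# Hypothetical helper (an assumption, not from the answer):
# number a value only if it occurs more than once in x
number_dups <- function(x) {
  idx <- ave(seq_along(x), x, FUN = seq_along)  # running count per value
  n   <- ave(seq_along(x), x, FUN = length)     # total count per value
  ifelse(n > 1, paste0(x, idx), x)
}

res2 <- within(tdata, case <- ave(as.character(case), id, FUN = number_dups))
head(res2$case, 4)
# "a" "b" "c1" "c2"
```

If the `c` / `c.1` naming is acceptable, the original one-liner is enough; the helper only matters when the labels must be exactly `c1` and `c2`.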
Matthew Plourde answered Sep 23 '22 08:09



How about this slightly modified approach:

library(dplyr)

tdata %>% group_by(id, case) %>% mutate(caseNo = paste0(case, row_number())) %>% 
    ungroup() %>% select(-case)

#Source: local data frame [16 x 3]
#
#   id value caseNo
#1   1     1     a1
#2   1    34     b1
#3   1    56     c1
#4   1    23     c2
#5   2   546     a1
#6   2    34     b1
#7   2    67     c1
#8   2    23     c2
#9   3    65     a1
#10  3    23     b1
#11  3    65     c1
#12  3    23     c2
#13  4    87     a1
#14  4    34     b1
#15  4   321     c1
#16  4    56     c2
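The pipeline above suffixes every case, so the unique ones become `a1` and `b1`. If only the repeated cases should be numbered, one possible variation (an assumption about what is wanted, not part of the original answer) is to check the group size with `n()` before pasting the row number. The column name `caseNo` is kept from the answer, and `case` cannot be overwritten directly because it is a grouping variable.

```r
library(dplyr)

# Same data as in the question, written with rep() for brevity
tdata <- data.frame(
  id    = rep(1:4, each = 4),
  case  = rep(c("a", "b", "c", "c"), times = 4),
  value = c(1, 34, 56, 23, 546, 34, 67, 23, 65, 23, 65, 23, 87, 34, 321, 56)
)

res <- tdata %>%
  group_by(id, case) %>%
  # n() is the size of the current (id, case) group:
  # add a running number only when the case is repeated within the id
  mutate(caseNo = if (n() > 1) paste0(case, row_number())
                  else as.character(case)) %>%
  ungroup() %>%
  select(-case)

head(res$caseNo, 4)
# "a" "b" "c1" "c2"
```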
talat answered Sep 24 '22 08:09
