 

Levels in R Dataframe

I imported data from a .csv file, and attached the dataset.
My problem: one variable is in integer form and has 295 levels. I need to use this variable to create others, but I don't know how to deal with the levels.

What are these, and how do I deal with them?

Thomas asked Dec 01 '10 22:12


2 Answers

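When you read in the data with read.table() (or read.csv(); you didn't specify), add the argument stringsAsFactors = FALSE. You will then get character columns instead of factors. (Since R 4.0.0, stringsAsFactors = FALSE is the default, so this mainly bites in older versions of R.)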

If you expected integers in that column, then some entries must not be interpretable as numbers (read.csv() only falls back to text when a column cannot be parsed as numeric). So read the column as character and convert it to numeric afterwards.

txt <- c("x,y,z", "1,2,3", "a,b,c")

## stringsAsFactors = TRUE was the default before R 4.0.0
d <- read.csv(textConnection(txt), stringsAsFactors = TRUE)
sapply(d, class)
#       x        y        z
#"factor" "factor" "factor"

## we don't want factors, but characters
d <- read.csv(textConnection(txt), stringsAsFactors = FALSE)
sapply(d, class)

#          x           y           z 
#"character" "character" "character" 

## convert x to numeric, accepting NAs for non-numeric data
as.numeric(d$x)

#[1]  1 NA
#Warning message:
#NAs introduced by coercion 
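To track down which entries are blocking the conversion, you can compare the converted column with the original. This is a minimal sketch on made-up data (the column contents, and the names txt_bad, d2, x_num and bad, are hypothetical); it relies only on base R's read.csv(), as.numeric() and suppressWarnings().

```r
# Hypothetical data: a column meant to hold integers, with two stray
# non-numeric entries. "n/a" is NOT in read.csv()'s default na.strings ("NA"),
# so it is kept as text.
txt_bad <- c("x", "1", "2", "n/a", "4", "?")
d2 <- read.csv(textConnection(txt_bad), stringsAsFactors = FALSE)

# Convert quietly; anything that cannot be parsed as a number becomes NA.
x_num <- suppressWarnings(as.numeric(d2$x))

# Entries that were present in the file but failed to convert.
bad <- is.na(x_num) & !is.na(d2$x)
print(unique(d2$x[bad]))  # [1] "n/a" "?"
```

If the culprits turn out to be placeholder strings such as "n/a", passing them via na.strings when reading is usually cleaner than fixing them afterwards.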

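Finally, if you would rather keep the factor and recover the numbers from it, use as.numeric(levels(d$x))[d$x], as recommended in the "Warning" section of ?factor. Do not call as.numeric(d$x) on a factor directly: it returns the factor's internal integer codes, not the values its labels spell out.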
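A small made-up factor shows why the indexing trick is needed. A factor stores each value as an integer code pointing into levels(), so the "295 levels" in the question are just the distinct labels found in that column. The values below are illustrative only.

```r
# Made-up factor built from numeric-looking labels.
f <- factor(c("10", "5", "10", "20"))

levels(f)                 # distinct labels, sorted as strings: "10" "20" "5"
as.integer(f)             # internal codes (positions in levels(f)): 1 3 1 2

# Wrong: returns the codes, not the numbers in the labels.
as.numeric(f)             # 1 3 1 2

# Right (the ?factor idiom): parse each level once, then index by code.
as.numeric(levels(f))[f]  # 10 5 10 20
```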

mdsumner answered Oct 03 '22 16:10


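Or you can simply use

d$x2 <- as.numeric(as.character(d$x))

as.character() turns each factor value back into its label first, so as.numeric() parses the labels rather than the internal codes.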
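Both routes give the same result, including NA for labels that are not numbers. This is a minimal sketch on a made-up factor (the names via_character and via_levels are hypothetical). The ?factor help page notes that the levels() version is slightly more efficient, since it parses each distinct level once instead of every element.

```r
# Made-up factor with one non-numeric label, as read.csv() would have
# produced before R 4.0.0.
f <- factor(c("3", "1", "a", "3"))

# Labels first, then parse every element.
via_character <- suppressWarnings(as.numeric(as.character(f)))

# ?factor idiom: parse each distinct level once, then index by code.
via_levels <- suppressWarnings(as.numeric(levels(f)))[f]

identical(via_character, via_levels)  # TRUE
via_character                         # 3 1 NA 3
```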

Arthur answered Oct 03 '22 17:10