What is the quickest/best way to change a large number of columns to numeric from factor?
I used the following code but it appears to have re-ordered my data.
    > head(stats[,1:2])
      rk                 team
    1  1 Washington Capitals*
    2  2     San Jose Sharks*
    3  3  Chicago Blackhawks*
    4  4     Phoenix Coyotes*
    5  5   New Jersey Devils*
    6  6   Vancouver Canucks*

    for(i in c(1,3:ncol(stats))) {
        stats[,i] <- as.numeric(stats[,i])
    }

    > head(stats[,1:2])
      rk                 team
    1  2 Washington Capitals*
    2 13     San Jose Sharks*
    3 24  Chicago Blackhawks*
    4 26     Phoenix Coyotes*
    5 27   New Jersey Devils*
    6 28   Vancouver Canucks*
What is the best way, short of naming every column as in:
    df$colname <- as.numeric(df$colname)
Use the lapply() function to convert multiple columns from integer to numeric type in R. Base R's lapply() applies a function to each element of a list. Because a data frame is a list of columns, we can use it to apply as.numeric() to every column.
We must first convert the factor vector to a character vector, then to a numeric vector. This ensures that the numeric vector contains the actual numeric values instead of the factor's internal integer codes.
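To see the difference on a small case, here is a minimal sketch. The factor f and its values are made up for illustration and are not the asker's data.

```r
# A hypothetical factor whose labels look like numbers.
f <- factor(c("10", "5", "20"))

# Direct conversion returns the internal integer codes,
# i.e. the position of each value among the sorted levels.
codes <- as.numeric(f)

# Going through character first recovers the values themselves.
values <- as.numeric(as.character(f))

print(levels(f))
print(codes)
print(values)
```

The codes depend on how the levels are ordered, which is why the rk column in the question came back as 2, 13, 24, ... rather than 1, 2, 3, ...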
To convert the columns of an R data frame from integer to numeric, we can use the lapply() function. For example, if a data frame df contains only integer columns, lapply(df, as.numeric) converts every column to numeric. Note that lapply() returns a list, so assign the result back with df[] <- lapply(df, as.numeric) to keep the data frame structure.
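A short sketch of that integer-to-numeric conversion, assuming a toy data frame df with made-up column names a and b:

```r
# Hypothetical data frame with two integer columns.
df <- data.frame(a = 1:3, b = 4:6)
before <- sapply(df, class)

# Assigning into df[] replaces the columns in place,
# so the result is still a data frame rather than a plain list.
df[] <- lapply(df, as.numeric)
after <- sapply(df, class)

print(before)
print(after)
```

This is only safe for columns that are already integer (or character holding numbers). For factor columns you still need the as.character() step.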
Converting a numeric value to a factor: to go the other way, from numeric to factor, we can use the cut() function. cut() divides the range of a numeric vector x into intervals and codes each value of x according to the interval it falls into.
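As a rough illustration of cut(), with a made-up vector x and break points chosen only for this example:

```r
# Hypothetical numeric vector to be binned.
x <- c(1, 5, 12, 18, 25)

# Three intervals; by default each one is closed on the right,
# so 10 would fall into the first interval, not the second.
bins <- cut(x, breaks = c(0, 10, 20, 30))

print(bins)
print(table(bins))
```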
You have to be careful when converting factors to numeric. Here is a line of code that changes a set of columns from factor to numeric. I am assuming here that the columns to be changed are 1, 3, 4 and 5; adjust the indices to suit your data.
    cols = c(1, 3, 4, 5)
    df[,cols] = apply(df[,cols], 2, function(x) as.numeric(as.character(x)))
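For comparison, the same conversion can be written with lapply(), which works column by column instead of going through a matrix. This is a sketch on a small made-up data frame. The column names rk, team and pts and all the values are assumptions for illustration.

```r
# Hypothetical stand-in for the asker's data: two factor columns
# holding numbers, plus a character column that must be left alone.
stats <- data.frame(
  rk   = factor(c("1", "2", "13")),
  team = c("A", "B", "C"),
  pts  = factor(c("121", "113", "112")),
  stringsAsFactors = FALSE
)

# Columns to convert, chosen by position (everything except team).
cols <- c(1, 3)

# factor -> character -> numeric, one column at a time.
stats[cols] <- lapply(stats[cols], function(x) as.numeric(as.character(x)))

str(stats)
```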
Further to Ramnath's answer, the behaviour you are experiencing is due to as.numeric(x) returning the internal, numeric representation of the factor x at the R level. If you want to preserve the numbers that are the levels of the factor (rather than their internal representation), you need to convert to character via as.character() first, as per Ramnath's example.
Your for loop is just as reasonable as an apply call and might be slightly more readable as to what the intention of the code is. Just change this line:
stats[,i] <- as.numeric(stats[,i])
to read
stats[,i] <- as.numeric(as.character(stats[,i]))
This is FAQ 7.10 in the R FAQ.
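That FAQ entry also describes a variant that converts the levels once and then indexes by the integer codes, which it notes is slightly more efficient. A small sketch comparing the two forms, using a made-up factor f:

```r
# Hypothetical factor with repeated values.
f <- factor(c("3.5", "10", "3.5", "7"))

# Form used in this answer: convert every element via character.
via_character <- as.numeric(as.character(f))

# FAQ variant: convert only the distinct levels, then look each
# element up by its integer code.
via_levels <- as.numeric(levels(f))[as.integer(f)]

print(via_character)
print(via_levels)
```

Both give the same result. The second only matters when the vectors are long and have few distinct levels.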
HTH