I have a data frame with several columns. I want to run the factor() function on one of them, say a column named my_col. Initially I did it this way:
df[,"my_col"]<-factor((df[,"my_col"]))
It gave the following error
Error: 'x' must be atomic for 'sort.list' Have you called 'sort' on a list?
Referring to a similar question on SO solved my problem.
Now, if I use the following code instead of the first method, it works without any error:
df$"my_col"<-factor(df$"my_col")
Why's that? Is there a difference between accessing a column via df$vec_name and df[,vec_name]?
Update:
str(df)
Classes 'tbl_df', 'tbl' and 'data.frame': 160 obs. of 8 variables:
$ area : int 1 1 1 1 1 1 1 1 1 1 ...
$ temp : int 1 1 1 1 1 1 1 1 1 1 ...
$ size : int 1 1 1 1 1 1 1 1 1 1 ...
$ storage : int 1 1 1 1 1 2 2 2 2 2 ...
$ my_col : int 1 2 3 4 5 1 2 3 4 5 ...
$ texture : num 2.9 2.3 2.5 2.1 1.9 1.8 2.6 3 2.2 2 ...
$ flavor : num 3.2 2.5 2.8 2.9 2.8 3 3.1 3 3.2 2.8 ...
$ moistness: num 3 2.6 2.8 2.4 2.2 1.7 2.4 2.9 2.5 1.9 ...
Your data is a tbl_df. I don't have your data, but we can look at an example using mtcars.
library(dplyr)
tbl_df(mtcars)[, "mpg"]
# Source: local data frame [32 x 1]
#
# mpg
# (dbl)
# 1 21.0
# 2 21.0
# 3 22.8
# 4 21.4
# 5 18.7
# 6 18.1
# 7 14.3
# 8 24.4
# 9 22.8
# 10 19.2
# .. ...
The result is still a data frame, whereas base R would have dropped it to an atomic vector. dplyr:::`[.tbl_df` does not drop single columns the way [.data.frame from base R does. A data frame is a list, and factor() expects an atomic vector, which is why we can't run factor() on it.
factor(tbl_df(mtcars)[, "mpg"])
# Error in sort.list(y) : 'x' must be atomic for 'sort.list'
# Have you called 'sort' on a list?
So you'll need to use [[, as in df[["my_col"]], or just use $.
df[["my_col"]] <- factor(df[["my_col"]])
Note: When you use the $ operator, you can omit the quotes around the column name.
df$my_col <- factor(df$my_col)
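To see the difference without installing dplyr, here is a minimal base-R sketch. It assumes a small made-up data frame (the values are illustrative, not the asker's data) and uses drop = FALSE to stand in for the way the tbl_df method keeps a single column as a data frame.

```r
# Hypothetical toy data; only the column names echo the question.
df <- data.frame(
  my_col  = c(1L, 2L, 3L, 1L, 2L),
  texture = c(2.9, 2.3, 2.5, 2.1, 1.9)
)

# drop = FALSE imitates the tbl_df behaviour: a one-column data frame comes back.
kept <- df[, "my_col", drop = FALSE]

# [[ and $ both extract the column itself, an atomic vector.
vec1 <- df[["my_col"]]
vec2 <- df$my_col

is.data.frame(kept)    # TRUE
is.atomic(kept)        # FALSE: a data frame is a list
is.atomic(vec1)        # TRUE
identical(vec1, vec2)  # TRUE

# Converting through [[ works for plain data frames and tbl_df alike.
df[["my_col"]] <- factor(df[["my_col"]])
levels(df$my_col)      # "1" "2" "3"
```

Because [[ and $ always return the column itself, code written with them should behave the same whether df is a plain data frame or a tbl_df.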