I have a large data set and I would like to read specific columns or drop all the others.
library(foreign)  # read.dta() comes from the foreign package
data <- read.dta("file.dta")
I select the columns that I'm not interested in:
var.out <- names(data)[!names(data) %in% c("iden", "name", "x_serv", "m_serv")]
and then I'd like to do something like:
for (i in seq_along(var.out)) { data[[var.out[i]]] <- NULL }
to drop all the unwanted columns. Is this the optimal solution?
You should use either indexing or the subset function. For example:
R> df <- data.frame(x=1:5, y=2:6, z=3:7, u=4:8)
R> df
  x y z u
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8
Then you can use the which function and the - operator in column indexing:
R> df[ , -which(names(df) %in% c("z","u"))]
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
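One caveat with the -which() idiom: if none of the names match, which() returns integer(0), and indexing with -integer(0) selects no columns at all rather than all of them. A minimal sketch comparing it with a logical index built with !, using a made-up column name "w" that is not in df:

```r
df <- data.frame(x = 1:5, y = 2:6, z = 3:7, u = 4:8)

# Hypothetical name to drop; "w" is not a column of df
drop <- c("w")

# which() finds no match, so -which(...) is integer(0),
# and indexing with integer(0) keeps no columns
bad <- df[ , -which(names(df) %in% drop)]
ncol(bad)   # 0

# A logical index has no such trap: it keeps every column not in drop
good <- df[ , !(names(df) %in% drop)]
ncol(good)  # 4

# With names that do exist, it behaves like the -which() version
df[ , !(names(df) %in% c("z", "u"))]
```

So when the list of names to drop might not match anything, the logical form is the safer choice.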
Or, much simpler, use the select argument of the subset function: you can then use the - operator directly on a vector of column names, and you can even omit the quotes around the names!
R> subset(df, select=-c(z,u))
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
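Since var.out in the question is already a character vector of names, no loop is needed to delete those columns in place: assigning NULL to several columns at once removes them all. A small sketch on the example df, where var.out is just c("z", "u") for illustration:

```r
df <- data.frame(x = 1:5, y = 2:6, z = 3:7, u = 4:8)

# Character vector of unwanted columns, playing the role of var.out
var.out <- c("z", "u")

# Assigning NULL to a set of columns removes all of them in one step
df[var.out] <- NULL
names(df)   # "x" "y"
```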
Note that you can also select the columns you want instead of dropping the others:
R> df[ , c("x","y")]
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
R> subset(df, select=c(x,y))
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
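One thing to watch when selecting this way: if you keep a single column, df[ , j] simplifies the result to a plain vector by default, whereas subset() returns a data frame. Passing drop = FALSE keeps the data-frame structure. A quick check on the same example df:

```r
df <- data.frame(x = 1:5, y = 2:6, z = 3:7, u = 4:8)

# A single column selected with [ , j] is simplified to a vector
one_vec <- df[ , "x"]
is.data.frame(one_vec)                 # FALSE

# drop = FALSE keeps it as a one-column data frame
one_df <- df[ , "x", drop = FALSE]
is.data.frame(one_df)                  # TRUE

# subset() does not simplify by default
is.data.frame(subset(df, select = x))  # TRUE
```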