
Refer to a range of columns by name in R

I need help with something that might be fairly simple in R. I want to refer to a range of columns in a data frame (e.g., to extract a few select variables), but I don't know their column numbers. Normally, if I wanted to extract columns 4-10, I would write mydata[,4:10].

However, since I don't know the column numbers, I want to refer to the columns by name. Is there an easy way to do this? In SAS or SPSS it is fairly easy to refer to a range of variables by name. Alternatively, is there an easy way to find out which column number corresponds to a variable name in R?

tcarpenter asked Dec 12 '22 09:12


2 Answers

A range of columns can be selected in several ways. subset(mydata, select = name4:name10) works, but it is quite long. I used it until I got tired of writing long commands for such a simple task, so I wrote a function to deal with not remembering column numbers in large data frames:

coln <- function(X) {
  # One-row matrix of column positions, labelled with the column names
  y <- rbind(seq(1, ncol(X)))
  colnames(y) <- colnames(X)
  rownames(y) <- "col.number"
  return(y)
}

Here is how it works:

df <- data.frame(a = 1:10, b = 10:1, c = 1:10)
coln(df)
           a b c
col.number 1 2 3

Now you can index columns by number while still seeing which name each number belongs to.
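
For reference, the subset() form mentioned at the start of this answer can be sketched as follows. The data frame and its column names (id, age, height, weight, city) are made up for illustration. Inside select, subset() treats column names as their positions, so age:weight behaves like 2:4 here.

```r
# Hypothetical example data; the column names are illustrative only.
mydata <- data.frame(
  id     = 1:3,
  age    = c(30, 41, 25),
  height = c(170, 182, 165),
  weight = c(65, 90, 58),
  city   = c("A", "B", "C")
)

# Select every column from `age` through `weight`, by name.
picked <- subset(mydata, select = age:weight)
names(picked)
```

This avoids looking up the numbers at all, at the cost of the longer call.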

Mikko answered Jan 17 '23 16:01


A column number can be identified from a column name within a data frame as follows:

which(colnames(mydf)=="a")

Here mydf is the data frame and "a" is the name of the column whose number you want.

(Source)

This can be used to create a column range:

firstcol <- which(colnames(mydf) == "a")
lastcol <- which(colnames(mydf) == "b")

mydf[firstcol:lastcol]
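
If this comes up often, the two lookups and the range can be wrapped in a small helper. The sketch below is one possible way to do it, not part of the original answer: col_range is a hypothetical name, and it uses match() rather than which() so that a missing column name shows up as NA and can be reported clearly.

```r
# Hypothetical helper: return the columns of `df` from `first` to `last`,
# both given as column names.
col_range <- function(df, first, last) {
  wanted <- c(first, last)
  idx <- match(wanted, names(df))  # NA for any name that is not a column
  if (anyNA(idx)) {
    stop("column not found: ", paste(wanted[is.na(idx)], collapse = ", "))
  }
  # drop = FALSE keeps the result a data frame even for a single column
  df[, idx[1]:idx[2], drop = FALSE]
}

# Illustrative data frame (an assumption, not from the original post)
mydf <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
res <- col_range(mydf, "b", "d")
names(res)
```

Passing the names as strings keeps the helper free of non-standard evaluation, which makes it easier to call from other functions.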

Matt_J answered Jan 17 '23 16:01