 

Looping over a specific subset of column names in R

I have a data frame AData, and I have extracted a subset of its column names into a vector, SpecialNames. I would like to know how to reference these columns in a for loop.

My current code looks like this:

SpecialNames <- setdiff(colnames(AData), colnames(BData))

for ( i in SpecialNames ) {

    AData$i <- NULL # Do something to AData$i such as delete it or something else

}

Alas, AData$i does not seem to reference the column of dataframe AData with name i. Is there a different syntax that would give me that?

I read in this post here that: "the $ is for interactive usage. Instead, when programming, i.e. when the column name is to be interpreted, you need to use [ or [[, hence I replaced sample$i.imp with sample[[paste0(i, '.impt')]]".

Based on this comment, I guessed that perhaps the syntax I have been looking for is AData$[i] or AData$[[i]] or AData$[[paste0(i)]] but none of these seem to work either.

Any ideas?

Henrik Nordmark asked Nov 07 '13 15:11



2 Answers

Not knowing what you are doing, it's hard to say whether a for loop is the way to go or not; however, hopefully this will help get you on your way:

## Sample data is always nice
set.seed(1)
mydf <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
                   B = LETTERS[c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)],
                   matrix(sample(100, 36, replace = TRUE), nrow = 12))

## Here is your vector of special names
specialnames <- setdiff(names(mydf), c("A", "B"))

## Here is a `for` loop that will print the first two rows
##   of each column named in "specialnames"
## THIS IS NOT HOW I WOULD NORMALLY DO THIS IN R
## -------------------------------------------------------
for (i in seq_along(specialnames)) {
  print(head(mydf[specialnames[i]], 2))
}
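
For comparison, here is a loop-free sketch of the same idea using lapply. It rebuilds the sample data so it can be run on its own; first_rows is a name introduced here purely for illustration. Note that it returns plain vectors rather than the one-column data frames printed by the loop above.

```r
# Self-contained copy of the sample data from above
set.seed(1)
mydf <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
                   B = LETTERS[c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)],
                   matrix(sample(100, 36, replace = TRUE), nrow = 12))
specialnames <- setdiff(names(mydf), c("A", "B"))

# mydf[specialnames] is a data frame (a list of columns), so lapply()
# visits each selected column in turn; no explicit index is needed.
first_rows <- lapply(mydf[specialnames], head, 2)
print(first_rows)
```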

Matters of note (perhaps):

  • for (i in seq_along(specialnames)): This loops over positions, so each name is retrieved with specialnames[i]. seq_along is safer than 1:length(specialnames), which yields c(1, 0) when the vector is empty.
  • You seem to have misunderstood the use of [ and [[. Try the following to get a sense of what they do:
    • mydf["A"]
    • mydf[["A"]]
    • mydf[1, c("A", "B")]
  • Two questions to look at here and here.
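
A minimal sketch of what those calls return, using a small made-up data frame (the data and the variable col are assumptions for illustration, not part of the question):

```r
mydf <- data.frame(A = c(1, 2, 3), B = c("x", "y", "z"))  # hypothetical data

class(mydf["A"])     # "data.frame": single bracket keeps the data frame wrapper
class(mydf[["A"]])   # "numeric": double bracket extracts the column itself

# Both bracket forms accept a name held in a variable; $ does not
col <- "A"
identical(mydf[[col]], mydf$A)   # TRUE
is.null(mydf$col)                # TRUE: $ looks for a column literally named "col"
```

This is why AData$i in the question never touches the intended column: $ treats i as a literal name instead of evaluating it.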
A5C1D2H2I1M1N2O1R2T1 answered Sep 22 '22 02:09



You're very close in your loop -- there's just a subtle feature in the use and meaning of [ and [[ that you're missing. See this note on subsetting by Hadley Wickham for some details.

To get the result you'd like (assigning NULL to remove a column), use [[ inside the loop. Using mydf[, specialnames] <- NULL will throw an error.

I agree this is somewhat confusing, as mydf[, specialnames] <- NA will work. I think the difference is that the former changes the structure of the data.frame (it removes columns), while the latter only replaces values.

Thus your loop becomes:

for (name in specialnames) { 
   mydf[[name]] <- NULL
}

So setting things up we have:

set.seed(1)
mydf <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
                   B = LETTERS[c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)],
                   matrix(sample(100, 36, replace = TRUE), nrow = 12))

## Here is your vector of special names
specialnames <- setdiff(names(mydf), c("A", "B"))

and after the loop we would obtain:

R> mydf
   A B
1  1 A
2  1 A
3  1 A
4  2 A
5  2 A
6  3 B
7  3 B
8  3 B
9  3 B
10 4 B
11 4 B
12 4 B
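
The same [[ form also covers the "do something else" case from the question. The sketch below uses a small hypothetical data frame (df, cols, with_na and dropped are names made up for this example) to contrast modifying a column, assigning NA, and assigning NULL:

```r
df <- data.frame(A = 1:3, X1 = c(10, 20, 30), X2 = c(5, 6, 7))  # hypothetical data
cols <- c("X1", "X2")

# Modify each named column in place
for (name in cols) {
  df[[name]] <- df[[name]] * 2
}

# Assigning NA keeps the columns and blanks their values
with_na <- df
with_na[, cols] <- NA

# Assigning NULL through [[ drops the columns altogether
dropped <- df
for (name in cols) {
  dropped[[name]] <- NULL
}

names(with_na)   # "A" "X1" "X2"
names(dropped)   # "A"
```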
ricardo answered Sep 22 '22 02:09
