I have a data frame AData from which I have extracted a subset of its column names, say SpecialNames. I would like to know how to reference these columns in a for loop.
My current code looks like this:
SpecialNames <- setdiff(colnames(AData), colnames(BData))
for ( i in SpecialNames ) {
  AData$i <- NULL # Do something to AData$i such as delete it or something else
}
Alas, AData$i does not seem to reference the column of AData whose name is stored in i. Is there a different syntax that would do that?
I read in this post here that: "the $ is for interactive usage. Instead, when programming, i.e. when the column name is to be interpreted, you need to use [ or [[, hence I replaced sample$i.imp with sample[[paste0(i, '.impt')]]".
Based on this comment, I guessed that perhaps the syntax I have been looking for is AData$[i], AData$[[i]] or AData$[[paste0(i)]], but none of these works either.
Any ideas?
Without knowing what you are doing, it's hard to say whether a for loop is the way to go; however, hopefully this will help get you on your way:
## Sample data is always nice
set.seed(1)
mydf <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
                   B = LETTERS[c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)],
                   matrix(sample(100, 36, replace = TRUE), nrow = 12))
## Here is your vector of special names
specialnames <- setdiff(names(mydf), c("A", "B"))
## Here is a `for` loop that will print the first two rows
## of each column named in "specialnames"
## THIS IS NOT HOW I WOULD NORMALLY DO THIS IN R
## -------------------------------------------------------
for (i in seq_along(specialnames)) {
  print(head(mydf[specialnames[i]], 2))
}
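Since the comment above says a loop is not how this would normally be done in R, here is one loop-free sketch. It is an assumption about what a more idiomatic version might look like, not necessarily the answerer's preferred approach. It rebuilds the same sample data so it runs on its own; `firstrows` and `percolumn` are illustrative names only.

```r
# Rebuild the sample data so this block runs on its own
set.seed(1)
mydf <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
                   B = LETTERS[c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)],
                   matrix(sample(100, 36, replace = TRUE), nrow = 12))
specialnames <- setdiff(names(mydf), c("A", "B"))

# One call: the first two rows of all special columns, as a single data.frame
firstrows <- head(mydf[specialnames], 2)
print(firstrows)

# If each column needs its own treatment, lapply() visits the columns of
# mydf[specialnames] one by one and returns a named list of results
percolumn <- lapply(mydf[specialnames], head, 2)
print(percolumn)
```

Note that `lapply()` hands each column to `head()` as a plain vector, so `percolumn` is a list of vectors rather than a list of one-column data frames.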
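Matters of note (perhaps):
- for (i in seq_along(specialnames)): iterating over positions with seq_along (or 1:length(specialnames) or something like that, although seq_along is safer because it yields an empty sequence when the vector is empty) is important, because the loop body then picks out each name with specialnames[i].
- [ and [[. Try the following to get a sense of what they do: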
mydf["A"]
mydf[["A"]]
mydf[1, c("A", "B")]
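A minimal sketch of the difference, using a small hypothetical data frame `df` and a hypothetical variable `col` (neither is from the question): `[` with a name returns a one-column data frame, `[[` returns the column's contents, and `$` treats whatever follows it as a literal column name, which is why `AData$i` does not look up the name stored in `i`.

```r
df <- data.frame(A = 1:3, B = c("x", "y", "z"))
col <- "A"  # column name held in a variable, like i in the question's loop

one_col_df <- df[col]    # single bracket: a data.frame with one column
vec        <- df[[col]]  # double bracket: the column itself, as a vector
by_dollar  <- df$col     # $ looks for a column literally named "col" -> NULL

str(one_col_df)
str(vec)
print(is.null(by_dollar))
```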
You're very close in your loop; there's just a subtle feature in the use and meaning of [ and [[ that you're missing. See this note on subsetting by Hadley Wickham for some details.
To get the behavior you'd like (assigning NULL to remove a column), you can use [[. Using mydf[, specialnames] <- NULL will throw an error.
I agree this is somewhat confusing, as mydf[, specialnames] <- NA will work. I think it's because the former changes the structure of the data.frame and the latter does not.
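To see the structural difference described here, the sketch below compares assigning NA through the bracket-comma form with removing columns through `[[<-` and NULL. The data frame `df` and the vector `todrop` are hypothetical stand-ins for `mydf` and `specialnames`.

```r
df <- data.frame(A = 1:3, X1 = 4:6, X2 = 7:9)
todrop <- c("X1", "X2")

# Assigning NA overwrites the values but keeps the shape (3 rows x 3 columns)
df_na <- df
df_na[, todrop] <- NA
print(dim(df_na))

# Assigning NULL with [[ removes one column per iteration, changing the shape
df_null <- df
for (name in todrop) {
  df_null[[name]] <- NULL
}
print(dim(df_null))
```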
Thus your loop becomes:
for (name in specialnames) {
  mydf[[name]] <- NULL
}
So setting things up we have:
set.seed(1)
mydf <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
                   B = LETTERS[c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)],
                   matrix(sample(100, 36, replace = TRUE), nrow = 12))
## Here is your vector of special names
specialnames <- setdiff(names(mydf), c("A", "B"))
and after the loop we would obtain:
R> mydf
A B
1 1 A
2 1 A
3 1 A
4 2 A
5 2 A
6 3 B
7 3 B
8 3 B
9 3 B
10 4 B
11 4 B
12 4 B
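If the loop is not otherwise needed, the same result can be obtained in one step by keeping only the columns whose names are not in `specialnames`. This is a sketch using the same sample data, not the answer's own method:

```r
# Rebuild the sample data so this block runs on its own
set.seed(1)
mydf <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
                   B = LETTERS[c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)],
                   matrix(sample(100, 36, replace = TRUE), nrow = 12))
specialnames <- setdiff(names(mydf), c("A", "B"))

# List-style [ indexing (no comma) always returns a data.frame, even when
# only one column survives, so no drop = FALSE is needed
mydf <- mydf[setdiff(names(mydf), specialnames)]
print(names(mydf))
```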