I am trying to subset a data frame by using a variable name. I have it working but there is a part which I don't quite understand.
Originally I have this:
rownames(mtcars[mtcars$hp > 150, ])
Then, rather than hard-coding "hp", I wanted to assign "hp" to a variable:
foo <- "hp"
and subset with that. I got it working using this:
rownames(mtcars[mtcars[foo] > 150, ])
(Thanks to link, which stopped me from playing with the $ operator.)
But as I was building up this statement, I noticed a difference between the two. For mtcars$hp > 150, I get this output:
[1] FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
[13] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
[25] TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE
For mtcars[foo] > 150, I get this:
hp
Mazda RX4 FALSE
Mazda RX4 Wag FALSE
Datsun 710 FALSE
Hornet 4 Drive FALSE
Hornet Sportabout TRUE
...
Are these two of the same "type"? Is there any reason why R displays the first one without rownames and the second one with rownames?
Perhaps I naively thought that $ and [] were more or less equivalent. I get the same final result either way ("fortunately", I ignored this difference and carried on), but I am curious, and worried that my assumptions were wrong.
Thank you!
Below we will use the one-row data frame in order to provide briefer output:
mtcars1 <- mtcars[1, ]
Note the differences among these. We can use class, as in class(mtcars["hp"]), to investigate the class of the return value.
The first two correspond to the code in the question and return a data frame and a plain vector respectively. The key differences between [ and $ are that [ (1) can specify multiple columns, (2) allows a variable to be passed as the index and (3) returns a data frame (although see the examples later on), whereas $ (1) can only specify a single column, (2) requires the index to be hard coded and (3) returns a vector.
mtcars1["hp"] # returns data frame
## hp
## Mazda RX4 110
mtcars1$hp # returns plain vector
## [1] 110
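The question also asks whether mtcars$hp > 150 and mtcars[foo] > 150 are the same type. They are not: comparing a plain vector with a number gives a plain logical vector, while comparing a data frame with a number (R's Ops.data.frame method) gives a logical matrix whose dimnames are the row names and the column name, which is why the second result prints with row names. A minimal sketch, assuming foo <- "hp" as in the question; cmp_vec and cmp_df are illustrative names only:

```r
foo <- "hp"  # column name held in a variable, as in the question

# Illustrative names. $ extracts the column as a plain numeric vector,
# so the comparison yields a plain logical vector.
cmp_vec <- mtcars$hp > 150

# [ with a single name keeps a one-column data frame;
# comparing a data frame with a number yields a logical matrix.
cmp_df <- mtcars[foo] > 150

is.matrix(cmp_vec)        # FALSE
is.matrix(cmp_df)         # TRUE
dim(cmp_df)               # 32 1
head(rownames(cmp_df), 3) # row names carried over from mtcars

# Stripping the matrix structure leaves the same logical values.
identical(as.vector(cmp_df), cmp_vec)  # TRUE
```

Because a logical matrix used as a row index is read as its underlying logical values, both versions select the same rows in the question's rownames(...) call.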
Other examples where the index is a single element. Note that the first and second examples below are actually the same, since drop = TRUE is the default when the row index is omitted.
mtcars1[, "hp"] # returns plain vector
## [1] 110
mtcars1[, "hp", drop = TRUE] # returns plain vector
## [1] 110
mtcars1[, "hp", drop = FALSE] # returns data frame
## hp
## Mazda RX4 110
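The same drop rules apply when the column name comes from a variable, as in the question. A minimal sketch on the full mtcars, assuming foo <- "hp"; col_vec and col_df are illustrative names:

```r
foo <- "hp"

# Illustrative names. Row index omitted, so drop defaults to TRUE:
# the single column comes back as a plain vector.
col_vec <- mtcars[, foo]

# drop = FALSE keeps the one-column data frame, the same value as mtcars[foo].
col_df <- mtcars[, foo, drop = FALSE]

is.data.frame(col_vec)          # FALSE
is.data.frame(col_df)           # TRUE
identical(col_df, mtcars[foo])  # TRUE
```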
There is also the [[ operator, which is like the $ operator except that it can accept a variable as the index, whereas $ requires the index to be hard coded:
mtcars1[["hp"]] # returns plain vector
## [1] 110
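Because [[ evaluates its index, the column name can be supplied through a variable. A minimal sketch, recreating the one-row mtcars1 so the block runs on its own and assuming foo <- "hp"; via_brackets and via_dollar are illustrative names:

```r
mtcars1 <- mtcars[1, ]  # one-row data frame, as above
foo <- "hp"

# Illustrative names. [[ looks up the value of foo ("hp") and returns that column.
via_brackets <- mtcars1[[foo]]

# $ uses the literal name written after it.
via_dollar <- mtcars1$hp

identical(via_brackets, via_dollar)  # TRUE
via_brackets                         # 110
```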
Other examples where the index specifies multiple elements. $ and [[ cannot be used with multiple elements, so these examples only use [:
mtcars1[c("mpg", "hp")] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp")] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp"), drop = FALSE] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp"), drop = TRUE] # returns list (mtcars1 has only one row)
## $mpg
## [1] 21
##
## $hp
## [1] 110
[
mtcars[foo] can return more than one column if foo is a vector with more than one element, e.g. mtcars[c("hp", "mpg")], and in all cases the return value is a data.frame, even if foo has only one element (as it does in the question).
There is also mtcars[, foo, drop = FALSE], which returns the same value as mtcars[foo], so it always returns a data frame. With drop = TRUE it returns the column itself if foo specifies a single column. If foo specifies multiple columns, the result is still a data frame unless it has only one row (as with mtcars1 above), in which case explicitly setting drop = TRUE returns a list.
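Whether drop = TRUE turns a multi-column selection into a list depends on how many rows there are. A minimal sketch comparing the full mtcars with a one-row copy; cols, many_rows and one_row are illustrative names:

```r
mtcars1 <- mtcars[1, ]   # one-row data frame
cols <- c("mpg", "hp")   # illustrative: two column names held in a variable

many_rows <- mtcars[, cols, drop = TRUE]   # 32 rows: nothing to drop
one_row   <- mtcars1[, cols, drop = TRUE]  # 1 row: coerced to a list

is.data.frame(many_rows)  # TRUE
is.data.frame(one_row)    # FALSE
is.list(one_row)          # TRUE
```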
[[
On the other hand, mtcars[[foo]] only works if foo has one element, and it returns that column, not a data frame.
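Applied to the question, [[ gives a plain vector just like $, so the row subset can take the column name from a variable. A minimal sketch, assuming foo <- "hp"; fixed and by_var are illustrative names:

```r
foo <- "hp"

# Illustrative names. Hard-coded column name with $
fixed <- rownames(mtcars[mtcars$hp > 150, ])

# Column name held in a variable, extracted as a vector with [[
by_var <- rownames(mtcars[mtcars[[foo]] > 150, ])

identical(fixed, by_var)  # TRUE
length(by_var)            # 13
```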
$
mtcars$hp also only works for a single column, like [[, and returns the column, not a data frame containing that column. mtcars$hp is like mtcars[["hp"]]; however, there is no way to pass a variable index with $. The index can only be hard-coded.
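To see that $ uses the literal name rather than the value of a variable, here is a minimal sketch, assuming foo <- "hp" and relying on the fact that mtcars has no column whose name begins with "foo":

```r
foo <- "hp"

# $ looks for a column literally named "foo"; none exists, so the result is NULL
mtcars$foo

# [[ uses the value stored in foo, i.e. the "hp" column
head(mtcars[[foo]])   # 110 110  93 110 175 105
```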
subset
Note that this works:
subset(mtcars, hp > 150)
returning a data frame containing those rows where the hp column exceeds 150:
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
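subset evaluates hp > 150 inside the data frame, so the column name is written bare. One way to drive it from a variable (a sketch, assuming foo <- "hp"; interactive_way and programmatic_way are illustrative names) is to build the condition with [[. Note that ?subset describes the function as a convenience for interactive use and recommends the standard subsetting functions such as [ for programming.

```r
foo <- "hp"

# Illustrative names. Bare column name: evaluated inside mtcars by subset()
interactive_way <- subset(mtcars, hp > 150)

# Column name in a variable: build the logical condition with [[ instead
programmatic_way <- subset(mtcars, mtcars[[foo]] > 150)

identical(interactive_way, programmatic_way)  # TRUE
nrow(programmatic_way)                        # 13
```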
other objects
The points above pertain to data frames, but other objects that can use $, [ and [[ will have their own rules. In particular, if m is a matrix, e.g. m <- as.matrix(BOD), then m[, 1] is a vector, not a one-column matrix, but m[, 1, drop = FALSE] is a one-column matrix. m[[1]] and m[1] are both the first element of m, not the first column. m$a does not work at all.
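These matrix rules can be checked directly. A minimal sketch using the built-in BOD data set (6 rows, columns Time and demand); res is an illustrative name:

```r
m <- as.matrix(BOD)  # 6 x 2 numeric matrix

dim(m[, 1])                # NULL: a plain vector
dim(m[, 1, drop = FALSE])  # 6 1: a one-column matrix
m[[1]] == m[1, 1]          # TRUE: first element, not first column
m[1] == m[1, 1]            # TRUE

# $ is not defined for atomic vectors such as matrices, so even an
# existing column name fails; capture the error message instead.
res <- tryCatch(m$Time, error = function(e) conditionMessage(e))
res
```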
help
See ?Extract for more information. ?"$", ?"[" and ?"[[" all lead to the same page as well.
The main difference lies in the returned object: [ used as in the question, mtcars[foo], returns a data frame, whereas $ (like [[) gives you the column itself as a vector. You can apply the class(x) function to see this. In the example from the question, mtcars[foo] is a data frame, but mtcars[[foo]] is a numeric (double-precision floating-point) vector.
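A minimal sketch of the class(x) check described above, assuming foo <- "hp". R reports its double-precision vectors as class "numeric", and typeof() shows the underlying storage type "double":

```r
foo <- "hp"

class(mtcars[foo])     # "data.frame"
class(mtcars[[foo]])   # "numeric"
typeof(mtcars[[foo]])  # "double"
class(mtcars$hp)       # "numeric"
```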