I'm learning to use R (version 3.1.2), so this may come as a noob question, but I'm having problems ordering a subset of a data frame. If I use the mtcars data frame using attach(mtcars)
, I can easily order it using ord.cars <- mtcars[order(hp),]
. The problem is, if I use a subset, let's say sub.cars <- subset(mtcars, hp > 120)
and try to order it using ord.sub <- sub.cars[order(mpg),]
, the result is the following:
mpg cyl disp hp drat wt qsec vs am gear carb
NA NA NA NA NA NA NA NA NA NA NA NA
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
NA.1 NA NA NA NA NA NA NA NA NA NA NA
NA.2 NA NA NA NA NA NA NA NA NA NA NA
NA.3 NA NA NA NA NA NA NA NA NA NA NA
NA.4 NA NA NA NA NA NA NA NA NA NA NA
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
NA.5 NA NA NA NA NA NA NA NA NA NA NA
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
NA.6 NA NA NA NA NA NA NA NA NA NA NA
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
NA.7 NA NA NA NA NA NA NA NA NA NA NA
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
NA.8 NA NA NA NA NA NA NA NA NA NA NA
NA.9 NA NA NA NA NA NA NA NA NA NA NA
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
NA.10 NA NA NA NA NA NA NA NA NA NA NA
NA.11 NA NA NA NA NA NA NA NA NA NA NA
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
NA.12 NA NA NA NA NA NA NA NA NA NA NA
NA.13 NA NA NA NA NA NA NA NA NA NA NA
NA.14 NA NA NA NA NA NA NA NA NA NA NA
Why is R putting back as NAs all the rows that were left out of the subset?
Thanks in advance!
To sort a data frame in R, use the order( ) function. By default, sorting is ASCENDING. Prepend the sorting variable by a minus sign to indicate DESCENDING order.
By using bracket notation on R DataFrame (data.name) we can select rows by column value, by index, by name, by condition e.t.c. You can also use the R base function subset() to get the same results. Besides these, R also provides another function dplyr::filter() to get the rows from the DataFrame.
There is a function in R that you can use (called the sort function) to sort your data in either ascending or descending order. The variable by which sort you can be a numeric, string or factor variable. You also have some options on how missing values will be handled: they can be listed first, last or removed.
The subset function is available in base R and can be used to return subsets of a vector, martix, or data frame which meet a particular condition. In my three years of using R, I have repeatedly used the subset() function and believe that it is the most useful tool for selecting elements of a data structure.
This is a problem related to your use of attach()
which is not recommended in R - for exactly this reason! The problem is, that your code is kind of ambiguous, or at least, it is something different than what you expected it to be.
How to resolve this?
detach
the data set andattach
again. Instead, use [
and/or $
and if you like with()
to subset your data. Here's how you could do it for the example:
detach(mtcars)
ord.cars <- mtcars[order(mtcars$hp),]
sub.cars <- subset(mtcars, hp > 120)
#the subset could also be written as:
sub.cars <- mtcars[mtcars$hp > 120,]
ord.sub <- sub.cars[order(sub.cars$mpg),]
head(ord.sub) # only show the first 6 rows
mpg cyl disp hp drat wt qsec vs am gear carb
Cadillac Fleetwood 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
Lincoln Continental 10.4 8 460 215 3.00 5.42 17.8 0 0 3 4
Camaro Z28 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
Duster 360 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
Chrysler Imperial 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
Maserati Bora 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
What exactly caused the problem in your code?
After you attached
the mtcars data, whenever you call one of the column names of the attached data, like mpg
, it will refer to the attached data set (the original mtcats data). The problem then was that you subsetted the data and stored it in a new object (sub.cars) which was not attached while mtcars was still attached. Then, when you tried to order the sub.cars data, you used sub.cars[order(mpg),]
and as you can see, in there, you refer to mpg
column - that is interpreted by R as the one from the attached (original) mtcars data set, with more rows than you subsetted data. All those rows in your sub.cars which were excluded by the subsetting, will now be displayed as NAs in sub.cars
.
Lesson: don't use attach()
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With