R sorting data subset

I'm learning to use R (version 3.1.2), so this may come as a noob question, but I'm having problems ordering a subset of a data frame. If I use the mtcars data frame using attach(mtcars), I can easily order it using ord.cars <- mtcars[order(hp),]. The problem is, if I use a subset, let's say sub.cars <- subset(mtcars, hp > 120) and try to order it using ord.sub <- sub.cars[order(mpg),], the result is the following:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
NA                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
NA.1                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.2                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.3                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.4                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
NA.5                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
NA.6                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
NA.7                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
NA.8                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.9                  NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
NA.10                 NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.11                 NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
NA.12                 NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.13                 NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
NA.14                 NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA

Why is R putting back as NAs all the rows that were left out of the subset?

Thanks in advance!

Tommy asked Nov 22 '14 15:11



1 Answer

This problem comes from your use of attach(), which is discouraged in R for exactly this reason. Your code is ambiguous: it does something different from what you expected, because a bare column name like mpg does not refer to the data frame you think it does.

How to resolve this?

  1. detach the data set and
  2. don't use attach again. Instead, use [ and/or $ and if you like with() to subset your data.

Here's how you could do it for the example:

detach(mtcars)
ord.cars <- mtcars[order(mtcars$hp),]

sub.cars <- subset(mtcars, hp > 120)
#the subset could also be written as:
sub.cars <- mtcars[mtcars$hp > 120,]

ord.sub <- sub.cars[order(sub.cars$mpg),]

head(ord.sub)  # only show the first 6 rows
                     mpg cyl disp  hp drat   wt qsec vs am gear carb
Cadillac Fleetwood  10.4   8  472 205 2.93 5.25 18.0  0  0    3    4
Lincoln Continental 10.4   8  460 215 3.00 5.42 17.8  0  0    3    4
Camaro Z28          13.3   8  350 245 3.73 3.84 15.4  0  0    3    4
Duster 360          14.3   8  360 245 3.21 3.57 15.8  0  0    3    4
Chrysler Imperial   14.7   8  440 230 3.23 5.34 17.4  0  0    3    4
Maserati Bora       15.0   8  301 335 3.54 3.57 14.6  0  1    5    8
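
The with() alternative mentioned above could look roughly like this. It is only a sketch: the name ord.sub2 is made up for illustration, and the result should be the same as the $ version.

```r
# Build the subset without relying on attach()
sub.cars <- subset(mtcars, hp > 120)

# with() evaluates order(mpg) inside sub.cars, so mpg is
# unambiguously the column of the subset, not of mtcars
ord.sub2 <- sub.cars[with(sub.cars, order(mpg)), ]

# Same result as the explicit $ form
identical(ord.sub2, sub.cars[order(sub.cars$mpg), ])
```

Either form works; the point is only that the column passed to order() must come from the same data frame that is being indexed.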

What exactly caused the problem in your code?

After you attached the mtcars data, any bare column name such as mpg refers to the attached data set (the original mtcars data). You then subsetted the data and stored it in a new object, sub.cars, which was not attached, while mtcars still was. When you tried to order sub.cars with sub.cars[order(mpg),], the mpg inside order() was resolved by R to the column of the attached (original) mtcars data set, which has more rows than your subset. order(mpg) therefore returned row positions for all 32 rows of mtcars, and every position beyond the last row of sub.cars came back as a row of NAs. That is also why the rows that do appear are not actually sorted by mpg: the positions were computed from a different data frame.
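
The mechanism can be reproduced without attach() at all. In this sketch, order(mtcars$mpg) stands in for what order(mpg) evaluates to while mtcars is attached; the names idx and bad are just illustrative.

```r
sub.cars <- subset(mtcars, hp > 120)

# What order(mpg) computes while mtcars is attached:
# a permutation of the row positions of the full data set
idx <- order(mtcars$mpg)

length(idx)                  # 32 positions ...
nrow(sub.cars)               # ... but the subset has only 17 rows
sum(idx > nrow(sub.cars))    # 15 positions point past the last row

# Indexing a data frame with out-of-range row positions yields NA rows
bad <- sub.cars[idx, ]
sum(is.na(bad$mpg))          # 15 NA rows, as in the question's output
```

The 15 NA rows match the NA, NA.1, ..., NA.14 rows in the question.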

Lesson: don't use attach().

talat answered Oct 04 '22 20:10