
Subsetting R data frame results in mysterious NA rows

Tags: r, na, reshape, subset

I've been encountering what I think is a bug. It's not a big deal, but I'm curious if anyone else has seen this. Unfortunately, my data is confidential, so I have to make up an example, and it's not going to be very helpful.

When subsetting my data, I occasionally get mysterious NA rows that aren't in my original data frame. Even the row names are NA. For example:

example <- data.frame("var1"=c("A", "B", "A"), "var2"=c("X", "Y", "Z"))
example
  var1 var2
1    A    X
2    B    Y
3    A    Z

then I run:

example[example$var1=="A",]
   var1 var2
1     A    X
3     A    Z
NA <NA> <NA>

Of course, the example above does not actually give you this mysterious NA row; I am adding it here to illustrate the problem I'm having with my data.

Maybe it has to do with the fact that I'm importing my original data set with read.xlsx and then doing a wide-to-long reshape before subsetting.

Thanks

chrisg asked Jan 10 '13 15:01



2 Answers

Wrap the condition in which:

df[which(df$number1 < df$number2), ] 

How it works:

which() returns the row numbers where the condition is TRUE, silently dropping both FALSE and NA entries, and the data frame is then subset on those row numbers. Because no NA ever reaches the row index, no NA rows appear in the result.

Say that:

which(df$number1 < df$number2) 

returns row numbers 1, 2, 3, 4 and 5.

As such, writing:

df[which(df$number1 < df$number2), ] 

is the same as writing:

df[c(1, 2, 3, 4, 5), ] 

Or an even simpler version is:

df[1:5, ] 
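
A minimal sketch of why this works, using the sample `df` from the other answer on this page (recreated here so the snippet runs on its own). The names `cond`, `idx` and `result` are only illustrative:

```r
# Sample data recreated from the other answer
df <- data.frame(name = LETTERS[1:10],
                 number1 = 1:10,
                 number2 = c(10:3, NA, NA))

cond <- df$number1 < df$number2   # logical vector; NA wherever number2 is NA
idx  <- which(cond)               # positions of TRUE only; FALSE and NA are dropped

result <- df[idx, ]               # rows 1 to 5, with no NA rows
```

One caveat when using this pattern for exclusion: if nothing matches, `which()` returns an empty vector, and `df[-which(cond), ]` then selects zero rows rather than all of them.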
c-urchin answered Oct 07 '22 08:10


I see this was already answered by the OP, but since his comment is buried deep within the comment section, here's my attempt to fix this issue (at least with my data, which was behaving the same way).

First of all, some sample data:

> df <- data.frame(name = LETTERS[1:10], number1 = 1:10, number2 = c(10:3, NA, NA))
> df
   name number1 number2
1     A       1      10
2     B       2       9
3     C       3       8
4     D       4       7
5     E       5       6
6     F       6       5
7     G       7       4
8     H       8       3
9     I       9      NA
10    J      10      NA

Now for a simple filter:

> df[df$number1 < df$number2, ]
     name number1 number2
1       A       1      10
2       B       2       9
3       C       3       8
4       D       4       7
5       E       5       6
NA   <NA>      NA      NA
NA.1 <NA>      NA      NA

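The problem here is that comparing a value against NA yields NA rather than FALSE, so the condition is NA for rows 9 and 10. When a logical row index contains NA, R does not drop that entry; it returns a row filled with NA for each one (with row names NA, NA.1, and so on). The original data frame is not modified; the NA rows only appear in the subset. Here's my fix, which requires knowledge of which column contains the NAs: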

> df[df$number1 < df$number2 & !is.na(df$number2), ]
  name number1 number2
1    A       1      10
2    B       2       9
3    C       3       8
4    D       4       7
5    E       5       6
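
If you don't know in advance which column holds the NAs, base R offers alternatives that avoid naming it. The following is a sketch on the same sample data, not the only way to do it: `subset()` treats an NA condition as FALSE, and `cond %in% TRUE` turns NA into FALSE explicitly. The variable names are only illustrative:

```r
df <- data.frame(name = LETTERS[1:10],
                 number1 = 1:10,
                 number2 = c(10:3, NA, NA))

cond <- df$number1 < df$number2   # NA for rows 9 and 10

# subset() drops rows where the condition evaluates to NA
by_subset <- subset(df, number1 < number2)

# %in% never returns NA, so NA entries in cond become FALSE
by_in <- df[cond %in% TRUE, ]

# The explicit filter from above, for comparison
by_isna <- df[cond & !is.na(df$number2), ]
```

All three keep rows 1 to 5 only. `subset()` reads most naturally at the console, while the `%in% TRUE` form is easier to use inside functions because it does not rely on non-standard evaluation.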
Waldir Leoncio answered Oct 07 '22 10:10