Subset of rows containing NA (missing) values in a chosen column of a data frame

We have a data frame read from a CSV file. The data frame DF has columns that contain observed values and a column (Var2) that contains the date on which a measurement was taken. If the date was not recorded, the CSV file contains the value NA, for missing data.

Var1  Var2
10    2010/01/01
20    NA
30    2010/03/01

We would like to use the subset command to define a new data frame new_DF that contains only the rows with an NA value in the column Var2. In the example given, only row 2 would be contained in new_DF.

The command

new_DF<-subset(DF,DF$Var2=="NA")  

does not work: the resulting data frame has no rows.

If the NA values in the original CSV file are replaced with NULL, the same command produces the desired result: new_DF<-subset(DF,DF$Var2=="NULL").

How can I get this method to work when the original CSV file uses NA as the value for missing data?

John asked Nov 02 '11 12:11


People also ask

How do I find Na in a column in R?

In R, the easiest way to find columns that contain missing values is to combine the functions is.na() and colSums(). First, you count the number of NAs per column. Then, you use a function such as names() or colnames() to return the names of the columns with at least one missing value.
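As a minimal sketch of that approach, assuming a small made-up data frame (the name `df`, its columns, and its values are illustrative only and do not come from the question):

```r
# Hypothetical example data; not taken from the original question.
df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", "y", "z"),
  c = c(NA, NA, 2.5)
)

# Count the missing values in each column.
na_counts <- colSums(is.na(df))

# Names of the columns that contain at least one NA.
cols_with_na <- names(na_counts)[na_counts > 0]
print(cols_with_na)
```

Here `na_counts` is a named numeric vector, so the column names can be filtered directly by the counts.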

How do I find Na rows in R?

Built-in R functions can be combined to find the row and column pairs with NA values in a data frame. The is.na() function returns logical TRUE and FALSE values indicating which elements are NA; applied to a data frame, it returns a logical matrix of the same shape.
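A short sketch of how this might look, again on made-up data (the names `df`, `na_mask`, `na_positions`, and `rows_with_na` are hypothetical). It uses which() with arr.ind = TRUE to turn the logical matrix into row and column indices:

```r
# Hypothetical example data; not taken from the original question.
df <- data.frame(
  a = c(1, NA, 3),
  c = c(NA, NA, 2.5)
)

# Logical matrix with the same shape as df: TRUE where a value is missing.
na_mask <- is.na(df)

# One row per missing cell, giving its row index and column index.
na_positions <- which(na_mask, arr.ind = TRUE)
print(na_positions)

# Indices of the rows that contain at least one missing value.
rows_with_na <- which(rowSums(na_mask) > 0)
print(rows_with_na)
```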

How do I select rows with NA values?

To select NA values, use the function is.na(). If you want to filter based on NAs in multiple columns, consider the dplyr function filter_at(), which takes a selection of the columns to test together with the filtering condition to apply to them.


1 Answer

Never use =='NA' to test for missing values. Use is.na() instead. This should do it:

new_DF <- DF[rowSums(is.na(DF)) > 0,] 

or in case you want to check a particular column, you can also use

new_DF <- DF[is.na(DF$Var2),] 
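To see why the comparison in the question returns no rows while is.na() does, here is a small sketch that rebuilds the question's example by hand. It assumes the NA in the CSV was read as a real missing value, which is what read.csv() does by default; the name `empty_DF` is introduced only for illustration.

```r
# Rebuild the example data from the question. The NA in Var2 is a real
# missing value here (an assumption about how the CSV was read).
DF <- data.frame(
  Var1 = c(10, 20, 30),
  Var2 = c("2010/01/01", NA, "2010/03/01")
)

# Comparing a missing value with == yields NA, never TRUE, and subset()
# drops rows whose condition evaluates to NA.
print(DF$Var2 == "NA")
empty_DF <- subset(DF, Var2 == "NA")

# is.na() returns TRUE exactly for the missing entry, so it also works
# inside subset().
new_DF <- subset(DF, is.na(Var2))
print(new_DF)
```

Inside subset() the column can be referred to as Var2 directly, without the DF$ prefix.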

If the missing values are stored as the character string 'NA' rather than as real missing values, first run

DF[DF=='NA'] <- NA 

to replace them with missing values.
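An alternative that may avoid the replacement step is to declare the missing-value markers when reading the file, using the na.strings argument of read.csv(). The sketch below uses inline CSV text as a stand-in for the original file, since the real file is not shown; `csv_text`, `csv_null`, and the `_null` variables are hypothetical names.

```r
# Inline CSV text standing in for the original file (assumed contents).
csv_text <- "Var1,Var2
10,2010/01/01
20,NA
30,2010/03/01"

# By default na.strings = "NA", so the NA field becomes a real missing value.
DF <- read.csv(text = csv_text)
new_DF <- DF[is.na(DF$Var2), ]
print(new_DF)

# If the file marks missing dates with another string (for example NULL),
# list every marker in na.strings so all of them are read as missing.
csv_null <- gsub("NA", "NULL", csv_text, fixed = TRUE)
DF_null <- read.csv(text = csv_null, na.strings = c("NA", "NULL"))
new_DF_null <- DF_null[is.na(DF_null$Var2), ]
print(new_DF_null)
```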

Joris Meys answered Sep 21 '22 16:09