We have a data frame read from a CSV file. The data frame DF has columns that contain observed values and a column (Var2) that contains the date on which a measurement was taken. If the date was not recorded, the CSV file contains the value NA, for missing data.
Var1  Var2
10    2010/01/01
20    NA
30    2010/03/01
We would like to use the subset command to define a new data frame new_DF that contains only the rows that have an NA value in the column Var2. In the example given, only row 2 would be contained in the new data frame.
The command
new_DF<-subset(DF,DF$Var2=="NA")
does not work: the resulting data frame has no rows.
If the NA values in the original CSV file are replaced with NULL, the same command produces the desired result: new_DF<-subset(DF,DF$Var2=="NULL").
How can I get this method to work if the value NA is provided as the character string in the original CSV file?
In R, the easiest way to find the columns that contain missing values is to combine is.na() and colSums(). First, count the number of NAs per column. Then use names() or colnames() to return the names of the columns with at least one missing value.
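The column check described above can be sketched as follows. This is a minimal illustration that rebuilds the example data frame from the question by hand; the names na_counts and cols_with_na are made up for this sketch.

```r
# Rebuild the example data frame from the question (Var2 has one missing date)
DF <- data.frame(
  Var1 = c(10, 20, 30),
  Var2 = c("2010/01/01", NA, "2010/03/01")
)

# is.na(DF) gives a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(DF))

# Keep only the names of columns with at least one missing value
cols_with_na <- names(na_counts)[na_counts > 0]

print(na_counts)
print(cols_with_na)
```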
Built-in R functions can be combined to find the row and column positions of NA values in a data frame. The is.na() function returns logical TRUE and FALSE values (a logical matrix when applied to a data frame) indicating which elements are NA.
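One way to get the row and column positions, assuming base R only, is to pass the logical matrix from is.na() to which() with arr.ind = TRUE. The data frame is the example from the question, and na_pos is a name chosen for this sketch.

```r
# Example data frame from the question
DF <- data.frame(
  Var1 = c(10, 20, 30),
  Var2 = c("2010/01/01", NA, "2010/03/01")
)

# which(..., arr.ind = TRUE) returns one row per NA cell,
# with columns named "row" and "col" holding its position
na_pos <- which(is.na(DF), arr.ind = TRUE)

print(na_pos)
```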
To select NA values, use the function is.na(). If you want to filter on NAs in multiple columns, consider dplyr's filter_at() together with a helper that selects the columns and a predicate that states the filtering condition.
Never use =='NA' to test for missing values. Use is.na() instead. This should do it:
new_DF <- DF[rowSums(is.na(DF)) > 0,]
or, in case you want to check a particular column, you can also use
new_DF <- DF[is.na(DF$Var2),]
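Since the question asks about subset() specifically, here is a small sketch of why the comparison fails and how is.na() fits into subset(). It uses the example data from the question; cmp and empty_DF are names invented for illustration.

```r
# Example data frame from the question, with a real missing value in row 2
DF <- data.frame(
  Var1 = c(10, 20, 30),
  Var2 = c("2010/01/01", NA, "2010/03/01")
)

# Comparing a missing value with == gives NA, never TRUE
cmp <- DF$Var2 == "NA"
print(cmp)

# subset() drops rows where the condition is NA, so nothing is selected
empty_DF <- subset(DF, Var2 == "NA")

# is.na() returns TRUE for the missing value, so row 2 is kept
new_DF <- subset(DF, is.na(Var2))
print(new_DF)
```

This would also explain why the NULL variant works: "NULL" is not treated as missing on import by default, so it stays an ordinary string that == can match.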
In case you have NA character values, first run
DF[DF=='NA'] <- NA
to replace them with missing values.
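A short sketch of both routes, under stated assumptions: the first part builds a data frame that holds the literal string "NA" and converts it afterwards; the second part handles the markers at import time through the na.strings argument of read.csv(). The inline csv_text and the names DF2 and new_DF2 are hypothetical stand-ins for the real file and objects.

```r
# Case 1: the column holds the literal string "NA", not a missing value
DF <- data.frame(
  Var1 = c(10, 20, 30),
  Var2 = c("2010/01/01", "NA", "2010/03/01"),
  stringsAsFactors = FALSE
)
DF[DF == "NA"] <- NA              # turn the "NA" strings into real missing values
new_DF <- DF[is.na(DF$Var2), ]

# Case 2: declare the missing-value markers while reading the CSV
# (csv_text stands in for the real file; "NULL" is the marker from the question)
csv_text <- "Var1,Var2\n10,2010/01/01\n20,NULL\n30,2010/03/01"
DF2 <- read.csv(text = csv_text, na.strings = c("NA", "NULL"))
new_DF2 <- DF2[is.na(DF2$Var2), ]

print(new_DF)
print(new_DF2)
```

Handling the markers at import keeps the rest of the script free of string comparisons, which may be preferable when several columns use the same placeholder.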