Subset of rows containing NA (missing) values in a chosen column of a data frame

Q: How do I find NA in a column in R?

In R, the easiest way to find columns that contain missing values is to combine is.na() and colSums(). First, count the NA values in each column with colSums(is.na(DF)). Then use names() or colnames() to return the names of the columns with at least one missing value.
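The two steps described above can be sketched as follows. The data frame is a made-up example modelled on the question; the column Var3 and all values are assumptions added only so that one column has no missing values.

```r
# Hypothetical example data; Var3 is added only to have a fully observed column.
DF <- data.frame(
  Var1 = c(10, 20, NA),
  Var2 = c("2010/01/01", NA, "2010/03/01"),
  Var3 = c("a", "b", "c")
)

# Step 1: count the missing values in each column.
na_counts <- colSums(is.na(DF))
print(na_counts)

# Step 2: keep the names of the columns with at least one NA.
cols_with_na <- names(na_counts)[na_counts > 0]
print(cols_with_na)
```

Here cols_with_na holds the names Var1 and Var2, since Var3 has no missing values.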

Q: How do I find NA rows in R?

Base R functions can be combined to find the row and column positions of NA values in a data frame. The is.na() function returns logical TRUE and FALSE values (a vector for a single column, a matrix for a whole data frame) that indicate which elements are NA. Passing that result to which() with arr.ind = TRUE lists the row and column index of each NA.
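As a rough illustration, the sketch below locates the single NA in the question's example data. The names na_mask, na_positions and na_rows are hypothetical and chosen only for this example.

```r
# Example data from the question, typed in by hand.
DF <- data.frame(
  Var1 = c(10, 20, 30),
  Var2 = c("2010/01/01", NA, "2010/03/01")
)

# is.na() on a data frame returns a logical matrix of the same shape.
na_mask <- is.na(DF)

# which(..., arr.ind = TRUE) returns one row per NA,
# with columns named "row" and "col".
na_positions <- which(na_mask, arr.ind = TRUE)
print(na_positions)

# Row numbers that contain at least one NA.
na_rows <- unique(na_positions[, "row"])
print(na_rows)
```

For this data the only NA sits in row 2, column 2.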

Q: How do I select rows with NA values?

To select rows with NA values, use the function is.na(). To filter on NAs in several columns at once, consider the dplyr function filter_at(), which takes a selection of columns and the filtering condition to apply to them. In current dplyr releases filter_at() is superseded by filter() combined with if_any() or if_all().
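A minimal sketch of the multi-column case, assuming the dplyr package is installed (the code skips the filter if it is not). The data, including the column Var3, is hypothetical and only serves to show that NAs outside the selected columns are ignored.

```r
# Hypothetical data: rows 2 and 3 have an NA in Var1 or Var2,
# row 4 has an NA only in Var3.
DF <- data.frame(
  Var1 = c(10, NA, 30, 40),
  Var2 = c("2010/01/01", "2010/02/01", NA, "2010/04/01"),
  Var3 = c(1, 2, 3, NA)
)

na_rows <- NULL
# Assumption: dplyr is available; otherwise na_rows stays NULL.
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Keep rows where any of the selected columns (Var1, Var2) is NA.
  na_rows <- dplyr::filter_at(
    DF,
    dplyr::vars(Var1, Var2),
    dplyr::any_vars(is.na(.))
  )
  print(na_rows)
}
```

With dplyr available, the result contains the second and third rows of DF; the fourth row is left out because its only NA is in Var3, which was not selected.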

Tags: dataframe, r, csv, na, subset

We have a data frame read from a CSV file. The data frame DF has columns that contain observed values and a column (Var2) that contains the date on which a measurement was taken. If the date was not recorded, the CSV file contains the value NA, for missing data.

Var1  Var2
10    2010/01/01
20    NA
30    2010/03/01

We would like to use the subset command to define a new data frame new_DF that contains only the rows with an NA value in the column Var2. In the example given, only row 2 would be contained in new_DF.

The command

new_DF<-subset(DF,DF$Var2=="NA")

does not work: the resulting data frame has no rows.

If the NA values in the original CSV file are replaced with NULL, the same command produces the desired result: new_DF<-subset(DF,DF$Var2=="NULL").

How can I get this method to work when the missing dates are recorded as NA in the original CSV file?

364

asked Nov 02 '11 12:11

John

1 Answer

Never use =='NA' to test for missing values. Use is.na() instead. This should do it:

new_DF <- DF[rowSums(is.na(DF)) > 0,]

or in case you want to check a particular column, you can also use

new_DF <- DF[is.na(DF$Var2),]
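To see why the comparison in the question returns no rows, the sketch below rebuilds the example from inline text. It assumes the CSV file is comma-separated, which the question does not state; csv_text and empty_DF are hypothetical names.

```r
# Assumed comma-separated version of the question's data.
csv_text <- "Var1,Var2
10,2010/01/01
20,NA
30,2010/03/01"

# By default read.csv() turns the field NA into a missing value.
DF <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Comparing with == gives NA for the missing entry, never TRUE,
# and subset() drops rows whose condition is NA.
print(DF$Var2 == "NA")
empty_DF <- subset(DF, Var2 == "NA")

# is.na() returns TRUE for the missing entry, so subset() keeps row 2.
new_DF <- subset(DF, is.na(Var2))
print(new_DF)
```

The same is.na() condition therefore works both with subset() and with bracket indexing.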

In case you have NA character values, first run

DF[DF=='NA'] <- NA

to replace them with missing values.
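The character-string case can be reproduced as a small sketch. Here na.strings = "" is used only to force read.csv() to keep NA as literal text, which is an assumption made to imitate such a file; the comma-separated layout and the names csv_text and n_before are likewise hypothetical.

```r
# Assumed comma-separated version of the question's data.
csv_text <- "Var1,Var2
10,2010/01/01
20,NA
30,2010/03/01"

# na.strings = "" means the text NA is NOT treated as missing,
# so Var2 contains the literal string "NA".
DF <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)
n_before <- sum(is.na(DF$Var2))  # no real missing values yet

# Replace every literal "NA" string with a real missing value.
DF[DF == "NA"] <- NA

# Now is.na() finds the row with the missing date.
new_DF <- DF[is.na(DF$Var2), ]
print(new_DF)
```

After the replacement, new_DF contains only the row with Var1 equal to 20.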

194

answered Sep 21 '22 16:09

Joris Meys

