I have a file that looks like so:
date A B
2014-01-01 2 3
2014-01-02 5 NA
2014-01-03 NA NA
2014-01-04 7 11
If I use newdata <- na.omit(data), where data is the above table loaded via R, then I get only two data points. I understand why: it filters out every row that contains an NA. What I want to do is to filter separately for each of A and B, so that I get three data points for A and only two for B. Clearly, my main data set is much larger than that and the numbers are different, but neither should matter.
How can I achieve that?
The na.omit() function returns the object with every incomplete case removed. For a data frame, that means a row is dropped as soon as any one of its columns is NA (or NaN), which is why it is too aggressive when the columns should be filtered independently.
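To make that row-wise behaviour concrete, here is a minimal sketch. The name data follows the question; the data.frame() call is only an assumed stand-in for however the file was actually read.

```r
# Rebuild the example table from the question (stand-in for reading the file)
data <- data.frame(
  date = as.Date(c("2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04")),
  A = c(2, 5, NA, 7),
  B = c(3, NA, NA, 11)
)

# na.omit() drops a row if ANY column in that row is NA
newdata <- na.omit(data)
nrow(newdata)  # 2: only the rows for 2014-01-01 and 2014-01-04 survive
```

The row for 2014-01-02 is lost even though A is present there, because B is NA in that row.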
For comparison, the pandas equivalent is DataFrame.dropna(), which removes missing values. Its axis argument determines whether rows or columns containing missing values are removed: 0 or 'index' drops rows that contain missing values; 1 or 'columns' drops columns that contain missing values.
Use is.na() on the relevant vector of data you wish to check, and index the data frame using the negated result. For example:
R> data[!is.na(data$A), ]
date A B
1 2014-01-01 2 3
2 2014-01-02 5 NA
4 2014-01-04 7 11
R> data[!is.na(data$B), ]
date A B
1 2014-01-01 2 3
4 2014-01-04 7 11
is.na() returns TRUE for every element that is NA and FALSE otherwise. To index the rows of the data frame we can use this logical vector, but we want its negation. Hence we use ! to flip it (TRUE becomes FALSE and vice versa).
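As a small illustration of the logical vectors involved, using the values of column A from the question (the variable names A and keep are just for this sketch):

```r
A <- c(2, 5, NA, 7)   # column A from the example table

is.na(A)              # FALSE FALSE  TRUE FALSE
keep <- !is.na(A)     # TRUE  TRUE FALSE  TRUE

# Indexing with the negated vector keeps only the non-missing values
A[keep]               # 2 5 7
```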
You can restrict which columns you return by adding an index for the columns after the , in [ , ], e.g.
R> data[!is.na(data$A), 1:2]
date A
1 2014-01-01 2
2 2014-01-02 5
4 2014-01-04 7
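If there are many value columns, the same idea can be applied to each one in turn. The following is a sketch under assumptions: the table is rebuilt by hand rather than read from a file, and value_cols and per_column are hypothetical names introduced here for illustration.

```r
# Rebuild the example table from the question (stand-in for reading the file)
data <- data.frame(
  date = as.Date(c("2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04")),
  A = c(2, 5, NA, 7),
  B = c(3, NA, NA, 11)
)

# Hypothetical: the columns to filter independently
value_cols <- c("A", "B")

# One data frame per column, keeping the date and dropping only the
# rows where that particular column is NA
per_column <- lapply(setNames(value_cols, value_cols), function(col) {
  data[!is.na(data[[col]]), c("date", col)]
})

sapply(per_column, nrow)  # A: 3, B: 2
```

Each element of per_column can then be used on its own, e.g. per_column$A holds the three dated observations for A.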