I'm a Python user learning R.
Frequently, I need to check whether the columns of a dataframe contain NaN(s).
In Python, I can simply do:
import pandas as pd
df = pd.DataFrame({'colA': [1, 2, None, 3],
                   'colB': ['A', 'B', 'C', 'D']})
df.isna().any()
giving me
colA True
colB False
dtype: bool
In R I'm struggling to find an easy solution. People refer to some apply-like methods but that seems overly complex for such a primitive task. The closest solution I've found is this:
library(tidyverse)
df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
!complete.cases(t(df))
giving
[1] TRUE FALSE
That's OK-ish, but I don't see the column names. If the dataframe has 50 columns, I don't know which ones have NaNs.
Is there a better R solution?
To check for NaN values in R, use the is.nan() function. is.nan() is a built-in R function that tests each value of an object and returns TRUE where it finds NaN and FALSE otherwise.
The %in% operator can be used in R to check whether an element is present in a vector. It returns TRUE if the element is present and FALSE otherwise.
In R, the easiest way to find columns that contain missing values is by combining the power of the functions is.na() and colSums(). First, you check and count the number of NA's per column. Then, you use a function such as names() or colnames() to return the names of the columns with at least one missing value.
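The colSums() and names() combination described above can be sketched as follows. This is a minimal illustration on the data frame from the question; the variable names na_counts and cols_with_na are chosen here for the example and are not from the original text.

```r
# Example data frame from the question
df <- data.frame(colA = c(1, 2, NA, 3), colB = c("A", "B", "C", "D"))

# is.na(df) gives a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(df))

# Keep only the names of columns with at least one missing value
cols_with_na <- names(na_counts)[na_counts > 0]

print(na_counts)
print(cols_with_na)
```

Because na_counts is a named vector, the column names stay attached to the counts, which addresses the "I don't see the column names" problem from the question.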
@teelou sum(is.na(my.df[, 'b'])) == nrow(my.df) will tell you whether all the rows are NA or NaN. @Prradep for a single column, all(is.na(...)) is more readable. However, you could find all columns that are completely NA with:
However, if we try to run an invalid computation (e.g. 0 / 0), R returns NaN: If we have a complex vector, data frame or matrix, it might be complicated to identify the NaN values in our data. In such a case, we can apply the is.nan function. The is.nan function returns a logical vector or matrix, which indicates the NaN positions in our data.
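As a small sketch of the point above, the following compares is.nan() with is.na() on a vector and applies is.nan() to a matrix. The objects x and m are made up for this illustration.

```r
# A numeric vector mixing an ordinary value, NaN, NA, and an invalid computation
x <- c(1, NaN, NA, 0 / 0)

nan_pos <- is.nan(x)  # TRUE only for NaN values (NA is not NaN)
na_pos <- is.na(x)    # TRUE for both NA and NaN

# is.nan() also works on a matrix and keeps its dimensions
m <- matrix(c(1, NaN, 3, 0 / 0), nrow = 2)
nan_mat <- is.nan(m)

print(nan_pos)
print(na_pos)
print(nan_mat)
```

Note that is.na() treats NaN as missing, while is.nan() does not treat NA as NaN, so the two functions answer different questions.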
Therefore, it is key to identify NA’s as soon as possible.
The variable x1 contains 2 NAs. We can also count the NA values of multiple data frame columns by using the colSums function instead of the sum function. Have a look at the following R code: The RStudio console output shows the number of NA values for each of our variables.
You can use anyNA, which checks whether a vector contains any NA:
df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
sapply(df, anyNA)
colA colB
TRUE FALSE
jay.sf is right. This will check for NaNs.
df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
anyNAN <- function(x) {
any(is.nan(x))
}
sapply(df, anyNAN)
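One thing to keep in mind: in R, NA and NaN are distinct, so a NaN-only check reports FALSE for a column that merely contains NA. The sketch below illustrates this, assuming a hypothetical extra column colC (not in the original question) that holds a real NaN.

```r
# Data frame from the question, plus a hypothetical column containing NaN
df <- data.frame(colA = c(1, 2, NA, 3), colB = c("A", "B", "C", "D"))
df$colC <- c(1, NaN, 3, 4)  # assumption: added only for illustration

# NaN-only check per column: colA holds NA, not NaN, so it is FALSE here
has_nan <- sapply(df, function(x) any(is.nan(x)))

# anyNA() flags both NA and NaN
has_na <- sapply(df, anyNA)

print(has_nan)
print(has_na)
```

If the goal is to mirror pandas' isna(), which treats both None and NaN as missing, the anyNA() variant is probably the closer match.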
The best way to check whether columns have NAs is to loop over the columns with a function that checks any(is.na(x)).
lapply(df, function(x) any(is.na(x)))
$colA
[1] TRUE
$colB
[1] FALSE
I can see you load the tidyverse yet did not use it in your example. If we want to do this within the tidyverse, we can use purrr:
library(purrr)
df %>% map(~any(is.na(.x)))
Or with dplyr:
library(dplyr)
df %>% summarise(across(everything(), ~any(is.na(.x))))
colA colB
1 TRUE FALSE
The easiest way would be:
df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
is.na(df)
Output:
colA colB
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] TRUE FALSE
[4,] FALSE FALSE
Update, if you only want to see the rows containing NA:
> df[rowSums(is.na(df)) > 0,]
colA colB
3 NA C
Update 2, or to get only the column names with information about NA (thanks to RSale for anyNA):
> lapply(df, anyNA)
$colA
[1] TRUE
$colB
[1] FALSE