 

The simplest way to check for NaNs in columns (R)?

Tags:

python

r

I'm a Python user learning R.

Frequently, I need to check whether the columns of a data frame contain NaN(s).

In Python, I can simply do:

import pandas as pd
df = pd.DataFrame({'colA': [1,   2,   None, 3], 
                   'colB': ['A', 'B', 'C', 'D']})
df.isna().any()

giving me

colA   True
colB   False
dtype: bool

In R, I'm struggling to find an easy solution. People suggest apply-like methods, but that seems overly complex for such a basic task. The closest solution I've found is this:

library(tidyverse)
df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
!complete.cases(t(df))

giving

[1]  TRUE FALSE

That's OK-ish, but I don't see the column names: if the data frame has 50 columns, I can't tell which ones have NaNs.
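One way to keep the column names with this transpose approach (a sketch, not taken from the answers below): the rows of t(df) are the original columns, so it should be enough to reattach names(df) to the result. Note that t() first coerces the whole data frame to a single character matrix, which keeps NA values intact but copies the data.

```r
df <- data.frame(colA = c(1, 2, NA, 3), colB = c("A", "B", "C", "D"))

# t(df) turns each original column into a row of a character matrix;
# complete.cases() is FALSE for every row that contains an NA.
has_na <- !complete.cases(t(df))

# complete.cases() returns an unnamed vector, so reattach the column names.
has_na <- setNames(has_na, names(df))
has_na  # colA = TRUE, colB = FALSE
```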

Is there a better R solution?

user2743931 asked Dec 31 '21 11:12


People also ask

How do I check if a value is NaN in R?

To check for NaN values in R, use the is.nan() function. is.nan() is a built-in R function that returns TRUE for each element that is NaN and FALSE otherwise.
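A small illustration of how is.nan() differs from is.na() (plain base R; the vector x is made up for the example). Every NaN also counts as NA, but a plain NA is not NaN:

```r
# Made-up vector: a regular value, a missing value and two NaNs.
x <- c(1, NA, NaN, 0 / 0)

is.nan(x)  # FALSE FALSE  TRUE  TRUE : only the NaN entries
is.na(x)   # FALSE  TRUE  TRUE  TRUE : both NA and NaN entries
```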

How do you check if an element is in a column in R?

The %in% operator can be used in R to check whether an element is present in a vector. It returns a logical result: TRUE if the element is present, FALSE otherwise.
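A short sketch of %in% applied to the question's data frame (the values are the ones from the question; the lookups are just examples):

```r
df <- data.frame(colA = c(1, 2, NA, 3), colB = c("A", "B", "C", "D"))

# %in% returns one TRUE/FALSE per element on its left-hand side.
"C" %in% df$colB          # TRUE
c("C", "Z") %in% df$colB  # TRUE FALSE

# NA matches NA, so %in% can also test one column for missing values.
NA %in% df$colA           # TRUE
```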

How do I find NAs in a column in R?

In R, the easiest way to find columns that contain missing values is to combine is.na() and colSums(). First, count the number of NAs per column. Then use a function such as names() or colnames() to return the names of the columns with at least one missing value.
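The is.na() + colSums() + names() recipe described above, sketched on the question's data frame:

```r
df <- data.frame(colA = c(1, 2, NA, 3), colB = c("A", "B", "C", "D"))

# is.na() on a data frame returns a logical matrix (one column per variable);
# colSums() counts the TRUE values, i.e. the NAs, in each column.
na_per_column <- colSums(is.na(df))

# Keep only the names of columns with at least one missing value.
cols_with_na <- names(na_per_column)[na_per_column > 0]
cols_with_na  # "colA"
```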

How do I find out whether a column has NA or NaN?

@teelou sum(is.na(my.df[, 'b'])) == nrow(my.df) tells you whether all the rows are NA or NaN. @Prradep For a single column, all(is.na(...)) is more readable. However, you could also find all columns that are completely NA at once.
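Following the comment's idea, a sketch that finds columns that are entirely NA. The data frame my.df is a made-up example in which column b is completely missing:

```r
# Made-up example: column b is completely missing.
my.df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, NA))

# For a single column, all(is.na(...)) reads more clearly than comparing sums.
all(is.na(my.df[, "b"]))  # TRUE

# For every column at once: a column is all-NA when its NA count equals nrow().
all_na <- colSums(is.na(my.df)) == nrow(my.df)
names(my.df)[all_na]      # "b"
```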

How to identify the NaN values in R data?

If we run an invalid computation (e.g. 0 / 0), R returns NaN. In a complex vector or matrix, it can be hard to spot the NaN values in our data. In such a case, we can apply the is.nan function, which returns a logical vector or matrix indicating the NaN positions. is.nan() has no data frame method, so for a data frame apply it column by column (e.g. with sapply()).
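A sketch of locating NaN positions in a matrix (the matrix m is made up for the example). is.nan() keeps the matrix shape, and which(..., arr.ind = TRUE) converts the TRUE cells into row/column index pairs:

```r
# 0 / 0 is an invalid computation, so R returns NaN.
0 / 0  # NaN

# Made-up 2 x 2 matrix (filled column by column) with NaN in row 2 of both columns.
m <- matrix(c(1, 0 / 0, 3, NaN), nrow = 2)

is.nan(m)                                 # logical 2 x 2 matrix
nan_pos <- which(is.nan(m), arr.ind = TRUE)
nan_pos                                   # one (row, col) pair per NaN
```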

How do I identify NAs in R?

It is key to identify NAs as soon as possible. In R, the easiest way to find columns that contain missing values is to combine is.na() and colSums(): first, count the number of NAs per column.

How to count NA values of multiple data frame columns in R?

sum(is.na(x)) counts the NA values of a single variable. To count the NA values of several data frame columns at once, use the colSums function instead of sum; the result shows the number of NA values for each variable.
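A sketch contrasting the two counts on the question's data frame:

```r
df <- data.frame(colA = c(1, 2, NA, 3), colB = c("A", "B", "C", "D"))

# One column: sum() over the logical vector counts the TRUEs, i.e. the NAs.
sum(is.na(df$colA))  # 1

# All columns at once: colSums() over the logical matrix returned by is.na(df).
na_counts <- colSums(is.na(df))
na_counts            # colA = 1, colB = 0
```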




3 Answers

You can use anyNA(), which checks whether a vector contains any NA:

df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))
sapply(df, anyNA)

colA  colB 
TRUE FALSE 
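A slightly stricter variant of the same idea (a sketch, not part of the original answer): vapply() insists that each column yields exactly one logical, and which() then keeps only the names of the flagged columns.

```r
df <- data.frame(colA = c(1, 2, NA, 3), colB = c("A", "B", "C", "D"))

# Like sapply(), but errors if any result is not a single logical value.
has_na <- vapply(df, anyNA, logical(1))

# which() keeps the TRUE positions together with their names.
names(which(has_na))  # "colA"
```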

Edit

jay.sf is right. Strictly speaking, anyNA() flags both NA and NaN. This version checks for NaN only:

df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))

anyNAN <- function(x) {
  any(is.nan(x))
}

sapply(df, anyNAN)
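To see the difference between the two checks, here is a sketch on a made-up data frame df2 where one column holds a NaN and the other a plain NA (anyNAN is the helper defined in the answer):

```r
# Made-up data frame: colA holds a NaN, colB a plain NA.
df2 <- data.frame(colA = c(1, NaN, 3), colB = c(1, NA, 3))

anyNAN <- function(x) {
  any(is.nan(x))
}

sapply(df2, anyNA)   # colA TRUE, colB TRUE  (NaN also counts as NA)
sapply(df2, anyNAN)  # colA TRUE, colB FALSE (only genuine NaN)
```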
RSale answered Oct 29 '22 02:10


The best way to check whether columns have NAs is to loop over the columns with a function that checks any(is.na(x)).

lapply(df, function(x) any(is.na(x)))

$colA
[1] TRUE

$colB
[1] FALSE
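If only the affected columns are wanted, the same predicate can be passed to base R's Filter() (a sketch, not from the original answer). On a data frame, Filter() keeps the columns for which the predicate is TRUE and returns a data frame:

```r
df <- data.frame(colA = c(1, 2, NA, 3), colB = c("A", "B", "C", "D"))

# Keep only the columns where any(is.na(x)) is TRUE.
na_cols <- Filter(function(x) any(is.na(x)), df)
names(na_cols)  # "colA"
```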

I see you loaded the tidyverse but didn't use it in your example. If we want to do this within the tidyverse, we can use purrr:

library(purrr)

df %>% map(~any(is.na(.x)))

Or with dplyr:

library(dplyr)

df %>% summarise(across(everything(), ~any(is.na(.x))))

  colA  colB
1 TRUE FALSE
GuedesBF answered Oct 29 '22 04:10


The easiest way would be:

df = data.frame(colA = c(1, 2, NA, 3), colB = c('A', 'B', 'C', 'D'))

is.na(df)

Output:

      colA  colB
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,]  TRUE FALSE
[4,] FALSE FALSE
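With many rows, the full matrix is hard to read. A sketch that lists only the positions of the TRUE cells and maps them back to column names, using which(..., arr.ind = TRUE):

```r
df <- data.frame(colA = c(1, 2, NA, 3), colB = c("A", "B", "C", "D"))

# One (row, col) pair per NA cell in the logical matrix is.na(df).
na_pos <- which(is.na(df), arr.ind = TRUE)
na_pos

# Translate the column indices back into column names.
unique(names(df)[na_pos[, "col"]])  # "colA"
```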

Update: if you only want to see the rows containing NA:

> df[rowSums(is.na(df)) > 0,]

  colA colB
3   NA    C
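An equivalent way to select the incomplete rows (a sketch, not part of the original answer) uses complete.cases(), which is TRUE for rows without any NA:

```r
df <- data.frame(colA = c(1, 2, NA, 3), colB = c("A", "B", "C", "D"))

# Negate complete.cases() to keep rows that contain at least one NA;
# this selects the same rows as df[rowSums(is.na(df)) > 0, ].
df[!complete.cases(df), ]
```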

Update 2: or, to get only the column names with information about NA (thanks to RSale for anyNA):

> lapply(df, anyNA)
$colA
[1] TRUE

$colB
[1] FALSE
Marco_CH answered Oct 29 '22 03:10