
How to handle empty dataframes in R?

Tags: dataframe, r

I noticed that I sometimes get errors in my R scripts when I forget to check whether the dataframe I'm working on is actually empty (has zero rows).

For example, when I used apply like this

apply(X=DF,MARGIN=1,FUN=function(row) !any(vec[ row[["start"]]:row[["end"]] ]))

and DF happened to be empty, I got an error about the subscripts.

Why is that? Aren't empty dataframes valid? Why does apply() with MARGIN=1 even try to do anything when there are no rows in the dataframe? Do I really need to add a condition before each such apply to make sure the dataframe isn't empty?

Thank you!

David B asked Feb 26 '23 07:02


2 Answers

On a side note: apply always calls the function you pass at least once, even when there are no rows. If the input is a dataframe without any rows but with defined variables, it passes FALSE as the argument to the function. If the dataframe is completely empty (no rows and no variables), it passes logical(0) to the function.

> x <- data.frame(a=numeric(0))
> str(x)
'data.frame':   0 obs. of  1 variable:
 $ a: num 

> y <- apply(x,MARGIN=1,FUN=function(x){print(x)})
[1] FALSE

> x <- data.frame()

> str(x)
'data.frame':   0 obs. of  0 variables

> y <- apply(x,MARGIN=1,FUN=function(x){print(x)})
logical(0)

So, as Joshua already told you, either check before the apply whether the dataframe has rows, or add a condition in the function within the apply.
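The first option, checking before the apply, could look roughly like the sketch below. The names flag_rows, vec and DF are made up for illustration, and returning logical(0) for an empty input is an assumption about what the caller wants.

```r
# Hypothetical example data: vec is the logical vector indexed by start:end.
vec <- c(FALSE, FALSE, TRUE, FALSE, FALSE)

# flag_rows is an illustrative wrapper name, not part of any package.
flag_rows <- function(DF, vec) {
  # Guard: with zero rows there is nothing to compute, so return an empty
  # logical vector instead of letting apply() call FUN on a dummy value.
  if (nrow(DF) == 0) return(logical(0))
  apply(X = DF, MARGIN = 1,
        FUN = function(row) !any(vec[row[["start"]]:row[["end"]]]))
}

DF <- data.frame(start = c(1, 3), end = c(2, 4))
flag_rows(DF, vec)        # one logical per row
flag_rows(DF[0, ], vec)   # empty input: the guard returns logical(0)
```

Wrapping the guard in a small function means the check is written once rather than before every apply call.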

EDIT: This means that length(x)==0 on its own is not a sufficient check. If both possibilities could arise, you need to check whether either length(x)==0 or !x is TRUE (code taken from Joshua):

apply(X=data.frame(),MARGIN=1,  # empty data.frame
  FUN=function(row) {
    # isTRUE() keeps the condition length-one even when row has several elements
    if(length(row)==0 || isTRUE(!row)) {return()}
    !any(vec[ row[["start"]]:row[["end"]] ])
  })
Joris Meys answered Feb 28 '23 21:02


This has absolutely nothing to do with apply. The function you are applying does not work when the data.frame is empty.

> myFUN <- function(row) !any(vec[ row[["start"]]:row[["end"]] ])
> myFUN(DF[1,])  # non-empty data.frame
[1] FALSE
> myFUN(data.frame()[1,])  # empty data.frame
Error in row[["start"]]:row[["end"]] : argument of length 0

Add a condition to your function.

> apply(X=data.frame(),MARGIN=1,  # empty data.frame
+  FUN=function(row) {
+    if(length(row)==0) return()
+    !any(vec[ row[["start"]]:row[["end"]] ])
+  })
NULL
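If you would rather not put a condition inside every function, one alternative (swapped in here for illustration, not what the code above does) is to iterate over row indices with vapply instead of using apply. This is only a sketch: rows_clear, vec and DF are hypothetical names, and it assumes the columns are called start and end as in the question.

```r
# Hypothetical example data: vec is the logical vector indexed by start:end.
vec <- c(FALSE, FALSE, TRUE, FALSE, FALSE)

# rows_clear is an illustrative name. seq_len(0) is an empty index vector,
# so for a zero-row data.frame the anonymous function is never called.
rows_clear <- function(DF, vec) {
  vapply(seq_len(nrow(DF)),
         function(i) !any(vec[DF$start[i]:DF$end[i]]),
         logical(1))
}

DF <- data.frame(start = c(1, 3), end = c(2, 4))
rows_clear(DF, vec)            # one logical per row
rows_clear(DF[0, ], vec)       # zero rows: empty logical vector
rows_clear(data.frame(), vec)  # no rows, no columns: empty logical vector
```

Because vapply is told to expect logical(1) per element, the result is a logical vector of length zero for empty input rather than NULL.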
Joshua Ulrich answered Feb 28 '23 20:02