I have a data frame and I would like to test very quickly whether it is empty. I know that it either has no rows or contains only integers (no missing values). So far, I have tested five different options (see below). Does anyone have an even faster solution?
df <- data.frame(a = integer(0), b = integer(0), c = integer(0))
fa <- function() {
  nrow(df) > 0
}
fb <- function() {
  any(dim(df)[1L])
}
fc <- function() {
  (dim(df)[1L]) != 0
}
fd <- function() {
  any(.subset2(df, 1)[1])
}
fe <- function() {
  any(.subset2(df, 1))
}
library(microbenchmark)
microbenchmark(fa(), fb(), fc(), fd(), fe(), times = 1000)
And the results:
> microbenchmark(fa(), fb(), fc(), fd(), fe(), times = 1000)
Unit: nanoseconds
 expr  min   lq     mean median    uq   max neval cld
 fa() 5664 6725 8672.462   6725 11680 47777  1000   cd
 fb() 6017 7078 8979.645   7079 12034 58041  1000    d
 fc() 6017 6372 8492.680   6725 11679 25127  1000   c
 fd() 1062 1770 2214.170   1771  2832 14511  1000  b
 fe()  354 1062 1359.498   1063  1770 12741  1000 a
Since most of the objects you test aren't likely to be empty, you should be more concerned about the timing of your functions on a non-empty data.frame. You should also compile them to get a sense of how they would perform in a package.
library(microbenchmark)
library(compiler)
fa <- cmpfun({function() {
  nrow(df) > 0L
}})
fb <- cmpfun({function() {
  any(dim(df)[1L])
}})
fc <- cmpfun({function() {
  dim(df)[1L] != 0L
}})
fd <- cmpfun({function() {
  any(.subset2(df, 1L)[1L])
}})
fe <- cmpfun({function() {
  any(.subset2(df, 1L))
}})
ff <- cmpfun({function() {
  length(.subset2(df, 1L)) > 0L
}})
fg <- cmpfun({function() {
  as.logical(length(.subset2(df, 1L)))
}})
The test on an empty data.frame shows all methods are roughly the same.
df <- data.frame(a = integer(0), b = integer(0), c = integer(0))
microbenchmark(fa(), fb(), fc(), fd(), fe(), ff(), fg(), times = 1000)
# Unit: nanoseconds
#  expr  min     lq median     uq   max neval
#  fa() 5685 5969.0 6165.0 6608.5 20515  1000
#  fb() 6147 6443.0 6651.0 7214.0 18117  1000
#  fc() 5726 5984.0 6152.0 6457.5 38404  1000
#  fd() 1210 1411.0 1573.0 1764.5  4933  1000
#  fe()  635  871.0 1003.0 1105.5 10225  1000
#  ff()  513  727.5  861.5  941.0  5691  1000
#  fg()  681  868.5  981.5 1080.0  2982  1000
The test on a non-empty data.frame shows that fe() is a really bad performer (milliseconds rather than microseconds), while the rest are roughly the same.
df <- data.frame(a = integer(1e6), b = integer(1e6), c = integer(1e6))
microbenchmark(fa(), fb(), fc(), fd(), fe(), ff(), fg(), times = 1000)
# Unit: nanoseconds
#  expr     min      lq    median        uq      max neval
#  fa()    6569    7142    8782.0   12364.5    46749  1000
#  fb()    7034    7682    9334.5   18334.0    53172  1000
#  fc()    6539    7110    8453.5   20585.5    49912  1000
#  fd()    1171    1585    2507.5    5021.5    17641  1000
#  fe() 4340209 4413042 4460973.5 5468688.5 26045766  1000
#  ff()     637     984    1489.0    3646.5    14212  1000
#  fg()     767    1161    2401.0    4078.5   236958  1000
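A likely explanation for the fe() timing, sketched below under my own assumptions: integer(1e6) is a vector of zeros, and any() treats 0 as FALSE, so it has to look at every element before it can answer. The same coercion means the any()-based checks report FALSE for a non-empty data.frame whose first column is all zeros (or, for fd(), whose first value is zero), whereas the length()-based check does not depend on the values. The names zeros, empty, any_based and length_based are made up for this illustration and are not part of the benchmark above.

```r
# Illustrative data only; `zeros` and `empty` are hypothetical names.
zeros <- data.frame(a = integer(5), b = integer(5))   # 5 rows, all values 0
empty <- data.frame(a = integer(0), b = integer(0))   # 0 rows

# any() coerces integers to logical (0 becomes FALSE), so a column of
# zeros gives FALSE even though the data frame has rows.
any_based <- function(d) any(.subset2(d, 1L))

# length() only looks at the length of the first column, not its values.
length_based <- function(d) length(.subset2(d, 1L)) > 0L

any_based(zeros)      # FALSE, although the data frame has 5 rows
length_based(zeros)   # TRUE
any_based(empty)      # FALSE
length_based(empty)   # FALSE
```

So unless you can also guarantee that the values are never zero, the length()-based variants (ff() or fg()) look like the safer choice as well as the fast one.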