
Count NAs between the first and last occurring numbers

Here is my toy dataset:

library(tibble)

df <- tribble(
  ~x, ~y, ~z,
  7,   NA, 4,
  8,   2,  NA,
  NA,  NA, NA,
  NA,  4,  6)

I want a data frame that reports two counts for each variable: the number of NAs between the first and the last number in the column, and the number of NAs between the first number and the last row. For this example, the desired output is:

desired_df <- tribble(~vars, ~na_count_between_1st_last_num, ~na_count_between_1st_num_last_row,
                       "x",     0,                              2,
                       "y",     1,                              1,
                       "z",     2,                              2)

How can I get the desired output?

Geet asked Feb 12 '20 13:02


3 Answers

Here is an idea using base R:

f1 <- function(x) {i1 <- which(!is.na(x)); head(i1, 1):tail(i1, 1) }
f2 <- function(x) {i1 <- which(!is.na(x)); head(i1, 1):length(x) }

merge(stack(sapply(df, function(i) sum(is.na(i[f1(i)])))), 
      stack(sapply(df, function(i) sum(is.na(i[f2(i)])))), by = 'ind')

#  ind values.x values.y
#1   x        0        2
#2   y        1        1
#3   z        2        2
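
One caveat: `head(i1, 1):tail(i1, 1)` throws an error when a column contains no numbers at all, because `i1` is then empty. Below is a minimal sketch of a guarded variant. The name `count_na_span`, its `to_last_row` argument, and the choice to return 0 for an all-NA column are assumptions made for illustration, not part of the answer above.

```r
# Count NAs from the first non-NA value to either the last non-NA value
# (to_last_row = FALSE) or the end of the vector (to_last_row = TRUE).
# Returning 0 for an all-NA vector is an assumption; NA may suit you better.
count_na_span <- function(x, to_last_row = FALSE) {
  i1 <- which(!is.na(x))                 # positions of the numbers
  if (length(i1) == 0L) return(0L)       # guard: no numbers in this column
  end <- if (to_last_row) length(x) else i1[length(i1)]
  sum(is.na(x[i1[1]:end]))
}

# Same values as the toy dataset in the question
df <- data.frame(x = c(7, 8, NA, NA),
                 y = c(NA, 2, NA, 4),
                 z = c(4, NA, NA, 6))

res <- data.frame(
  vars = names(df),
  na_count_between_1st_last_num = unname(sapply(df, count_na_span)),
  na_count_between_1st_num_last_row =
    unname(sapply(df, count_na_span, to_last_row = TRUE))
)
res
```

Because both counts come from one function, the `merge(stack(...))` step is not needed here; the two `sapply()` results are placed side by side directly.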
Sotos answered Oct 18 '22 05:10


na.trim from the zoo package trims NAs off both ends by default, or off only the left or right end if we specify sides="left" or sides="right", so:

library(dplyr)
library(tibble)
library(tidyr)
library(zoo)

df %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarize(na1 = sum(is.na(na.trim(value))), 
            na2 = sum(is.na(na.trim(value, "left")))) %>%
  ungroup

giving:

# A tibble: 3 x 3
  name    na1   na2
  <chr> <int> <int>
1 x         0     2
2 y         1     1
3 z         2     2
G. Grothendieck answered Oct 18 '22 06:10


Here is one possibility using two functions:

fun1 <- function(x) { # count NAs between the first and last non-NA values
  idx1 <- cumsum(!is.na(x)) > 0 # TRUE from the first non-NA value onward (excludes leading NAs)
  idx2 <- rev(cumsum(!is.na(rev(x))) > 0) # TRUE up to the last non-NA value (excludes trailing NAs)
  sum(is.na(x[idx1 & idx2]))
}


fun2 <- function(x) { # count NAs between the first non-NA value and the last element
  idx1 <- cumsum(!is.na(x)) > 0 # TRUE from the first non-NA value onward (excludes leading NAs)
  sum(is.na(x[idx1]))
}
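
To make the two masks concrete, here is a small sketch that traces them step by step. The vector `v` is a hypothetical example (not taken from the question), chosen because it has both a leading and a trailing NA.

```r
# Hypothetical vector with a leading NA, an inner NA and a trailing NA
v <- c(NA, 2, NA, 4, NA)

# Running count of non-NA values seen so far: 0 1 1 2 2
seen <- cumsum(!is.na(v))

# TRUE from the first non-NA value onward
from_first <- seen > 0                            # FALSE TRUE TRUE TRUE TRUE

# Same idea on the reversed vector, flipped back:
# TRUE up to and including the last non-NA value
up_to_last <- rev(cumsum(!is.na(rev(v))) > 0)     # TRUE TRUE TRUE TRUE FALSE

# NAs strictly inside the first..last span of numbers
n_between <- sum(is.na(v[from_first & up_to_last]))   # 1

# NAs from the first number to the end of the vector
n_to_end <- sum(is.na(v[from_first]))                 # 2
```

A side effect of the mask approach is that an all-NA vector yields all-FALSE masks, so both counts come out as 0 rather than raising an error.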

Afterwards, you just summarise your data.frame and reshape it (this uses dplyr and tidyr):

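library(dplyr)
library(tidyr)

# apply both functions to every column, then reshape to one row per variable
df %>% summarise_all(list(m1 = ~fun1(.), m2 = ~fun2(.))) %>%
  pivot_longer(cols = everything(), names_pattern = "^(.)_(.*)$", names_to = c("vars", "a"),
               values_to = "x") %>%
  spread(a, x)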

# A tibble: 3 x 3
  vars     m1    m2
  <chr> <int> <int>
1 x         0     2
2 y         1     1
3 z         2     2
Cettt answered Oct 18 '22 05:10