
Remove leading NAs to align data

Tags: dataframe, r

I have a large data.frame with 'staggered' data and would like to align it. What I mean is I would like to take something like

[image: the staggered data.frame, as built by the example code below]

and remove the leading (top) NAs from all columns to get

[image: the same data.frame with each column's leading NAs removed]

I know about the na.trim function from the zoo package, but it didn't work on either the initial data.frame presented above or its transpose. On the transposed data.frame t.df, I used

t.df <- na.trim(t.df, sides = 'left')

This only returned an empty data.frame, and wouldn't work the way I wanted anyway since it would create vectors of different lengths. Can anyone point me to a package or function that might be more helpful?

Here is the code for my example used above:

# example of what I have

var1 <- c(1,2,3,4,5,6,7,8,9,10)
var2 <- c(6,2,4,7,3,NA,NA,NA,NA,NA)
var3 <- c(NA,NA,8,6,3,7,NA,NA,NA,NA)
var4 <- c(NA,NA,NA,NA,5,NA,2,6,2,9)

df <- data.frame(var1, var2, var3, var4)


# transpose and (unsuccessful) attempt to remove leading NAs

t.df <- t(df)

t.df <- na.trim(t.df, sides = 'left')
ndem763 asked Apr 15 '16 06:04


2 Answers

We can loop over the columns with lapply() and apply na.trim to each one. Then pad NAs at the end of each list element by setting its length to the maximum length among the list elements.

library(zoo)
lst <- lapply(df, na.trim)
df[] <- lapply(lst, `length<-`, max(lengths(lst)))
df
#   var1 var2 var3 var4
#1     1    6    8    5
#2     2    2    6   NA
#3     3    4    3    2
#4     4    7    7    6
#5     5    3   NA    2
#6     6   NA   NA    9
#7     7   NA   NA   NA
#8     8   NA   NA   NA
#9     9   NA   NA   NA
#10   10   NA   NA   NA
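
The padding step works because assigning a longer length to a vector fills the new positions with NA. A minimal base-R sketch of just that step, using made-up vectors in place of the trimmed columns (the names lst, padded and aligned here are illustrative only, and zoo is not needed for this part):

```r
# Hypothetical trimmed columns of unequal length (illustrative values only)
lst <- list(a = c(1, 2, 3, 4), b = c(8, 6), c = c(5, NA, 2))

# The longest element determines the target length
n <- max(lengths(lst))

# `length<-`(x, n) is the functional form of length(x) <- n;
# extending a vector this way pads it with NA at the end
padded <- lapply(lst, `length<-`, n)

# All elements now have equal length, so they fit into a data.frame
aligned <- as.data.frame(padded)
print(aligned)
```

Interior NAs, like the one in element c, are left where they are; only the tail is padded.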

Or as @G.Grothendieck mentioned in the comments

replace(df, TRUE, do.call("merge", lapply(lst, zoo)))
akrun answered Sep 23 '22 23:09


You can do this with base functions:

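my.na.trim <- function(x) {
  # runs of NA / non-NA values, in order
  r <- rle(is.na(x))
  n <- r$lengths[1]
  # nothing to move if x has no leading NAs or is entirely NA
  if (!r$values[1] || n == length(x)) return(x)
  # move the n leading NAs to the end
  x[c((n + 1):length(x), 1:n)]
}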

df[,] <- lapply(df, my.na.trim)
df
#    var1 var2 var3 var4
# 1     1    6    8    5
# 2     2    2    6   NA
# 3     3    4    3    2
# 4     4    7    7    6
# 5     5    3   NA    2
# 6     6   NA   NA    9
# 7     7   NA   NA   NA
# 8     8   NA   NA   NA
# 9     9   NA   NA   NA
# 10   10   NA   NA   NA
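
To see what the function relies on, here is a small sketch of rle() applied to one column from the question. The variable n_leading is only an illustrative name, not part of the answer's code:

```r
# var3 from the question's example
x <- c(NA, NA, 8, 6, 3, 7, NA, NA, NA, NA)

# rle() collapses consecutive equal values into runs
r <- rle(is.na(x))
r$values   # TRUE FALSE TRUE: an NA run, a data run, an NA run
r$lengths  # 2 4 4

# The vector starts with NAs exactly when the first run is TRUE;
# the length of that run is the number of leading NAs
n_leading <- if (r$values[1]) r$lengths[1] else 0L
n_leading
```

Only the first run matters for the shift, so NAs in the middle or at the end of a column are not touched.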

Alternative coding for the function:

my.na.trim <- function(x) {
  r <- rle(is.na(x))
  if (!r$values[1]) return(x)
  r1 <- r$lengths[1]
  # drop the first r1 elements, then append them at the end
  c(tail(x, -r1), head(x, r1))
}
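
A short sketch of how the head()/tail() pair in this version behaves, using var4 from the question and a hard-coded count of leading NAs (r1 is set by hand here purely for illustration):

```r
x <- c(NA, NA, NA, NA, 5, NA, 2, 6, 2, 9)  # var4 from the question
r1 <- 4                                     # number of leading NAs

rest <- tail(x, -r1)      # everything except the first r1 elements
lead <- head(x, r1)       # the first r1 elements (the leading NAs)
shifted <- c(rest, lead)  # data first, former leading NAs last

# The interior NA stays in place; only the leading run moves
shifted

# Edge case: for an all-NA vector, tail() returns an empty vector,
# so the result is the input unchanged
all_na <- c(NA, NA, NA)
unchanged <- c(tail(all_na, -3), head(all_na, 3))
```

Because elements are only reordered, the result always has the same length as the input, which is why it can be assigned straight back into the data.frame.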
jogo answered Sep 20 '22 23:09