I have a data frame like this:
df <- structure(list(a = c(NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), b = c(NA, NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L), d = c(NA, NA, NA, NA, 1L, 2L, 3L, 4L, 5L, 6L)), .Names = c("a", "b", "d"), row.names = c(NA, -10L), class = "data.frame")
> df
    a  b  d
1  NA NA NA
2  NA NA NA
3   1 NA NA
4   2  1 NA
5   3  2  1
6   4  3  2
7   5  4  3
8   6  5  4
9   7  6  5
10  8  7  6
In each column, I'd like to move the non-NA values up to the start and move the NAs to the end:
> df.out
    a  b  d
1   1  1  1
2   2  2  2
3   3  3  3
4   4  4  4
5   5  5  5
6   6  6  6
7   7  7 NA
8   8 NA NA
9  NA NA NA
10 NA NA NA
Update to make my question clearer:
df <- structure(list(a = c(NA, NA, 1, 5, 34, 7, 3, 5, 8, 4), b = c(NA,
NA, NA, 57, 2, 7, 9, 5, 12, 100), d = c(NA, NA, NA, NA, 5, 7,
2, 8, 2, 5)), .Names = c("a", "b", "d"), row.names = c(NA, -10L
), class = "data.frame")
> df
    a   b  d
1  NA  NA NA
2  NA  NA NA
3   1  NA NA
4   5  57 NA
5  34   2  5
6   7   7  7
7   3   9  2
8   5   5  8
9   8  12  2
10  4 100  5
should result in:
    a   b  d
1   1  57  5
2   5   2  7
3  34   7  2
4   7   9  8
5   3   5  2
6   5  12  5
7   8 100 NA
8   4  NA NA
9  NA  NA NA
10 NA  NA NA
This seems like an easy task, but I am stuck on where to start. Can you help?
Another solution using lapply (without sorting or reordering the data, per your comments):
df[] <- lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
df
#     a   b  d
# 1   1  57  5
# 2   5   2  7
# 3  34   7  2
# 4   7   9  8
# 5   3   5  2
# 6   5  12  5
# 7   8 100 NA
# 8   4  NA NA
# 9  NA  NA NA
# 10 NA  NA NA
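To illustrate the "without sorting" point, here is a minimal sketch on column a alone. The helper name na_last is made up for this example; its body is the same expression used in the lapply call above. It is compared with sort(), which also puts the NAs last but reorders the values:

```r
# Hypothetical helper name; the body is the expression from the lapply solution
na_last <- function(x) c(x[!is.na(x)], x[is.na(x)])

# Column a from the updated example in the question
x <- c(NA, NA, 1, 5, 34, 7, 3, 5, 8, 4)

shifted <- na_last(x)               # non-NA values keep their original order
sorted  <- sort(x, na.last = TRUE)  # NAs also go last, but values get sorted

print(shifted)
# [1]  1  5 34  7  3  5  8  4 NA NA
print(sorted)
# [1]  1  3  4  5  5  7  8 34 NA NA
```

So sort(x, na.last = TRUE) is only a substitute if the order of the remaining values does not matter.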
Or using data.table in order to update df by reference rather than creating a copy of it (this solution won't sort your data either):
library(data.table)
setDT(df)[, names(df) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]
df
#      a   b  d
#  1:  1  57  5
#  2:  5   2  7
#  3: 34   7  2
#  4:  7   9  8
#  5:  3   5  2
#  6:  5  12  5
#  7:  8 100 NA
#  8:  4  NA NA
#  9: NA  NA NA
# 10: NA  NA NA
Some benchmarks on this small example show that the base lapply solution is by far the fastest (beetroot is the function defined in the other answer):
library("microbenchmark")
david <- function() lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
dt <- setDT(df)
david.dt <- function() dt[, names(dt) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]
microbenchmark(as.data.frame(lapply(df, beetroot)), david(), david.dt())
# Unit: microseconds
#                                 expr      min       lq   median        uq      max neval
#  as.data.frame(lapply(df, beetroot)) 1145.224 1215.253 1274.417 1334.7870 4028.507   100
#                              david()  116.515  127.382  140.965  149.7185  308.493   100
#                           david.dt() 3087.335 3247.920 3330.627 3415.1460 6464.447   100
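Keep in mind that these timings are for a 10-row data frame, and that the beetroot expression includes as.data.frame() while david() does not, so part of the gap may come from that conversion. Below is a rough sketch for rerunning the two base approaches on a larger input with the same wrapping. The size, the NA share, the seed, and the names big, make_col and shift_na are all assumptions made for illustration. No timings are shown because they depend on the machine:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical larger input: 3 columns of 100,000 integers, roughly 20% NA
n <- 1e5
make_col <- function() {
  sample(c(NA, 1:9), n, replace = TRUE, prob = c(0.2, rep(0.8 / 9, 9)))
}
big <- data.frame(a = make_col(), b = make_col(), d = make_col())

# The two base approaches from the answers, wrapped the same way
shift_na <- function(x) c(x[!is.na(x)], x[is.na(x)])
beetroot <- function(x) {
  num.na <- sum(is.na(x))
  c(x[!is.na(x)], rep(NA, num.na))
}

t_shift    <- system.time(res_shift    <- as.data.frame(lapply(big, shift_na)))
t_beetroot <- system.time(res_beetroot <- as.data.frame(lapply(big, beetroot)))

print(t_shift)
print(t_beetroot)
```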
After completely misunderstanding the question, here is my final answer:
# named after beetroot for being the first to ever need this functionality
beetroot <- function(x) {
  # count NA
  num.na <- sum(is.na(x))
  # remove NA
  x <- x[!is.na(x)]
  # glue the number of NAs at the end
  x <- c(x, rep(NA, num.na))
  return(x)
}
# apply beetroot over each column in the dataframe
as.data.frame(lapply(df, beetroot))
It will count the NAs, remove the NAs, and glue NAs at the bottom for each column in the data frame.
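A more compact variant, offered as an assumption rather than something from the answers above, relies on order() being stable: ordering by is.na(x) leaves the non-NA values in their original sequence and pushes the NAs to the end. The name na_to_end is made up for this sketch, and beetroot is repeated so that the snippet runs on its own:

```r
# Same logic as beetroot above, repeated so the snippet is self-contained
beetroot <- function(x) {
  num.na <- sum(is.na(x))
  x <- x[!is.na(x)]
  c(x, rep(NA, num.na))
}

# Assumed alternative: FALSE sorts before TRUE, and order() keeps ties in
# their original order, so non-NA values stay in sequence and NAs go last
na_to_end <- function(x) x[order(is.na(x))]

# The updated example data from the question
df <- data.frame(a = c(NA, NA, 1, 5, 34, 7, 3, 5, 8, 4),
                 b = c(NA, NA, NA, 57, 2, 7, 9, 5, 12, 100),
                 d = c(NA, NA, NA, NA, 5, 7, 2, 8, 2, 5))

res1 <- as.data.frame(lapply(df, beetroot))
res2 <- as.data.frame(lapply(df, na_to_end))

# Check that both approaches give the same data frame
print(identical(res1, res2))
```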