I have a data frame like this:
df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A",
                        "B", "B", "B", "B", "B", "B"),
                 var1 = c("100", "200", "300", NA, NA, NA, NA, NA,
                          "100", "200", "300", NA, NA, NA),
                 var2 = c("100", NA, NA, "400", "500", "600", NA, NA,
                          NA, NA, NA, "400", NA, NA),
                 var3 = c("200", NA, NA, NA, NA, NA, "700", "800",
                          "500", NA, NA, NA, "500", "600"))
which looks like this:
id var1 var2 var3
A 100 100 200
A 200 <NA> <NA>
A 300 <NA> <NA>
A <NA> 400 <NA>
A <NA> 500 <NA>
A <NA> 600 <NA>
A <NA> <NA> 700
A <NA> <NA> 800
B 100 <NA> 500
B 200 <NA> <NA>
B 300 <NA> <NA>
B <NA> 400 <NA>
B <NA> <NA> 500
B <NA> <NA> 600
I would like to shift values in columns up if there are missing values above (by group). The result should look like this:
id var1 var2 var3
A 100 100 200
A 200 400 700
A 300 500 800
A <NA> 600 <NA>
B 100 400 500
B 200 <NA> 500
B 300 <NA> 600
I have no idea how to do this. Any thoughts?
Here is a rough concept using data.table that can be refined:
library(data.table)
# Helper function: move the non-NA values to the front and pad the tail with NA
shift_up <- function(x) {
  n <- length(x)        # remember the original length
  x <- x[!is.na(x)]     # keep only the non-NA values
  length(x) <- n        # pad back to the original length with NA
  x
}
setDT(df)
df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]
id var1 var2 var3
1: A 100 100 200
2: A 200 400 700
3: A 300 500 800
4: A <NA> 600 <NA>
5: B 100 400 500
6: B 200 <NA> 500
7: B 300 <NA> 600
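Applied to a single vector, the helper simply moves the non-NA values to the front and pads the tail with NA, for example:

shift_up(c(NA, "400", NA, "600"))
#[1] "400" "600" NA    NA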
I don't think this is the most efficient way to do it, but here is one option:
library(rowr)
df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {
  data.frame(id = x$id[1], do.call(cbind.fill, c(sapply(x[-1], na.omit), fill = NA)))
}))
names(df1) <- names(df)
df1
# id var1 var2 var3
#A.1 A 100 100 200
#A.2 A 200 400 700
#A.3 A 300 500 800
#A.4 A <NA> 600 <NA>
#B.1 B 100 400 500
#B.2 B 200 <NA> 500
#B.3 B 300 <NA> 600
We split the dataframe into a list of dataframes, one for every id; for each dataframe we remove the NA values using na.omit, use cbind.fill to pad the shorter columns with NA, and finally merge the list of dataframes back into one using rbind with do.call.
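To see what happens inside the anonymous function for a single group, here is a base-R sketch of the intermediate steps for id "A". It only emulates the NA-padding that cbind.fill performs, assumes the columns are character rather than factor, and the object names (cols, n, padded) are purely for illustration:

x <- split(df, df$id)$A                 # one group
cols <- lapply(x[-1], na.omit)          # drop the NA values per column
n <- max(lengths(cols))                 # length of the longest remaining column
padded <- lapply(cols, `length<-`, n)   # pad the shorter columns with NA
data.frame(id = x$id[1], padded)
#  id var1 var2 var3
#1  A  100  100  200
#2  A  200  400  700
#3  A  300  500  800
#4  A <NA>  600 <NA>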
Here is an option with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), then, grouped by 'id', order the other columns based on their NA values, and create an index to remove the rows where all the elements are NA.
library(data.table)
df1 <- setDT(df)[, lapply(.SD, function(x) x[order(is.na(x))]), id]
df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]
# id var1 var2 var3
#1: A 100 100 200
#2: A 200 400 700
#3: A 300 500 800
#4: A <NA> 600 <NA>
#5: B 100 400 500
#6: B 200 <NA> 500
#7: B 300 <NA> 600
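The second step works because Reduce() with `&` combines the is.na() flags of var1:var3 row by row; negating the result flags the rows that still contain at least one non-NA value, and those are the rows the outer subset keeps:

df1[, !Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE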
Or use the same logic with tidyverse. Grouped by 'id', change the order of elements in all the other columns with mutate_all by ordering on the logical vector (is.na(column)), and keep the rows having at least one non-NA value (filter_at).
library(tidyverse)
df %>%
group_by(id) %>%
mutate_all(funs(.[order(is.na(.))])) %>%
filter_at(vars(var1:var3), any_vars(!is.na(.)))
# A tibble: 7 x 4
# Groups: id [2]
# id var1 var2 var3
# <fct> <fct> <fct> <fct>
#1 A 100 100 200
#2 A 200 400 700
#3 A 300 500 800
#4 A <NA> 600 <NA>
#5 B 100 400 500
#6 B 200 <NA> 500
#7 B 300 <NA> 600
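As a side note, funs() and filter_at()/any_vars() have since been superseded in dplyr; on newer dplyr versions (where across() and if_any() are available) the same idea can be written as below, which should give the same result:

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(across(everything(), ~ .x[order(is.na(.x))])) %>%
  filter(if_any(var1:var3, ~ !is.na(.x))) %>%
  ungroup()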
Ordering a vector/column based on logical indexing is simple.
v1 <- c(1:3, NA, 5, NA, 7)
order(is.na(v1)) # gives the index that puts the non-NA positions first
#[1] 1 2 3 5 7 4 6
Use that index to change the order of the values:
v1[order(is.na(v1))]
#[1] 1 2 3 5 7 NA NA