I'm looking to repeat values, by group, until a new value appears. I have a function that I found online a while back that almost does what I'm looking for, but not quite. Here is that function:
repeat.before <- function(x) {
  ind <- which(!is.na(x))            # positions of non-NA values
  ind_rep <- ind
  if (is.na(x[1])) {
    # leading NAs are given the first non-NA value, x[min(ind)]
    ind_rep <- c(min(ind), ind)
    ind <- c(1, ind)
  }
  rep(x[ind_rep], times = diff(c(ind, length(x) + 1)))
}
This function successfully repeats each value until a new value appears within each group. The problem is that if a group starts with NA, the rows before the first value end up taking that first value instead of remaining NA. Here is an example:
group  location
A      NA
A      NA
A      New York
A      NA
A      NA
B      Chicago
B      NA
B      Philly
B      NA
The code above will output this:
group  location
A      New York
A      New York
A      New York
A      New York
A      New York
B      Chicago
B      Chicago
B      Philly
B      Philly
Again, this is very close to what I'm looking for, but not quite. This is the output I'm after:
group  location
A      NA
A      NA
A      New York
A      New York
A      New York
B      Chicago
B      Chicago
B      Philly
B      Philly
Basically, I don't want the "repeat" code to start working until it finds the first value in a group. Until then, I'd like the rows to stay NA. The point is to avoid miscategorizing rows: in the example above, the first two A rows should not be labelled New York.
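For comparison with the answers below, here is a minimal base-R sketch (not part of the original post) of how repeat.before could be changed. The cause of the problem is the line ind_rep <- c(min(ind), ind), which makes the leading run borrow the first non-NA value. Using x[1] itself (which is NA) for that run keeps it NA. The sketch assumes the function is applied within each group, here via base R's ave():

```r
repeat.before <- function(x) {
  ind <- which(!is.na(x))          # positions of non-NA values
  if (length(ind) == 0) return(x)  # all NA: nothing to carry forward
  # If x starts with NA, treat position 1 as the start of its own run.
  # Because x[1] is NA, that run is repeated as NA instead of borrowing
  # the first non-NA value.
  if (is.na(x[1])) ind <- c(1, ind)
  # Repeat each run-start value up to the next run start
  rep(x[ind], times = diff(c(ind, length(x) + 1)))
}

df1 <- data.frame(
  group    = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA),
  stringsAsFactors = FALSE
)

# ave() splits location by group, applies repeat.before to each piece,
# and puts the results back in the original row order
df1$location <- ave(df1$location, df1$group, FUN = repeat.before)
df1
```

ave() is just one way to apply the function per group; split() with lapply(), or group_by() in dplyr, would work as well.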
One option is fill() from tidyr after grouping by 'group'. fill() takes a .direction argument that can be 'down' (the default) or 'up'. Based on the expected output, only 'down' is needed here, so the default works:
library(dplyr)
library(tidyr)

df1 %>%
  group_by(group) %>%
  fill(location)
# A tibble: 9 x 2
# Groups: group [2]
# group location
# <chr> <chr>
#1 A <NA>
#2 A <NA>
#3 A New York
#4 A New York
#5 A New York
#6 B Chicago
#7 B Chicago
#8 B Philly
#9 B Philly
Data:

df1 <- structure(list(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA)),
  class = "data.frame", row.names = c(NA, -9L))
You could also use the na.locf() function from the zoo package.
library(zoo)

df1 <- structure(list(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA)),
  class = "data.frame",
  row.names = c(NA, -9L))

# na.rm = FALSE keeps the leading NAs instead of dropping them
df1$location2 <- na.locf(df1$location, na.rm = FALSE)
df1
group location location2
1 A <NA> <NA>
2 A <NA> <NA>
3 A New York New York
4 A <NA> New York
5 A <NA> New York
6 B Chicago Chicago
7 B <NA> Chicago
8 B Philly Philly
9 B <NA> Philly
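Note that the na.locf() call above fills down the whole column without regard to group. It gives the expected result here only because group B starts with a non-NA value; if a group started with NA, the previous group's last value would be carried into it. A minimal sketch, assuming zoo is installed and using df2 as a hypothetical variant of df1 in which group B starts with NA, that applies na.locf() within each group via base R's ave():

```r
library(zoo)

# Hypothetical variant of df1 in which group B starts with NA
df2 <- data.frame(
  group    = c("A", "A", "A", "B", "B", "B"),
  location = c(NA, "New York", NA, NA, "Chicago", NA),
  stringsAsFactors = FALSE
)

# Ungrouped: "New York" leaks into the first B row
df2$ungrouped <- na.locf(df2$location, na.rm = FALSE)

# Grouped: ave() splits location by group, runs na.locf on each piece,
# and puts the results back in the original row order
df2$grouped <- ave(df2$location, df2$group,
                   FUN = function(x) na.locf(x, na.rm = FALSE))
df2
```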