R: Repeat value until new value appears by group, only once first non-NA value appears

Question

I'm looking to repeat values until a new value appears by group. I have a function that I found online a while back that almost does what I am looking for, but not quite. Here is that function:

    repeat.before <- function(x) {
      ind <- which(!is.na(x))
      ind_rep <- ind
      if (is.na(x[1])) {
        ind_rep <- c(min(ind), ind)
        ind <- c(1, ind)
      }
      rep(x[ind_rep], times = diff(c(ind, length(x) + 1)))
    }

This function successfully repeats each value until a new value appears, by group. The problem is that when a group starts with NA, the rows before the first non-NA value take on that first value instead of staying NA. This example illustrates what I mean:

    group    location 
    A        NA
    A        NA
    A        New York
    A        NA
    A        NA
    B        Chicago
    B        NA
    B        Philly
    B        NA

The code above will output this:

    group    location 
    A        New York
    A        New York
    A        New York
    A        New York
    A        New York
    B        Chicago
    B        Chicago
    B        Philly
    B        Philly

Which, again, is very close to what I'm looking for, but not quite. This is the output I am seeking:

    group    location 
    A        NA
    A        NA
    A        New York
    A        New York
    A        New York
    B        Chicago
    B        Chicago
    B        Philly
    B        Philly

Basically, I don't want the "repeat" code to start working until it finds the first value in a group. Until then, I'd like the rows to stay NA. The purpose is to keep rows from being miscategorized: in the example above, the first two A rows should not be labelled New York.

akrun · Accepted Answer

One option is tidyr's fill() after grouping by 'group'. fill() takes a .direction argument, such as 'up' or 'down' (the default). Based on the expected output, only 'down' is needed here, so the default works:

    library(dplyr)
    library(tidyr)
    df1 %>%
      group_by(group) %>%
      fill(location)
    # A tibble: 9 x 2
    # Groups:   group [2]
    #  group location
    #  <chr> <chr>
    #1 A     <NA>
    #2 A     <NA>
    #3 A     New York
    #4 A     New York
    #5 A     New York
    #6 B     Chicago
    #7 B     Chicago
    #8 B     Philly
    #9 B     Philly
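
If adding packages is not an option, the same fill can be sketched in base R. This is only a minimal sketch: the helper name fill_down is hypothetical (not from any package), and it assumes a plain atomic vector such as character or numeric rather than a factor. It counts the non-NA values seen so far and uses that count as an index, so positions before the first value map to NA.

```r
# Hypothetical helper: carry the last non-NA value forward,
# leaving any leading NAs untouched.
fill_down <- function(x) {
  seen <- cumsum(!is.na(x))        # how many non-NA values seen so far
  # Prepend NA so that seen == 0 (before the first value) maps to NA
  c(NA, x[!is.na(x)])[seen + 1]
}

df1 <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA),
  stringsAsFactors = FALSE
)

# ave() applies the helper separately within each group
df1$location <- ave(df1$location, df1$group, FUN = fill_down)
df1
```

Because ave() splits the column by group before calling the helper, a group that starts with NA keeps those NAs rather than inheriting a value from the group above it.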

data

    df1 <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
     "B"), location = c(NA, NA, "New York", NA, NA, "Chicago", NA,
     "Philly", NA)), class = "data.frame", row.names = c(NA, -9L))

CT Hall · Answer

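You could also use the na.locf function from the zoo package. Applying it within each group (here with ave) keeps a value from carrying over into the next group, and na.rm = FALSE keeps the leading NAs in place.

    library(zoo)
    df1 <-
      structure(list(
        group = c("A", "A", "A", "A", "A", "B", "B", "B",
                  "B"),
        location = c(NA, NA, "New York", NA, NA, "Chicago", NA,
                     "Philly", NA)
      ),
      class = "data.frame",
      row.names = c(NA, -9L))

    # fill within each group; na.rm = FALSE keeps leading NAs
    df1$location2 <- ave(df1$location, df1$group,
                         FUN = function(x) na.locf(x, na.rm = FALSE))
    df1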

      group location location2
    1     A     <NA>      <NA>
    2     A     <NA>      <NA>
    3     A New York  New York
    4     A     <NA>  New York
    5     A     <NA>  New York
    6     B  Chicago   Chicago
    7     B     <NA>   Chicago
    8     B   Philly    Philly
    9     B     <NA>    Philly
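
Filling per group starts to matter once a group begins with NA, which the example data above does not show for group B. A minimal sketch, using a hypothetical data frame df2 made up for illustration, compares an ungrouped na.locf call with a grouped one via ave:

```r
library(zoo)

# Hypothetical data: group B starts with NA
df2 <- data.frame(
  group = c("A", "A", "B", "B", "B"),
  location = c("New York", NA, NA, "Chicago", NA),
  stringsAsFactors = FALSE
)

# Ungrouped fill: row 3 (group B) inherits "New York" from group A
df2$ungrouped <- na.locf(df2$location, na.rm = FALSE)

# Grouped fill: each group is filled on its own, so row 3 stays NA
df2$grouped <- ave(df2$location, df2$group,
                   FUN = function(x) na.locf(x, na.rm = FALSE))
df2
```

In the ungrouped column the first B row is labelled New York, which is the kind of miscategorization the question wants to avoid; the grouped column leaves it NA until Chicago appears.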

Tags: function, text, r, repeat, grouping

Jared Annibale
