 

R: Repeat value until new value appears by group, only once first non-NA value appears

I'm looking to repeat values until a new value appears by group. I have a function that I found online a while back that almost does what I am looking for, but not quite. Here is that function:

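    repeat.before <- function(x) {
      ind <- which(!is.na(x))    # positions of the non-NA values
      ind_rep <- ind
      if (is.na(x[1])) {
        # Leading NAs: treat position 1 as the start of a run that takes
        # the first non-NA value (this is what back-fills the leading NAs)
        ind_rep <- c(min(ind), ind)
        ind <- c(1, ind)
      }
      # Repeat each value until the next non-NA position (or the end of x)
      rep(x[ind_rep], times = diff(c(ind, length(x) + 1)))
    }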

This function successfully repeats each value until a new value appears, by group. The problem is that if the column (within a group) starts with NA, the rows before the first non-NA value end up taking that first value instead of remaining NA. I'll illustrate what I mean with this example:

    group    location 
    A        NA
    A        NA
    A        New York
    A        NA
    A        NA
    B        Chicago
    B        NA
    B        Philly
    B        NA

The code above will output this:

    group    location 
    A        New York
    A        New York
    A        New York
    A        New York
    A        New York
    B        Chicago
    B        Chicago
    B        Philly
    B        Philly

Again, this is very close to what I'm looking for, but not quite. This is the output I want:

    group    location 
    A        NA
    A        NA
    A        New York
    A        New York
    A        New York
    B        Chicago
    B        Chicago
    B        Philly
    B        Philly

Basically, I don't want the "repeat" code to start working until it finds the first non-NA value. Until then, I'd like the rows to stay NA. The purpose is to keep rows from being miscategorized: in the example above, the first two A rows should not be labelled New York.
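For comparison with the answers, here is a minimal base-R sketch of the requested behaviour. It assumes repeat.before is applied per group with ave(), which the question does not show, and repeat_after_first is a hypothetical name. The only change from repeat.before is that positions before the first non-NA value are kept as they are (NA) instead of being back-filled.

```r
# Hypothetical variant of repeat.before: carries the last non-NA value
# forward, but leaves positions before the first non-NA value as NA.
repeat_after_first <- function(x) {
  ind <- which(!is.na(x))
  if (length(ind) == 0) return(x)  # all NA: nothing to carry forward
  # Repeat each non-NA value until the next one (or the end of x)
  filled <- rep(x[ind], times = diff(c(ind, length(x) + 1)))
  # Prepend the leading NAs unchanged (empty if x[1] is not NA)
  c(x[seq_len(ind[1] - 1)], filled)
}

df1 <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA),
  stringsAsFactors = FALSE
)

# Apply per group; ave() returns a vector aligned with the original rows
df1$location <- ave(df1$location, df1$group, FUN = repeat_after_first)
df1$location  # group A keeps its two leading NAs
```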

Asked Dec 07 '22 12:12 by Jared Annibale



2 Answers

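One option is fill from tidyr after grouping by 'group'. The .direction argument of fill can be 'down' (the default), 'up', 'downup' or 'updown'. Here only 'down' is needed: filling downward never changes the NAs that come before the first non-NA value in each group, so they stay NA, as in the expected output.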

library(dplyr)
library(tidyr)
df1 %>%
  group_by(group) %>%
  fill(location) 
# A tibble: 9 x 2
# Groups:   group [2]
#  group location
#  <chr> <chr>   
#1 A     <NA>
#2 A     <NA>
#3 A     New York
#4 A     New York
#5 A     New York
#6 B     Chicago 
#7 B     Chicago 
#8 B     Philly  
#9 B     Philly  

data

df1 <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B", 
 "B"), location = c(NA, NA, "New York", NA, NA, "Chicago", NA, 
 "Philly", NA)), class = "data.frame", row.names = c(NA, -9L))
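To see why 'down' is the right direction here, a short sketch comparing it with 'downup', which fills down and then back up within each group. It assumes dplyr and tidyr 1.0.0 or later (where 'downup' is available) are installed; down and downup are just illustrative names.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA),
  stringsAsFactors = FALSE
)

# "down" (the default) only carries values forward, so leading NAs stay NA
down <- df1 %>%
  group_by(group) %>%
  fill(location, .direction = "down") %>%
  ungroup()

# "downup" fills down and then up, which back-fills the leading NAs:
# this reproduces the output the question wants to avoid
downup <- df1 %>%
  group_by(group) %>%
  fill(location, .direction = "downup") %>%
  ungroup()

down$location
downup$location
```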
Answered May 16 '23 02:05 by akrun



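You could also use the na.locf function from the zoo package. With na.rm = FALSE, leading NAs are kept rather than dropped, so the result lines up with the original rows.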

library(zoo)
df1 <-
  structure(list(
    group = c("A", "A", "A", "A", "A", "B", "B", "B",
              "B"),
    location = c(NA, NA, "New York", NA, NA, "Chicago", NA,
                 "Philly", NA)
  ),
  class = "data.frame",
  row.names = c(NA,-9L))

df1$location2 <- na.locf(df1$location, na.rm = FALSE)
df1

  group location location2
1     A     <NA>      <NA>
2     A     <NA>      <NA>
3     A New York  New York
4     A     <NA>  New York
5     A     <NA>  New York
6     B  Chicago   Chicago
7     B     <NA>   Chicago
8     B   Philly    Philly
9     B     <NA>    Philly
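As written, na.locf fills the whole column rather than each group, so it matches the grouped result here only because group B starts with a non-NA value. A small sketch that applies it per group with base R's ave(); df2 is a hypothetical data set in which group B starts with NA, chosen to show the difference.

```r
library(zoo)

# Hypothetical data: unlike df1, group B starts with NA
df2 <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  location = c(NA, "New York", NA, NA, "Chicago", NA),
  stringsAsFactors = FALSE
)

# Whole-column fill: row 4 (group B) inherits "New York" from group A
ungrouped <- na.locf(df2$location, na.rm = FALSE)

# Per-group fill: ave() applies na.locf within each group separately
grouped <- ave(df2$location, df2$group,
               FUN = function(x) na.locf(x, na.rm = FALSE))

ungrouped
grouped
```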
Answered May 16 '23 04:05 by CT Hall
