Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating an index/numeral sequence for subsequent N/As in a data frame

Tags:

dataframe

r

I have a column in a data frame (here named "a") where starts of an sequence are marked with 1, while subsequent incidents, belonging to the same sequence are marked with N/A. Now I would like to create a new column ("b") to index all incidents belonging to the same sequence (1:n) and then create a third column ("c") with numbers indicating which incidents belong to the same sequence.

I am sure the solution is very easy and striking once I see it, however, at the moment I just don't manage to come up with an idea myself how to best solve this. Also other questions did not cover my question, as far as I have seen.

Usually I am using dplyr (I also need to do some group_by with my data, which in reality is more complex than I outlined here), so I would be very happy about a dplyr solution if possible!

Example code to start with:

df <- data.frame("a"= c(1, NA, NA, NA, 1, NA, 1, 1, 1))

How it should look like in the end:

df_final <- data.frame("a"= c(1, NA, NA, NA, 1, NA, 1, 1, 1), "b"= c(1, 2, 3, 4, 1, 2, 1, 1, 1), "c" = c(1, 1, 1, 1, 2, 2, 3, 4, 5))
like image 263
Ruebenkraut Avatar asked Nov 25 '25 00:11

Ruebenkraut


1 Answers

EDIT

Since the question has changed now, getting expected output is more simple now

library(dplyr)
df %>%
  group_by(c = cumsum(!is.na(a))) %>%
  mutate(b = row_number())

#     a     c     b
#  <dbl> <int> <int>
#1     1     1     1
#2    NA     1     2
#3    NA     1     3
#4    NA     1     4
#5     1     2     1
#6    NA     2     2
#7     1     3     1
#8     1     4     1
#9     1     5     1

And using base R that would be :

df$c <- cumsum(!is.na(df$a))
df$b <- with(df, ave(a, c, FUN  = seq_along))

Original Answer

Unfortunately, the grouping for creation of b and c is different. For b we group_by sequential non-NA values and take cumulative over them and then generate a row_number for every group. For c we take rle on non-NA values and repeat the group values lengths times.

library(dplyr)

df %>%
  group_by(group = cumsum(!is.na(a))) %>%
  mutate(b = row_number()) %>%
  ungroup() %>%
  select(-group) %>%
  mutate(c = with(rle(!is.na(a)), rep(cumsum(values), lengths)))

# A tibble: 9 x 3
#      a     b     c
#   <dbl> <int> <int>
#1     1     1     1
#2    NA     2     1
#3    NA     3     1
#4    NA     4     1
#5     1     1     2
#6    NA     2     2
#7     1     1     3
#8     1     1     3
#9     1     1     3

Of course this is not dplyr specific answer and can be answered with base R as well

df$b <- with(df, ave(a, cumsum(!is.na(a)), FUN  = seq_along))
df$c <- with(df, with(rle(!is.na(a)), rep(cumsum(values), lengths)))
like image 104
Ronak Shah Avatar answered Nov 27 '25 15:11

Ronak Shah



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!