How to replace a specific sequence of numbers (per row) with another sequence in a big data frame in R?

I have a data.frame with absence/presence data (0/1) for a group of animals, with columns as years and rows as individuals.

My data:

df <- data.frame(Year1 = c('1','0','0','0','0','0'),
                 Year2 = c('1','1','1','0','0','0'),
                 Year3 = c('1','1','1','1','1','0'),
                 Year4 = c('0','1','0','0','0','1'),
                 Year5 = c('0','0','1','1','0','1'),
                 Year6 = c('0','0','0','1','1','1'))

df
     Year1 Year2 Year3 Year4 Year5 Year6
1:     1     1     1     0     0     0
2:     0     1     1     1     0     0
3:     0     1     1     0     1     0
4:     0     0     1     0     1     1
5:     0     0     1     0     0     1
6:     0     0     0     1     1     1

Some individuals have sighting gaps: they are seen in one year (1), not seen the next (0), and spotted again in a later year (1). In total there are 400 rows (one per individual).

What I would like to do is fill the gaps (0s between 1s) with 1s, so that the above data frame becomes:

df
     Year1 Year2 Year3 Year4 Year5 Year6
1:     1     1     1     0     0     0
2:     0     1     1     1     0     0
3:     0     1     1     1     1     0
4:     0     0     1     1     1     1
5:     0     0     1     1     1     1
6:     0     0     0     1     1     1

Zeros before the first 1 and after the last 1 should not be affected.

I have browsed many Stack Overflow questions, e.g.:

find and replace numeric sequence in r

Replace a sequence of values by group depending on preceeding values

However, I could not find a solution that works across all columns at once, on a row-by-row basis.

Thank you in advance for your advice! :)

asked Oct 10 '21 22:10

2 Answers

Use max.col to find the position of the "first" and "last" 1 in each row, then compare those positions with the column number of each cell, given by col():

df[col(df) >= max.col(df, "first") & col(df) <= max.col(df, "last")] <- 1
df

#  Year1 Year2 Year3 Year4 Year5 Year6
#1     1     1     1     0     0     0
#2     0     1     1     1     0     0
#3     0     1     1     1     1     0
#4     0     0     1     1     1     1
#5     0     0     1     1     1     1
#6     0     0     0     1     1     1
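
To see why the one-liner works, it can help to look at the intermediate pieces. The following is a minimal sketch, assuming the data are held in a numeric 0/1 matrix; the names `m`, `first`, `last` and `has_one` are illustrative additions, not part of the answer above. The `has_one` guard is there because a row with no 1s at all is a complete tie for max.col, so "first" returns column 1 and "last" returns the final column, and without the guard that whole row would be filled.

```r
# Illustrative numeric 0/1 matrix: the six rows from the question plus an
# all-zero row (row 7) to show the edge case.
m <- rbind(c(1, 1, 1, 0, 0, 0),
           c(0, 1, 1, 1, 0, 0),
           c(0, 1, 1, 0, 1, 0),
           c(0, 0, 1, 0, 1, 1),
           c(0, 0, 1, 0, 0, 1),
           c(0, 0, 0, 1, 1, 1),
           c(0, 0, 0, 0, 0, 0))

first   <- max.col(m, ties.method = "first")  # column of the first maximum in each row
last    <- max.col(m, ties.method = "last")   # column of the last maximum in each row
has_one <- rowSums(m) > 0                     # FALSE for rows that contain no 1

# first, last and has_one are recycled down the rows because col(m) is
# stored column by column, so each cell is compared with its own row's values.
fill <- col(m) >= first & col(m) <= last & has_one
m[fill] <- 1
m
```

With the guard in place, row 7 stays all zero while the gaps in rows 3 to 5 are filled.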

answered Oct 20 '22 04:10

We may do this by row. An efficient option is dapply from collapse: loop over the rows, find the positions of the 1s, build the sequence from the first to the last position, and replace those elements with 1.

library(collapse)
dapply(df, MARGIN = 1, FUN = function(x)
  replace(x, do.call(`:`, as.list(range(which(x == 1)))), 1))

-output

  Year1 Year2 Year3 Year4 Year5 Year6
1     1     1     1     0     0     0
2     0     1     1     1     0     0
3     0     1     1     1     1     0
4     0     0     1     1     1     1
5     0     0     1     1     1     1
6     0     0     0     1     1     1
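
If the collapse package is not available, the same row-wise idea can be sketched with base R's apply. This is a sketch under stated assumptions rather than a drop-in replacement: it works on a numeric matrix, the helper name `fill_gaps` is hypothetical, and it returns rows with no 1s unchanged, a case in which range(which(x == 1)) would otherwise yield infinite bounds.

```r
# Hypothetical helper: fill the zeros between the first and last 1 of one row.
fill_gaps <- function(x) {
  pos <- which(x == 1)
  if (length(pos) == 0L) return(x)   # never sighted: leave the row untouched
  x[min(pos):max(pos)] <- 1
  x
}

# Illustrative data: rows 3 and 5 of the question plus an all-zero row.
m <- rbind(c(0, 1, 1, 0, 1, 0),
           c(0, 0, 1, 0, 0, 1),
           c(0, 0, 0, 0, 0, 0))

# apply() returns one column per input row, so transpose the result back.
filled <- t(apply(m, 1, fill_gaps))
filled
```

For a data frame, something like df[] <- t(apply(as.matrix(df), 1, fill_gaps)) should write the result back while keeping the column names.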

Another option is to get the row/column indices of the 1s with which and arr.ind = TRUE, build the sequence of columns between the first and last 1 in each row, and then assign through a two-column index matrix, which is vectorized:

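# (row, col) positions of every 1, in column-major order
ind <- which(df == 1, arr.ind = TRUE)
# per row, expand first:last column; convert the row labels via as.character
# so that rows without any 1 do not shift the row numbering
m1 <- as.matrix(transform(stack(lapply(split(ind[, 2], ind[, 1]),
   function(x) x[1]:x[length(x)]))[2:1], ind = as.integer(as.character(ind))))
df[m1] <- 1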
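
The final assignment relies on indexing with a two-column matrix, where each row of the index is a (row, column) pair. A small sketch of that mechanism on its own, using a made-up 3 x 4 matrix and a hand-written index (both `m` and `idx` are illustrative, not taken from the answer):

```r
m <- matrix(0, nrow = 3, ncol = 4)

# Each row of idx names one cell as (row, column).
idx <- rbind(c(1, 2),
             c(1, 3),
             c(3, 4))

m[idx] <- 1   # sets m[1, 2], m[1, 3] and m[3, 4] in a single vectorized step
m
```

Only the three listed cells change; every other cell keeps its value.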

answered Oct 20 '22 04:10

akrun

Tags: replace, r, data.table, sequence, gaps-in-data

Asked by notiomystiscincta. Answers by thelatemail and akrun.