
How to replace a specific sequence of numbers (per row) with another sequence in a big data frame in R?

I have a data.frame with absence/presence data (0/1) for a group of animals, with columns as years and rows as individuals.

My data:

df <- data.frame(Year1 = c('1','0','0','0','0','0'),
                 Year2 = c('1','1','1','0','0','0'),
                 Year3 = c('1','1','1','1','1','0'),
                 Year4 = c('0','1','0','0','0','1'),
                 Year5 = c('0','0','1','1','0','1'),
                 Year6 = c('0','0','0','1','1','1'))

df
  Year1 Year2 Year3 Year4 Year5 Year6
1     1     1     1     0     0     0
2     0     1     1     1     0     0
3     0     1     1     0     1     0
4     0     0     1     0     1     1
5     0     0     1     0     0     1
6     0     0     0     1     1     1

Some individuals have sighting gaps (seen one year (1), then not seen the next year (0), but spotted again in the third year (1)). In total there are 400 rows (=individuals).

What I would like to do is fill the gaps (0s between 1s) with 1s, so that the above data frame becomes:

df
  Year1 Year2 Year3 Year4 Year5 Year6
1     1     1     1     0     0     0
2     0     1     1     1     0     0
3     0     1     1     1     1     0
4     0     0     1     1     1     1
5     0     0     1     1     1     1
6     0     0     0     1     1     1

Zeros before the first 1 and after the last 1 should not be affected.

I have browsed many Stack Overflow questions, e.g.:

find and replace numeric sequence in r

Replace a sequence of values by group depending on preceeding values

However, I could not find a solution that works across all columns at once, on a row-by-row basis.

Thank you in advance for your advice! :)

asked Oct 10 '21 22:10 by notiomystiscincta


2 Answers

Use max.col to find the position of the "first" and "last" 1 in each row, then set every cell whose col()umn number lies between those two positions to 1:

df[col(df) >= max.col(df, "first") & col(df) <= max.col(df, "last")] <- 1
df

#  Year1 Year2 Year3 Year4 Year5 Year6
#1     1     1     1     0     0     0
#2     0     1     1     1     0     0
#3     0     1     1     1     1     0
#4     0     0     1     1     1     1
#5     0     0     1     1     1     1
#6     0     0     0     1     1     1
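
To see why this works, it may help to look at the intermediate pieces. The sketch below is not part of the original answer: it rebuilds the question's data as integers (an assumption; the question stores the values as character strings) and wraps the same max.col comparison in a hypothetical helper, fill_gaps. It also adds a guard for rows that contain no 1 at all. In such a row every column ties for the maximum, so "first" is column 1 and "last" is the final column, and the unguarded one-liner would fill the whole row.

```r
# Example data from the question, stored as integers (assumption: the
# question itself uses character strings '0' and '1')
df <- data.frame(Year1 = c(1L, 0L, 0L, 0L, 0L, 0L),
                 Year2 = c(1L, 1L, 1L, 0L, 0L, 0L),
                 Year3 = c(1L, 1L, 1L, 1L, 1L, 0L),
                 Year4 = c(0L, 1L, 0L, 0L, 0L, 1L),
                 Year5 = c(0L, 0L, 1L, 1L, 0L, 1L),
                 Year6 = c(0L, 0L, 0L, 1L, 1L, 1L))

# fill_gaps is a hypothetical helper name, not a base R function
fill_gaps <- function(df) {
  m <- as.matrix(df)
  first <- max.col(m, ties.method = "first")  # column of the first 1 per row
  last  <- max.col(m, ties.method = "last")   # column of the last 1 per row
  has_one <- rowSums(m == 1) > 0              # FALSE for rows with no sighting
  # first, last and has_one have one entry per row, so they recycle down
  # the columns of col(m) and line up with the row of each cell
  m[col(m) >= first & col(m) <= last & has_one] <- 1
  as.data.frame(m)
}

filled <- fill_gaps(df)
print(filled)
```

Working on a matrix copy keeps the comparison fully vectorized; the guard costs one extra rowSums call.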
answered Oct 20 '22 04:10 by thelatemail


We may do this by row. An efficient option is dapply from the collapse package: loop over the rows, find the positions of the 1s, build the sequence from the first to the last position, and replace those elements with 1.

library(collapse)
# For each row (MARGIN = 1), set every element from the first to the last 1 to 1
dapply(df, MARGIN = 1, FUN = function(x)
  replace(x, do.call(`:`, as.list(range(which(x == 1)))), 1))

Output:

  Year1 Year2 Year3 Year4 Year5 Year6
1     1     1     1     0     0     0
2     0     1     1     1     0     0
3     0     1     1     1     1     0
4     0     0     1     1     1     1
5     0     0     1     1     1     1
6     0     0     0     1     1     1
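
For readers who prefer to avoid an extra package, the same row-wise idea can be sketched in base R with apply. This is an illustrative variant, not the answer's own code: the helper names fill_row and fill_gaps_rowwise are hypothetical, and the data is rebuilt as integers (an assumption; the question uses character strings). It also skips rows without any 1, where range(which(x == 1)) would have nothing to work with.

```r
# Example data from the question, stored as integers (assumption)
df <- data.frame(Year1 = c(1L, 0L, 0L, 0L, 0L, 0L),
                 Year2 = c(1L, 1L, 1L, 0L, 0L, 0L),
                 Year3 = c(1L, 1L, 1L, 1L, 1L, 0L),
                 Year4 = c(0L, 1L, 0L, 0L, 0L, 1L),
                 Year5 = c(0L, 0L, 1L, 1L, 0L, 1L),
                 Year6 = c(0L, 0L, 0L, 1L, 1L, 1L))

# Hypothetical helper: fill one row between its first and last sighting
fill_row <- function(x) {
  pos <- which(x == 1)              # positions of the sightings in this row
  if (length(pos) > 0) {
    x[min(pos):max(pos)] <- 1       # fill everything between first and last
  }
  x                                 # rows without a sighting pass through
}

# Hypothetical wrapper: apply fill_row to every row of a data frame
fill_gaps_rowwise <- function(df) {
  out <- df
  # apply() returns one column per input row, so transpose back
  out[] <- t(apply(as.matrix(df), 1, fill_row))
  out
}

filled <- fill_gaps_rowwise(df)
print(filled)
```

With only 400 rows, the per-row function call overhead should be negligible.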

Another option is to get the row/column indices of the 1s with which and arr.ind = TRUE, expand each row's columns into the full sequence from the first to the last 1, and then use the resulting row/column index matrix for a single vectorized assignment:

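# Row/column positions of every 1 (columns come out sorted within each row)
ind <- which(df == 1, arr.ind = TRUE)
# Per row, expand to every column between the first and the last 1
cols <- lapply(split(ind[, 2], ind[, 1]), function(x) x[1]:x[length(x)])
# Two-column (row, column) index matrix; as.character() recovers the actual
# row numbers from the factor even when some rows contain no 1
m1 <- as.matrix(transform(stack(cols)[2:1], ind = as.integer(as.character(ind))))
df[m1] <- 1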
answered Oct 20 '22 04:10 by akrun