I have a data.frame with absence/presence data (0/1) for a group of animals, with columns as years and rows as individuals.
My data:
df <- data.frame(Year1 = c('1','0','0','0','0','0'),
                 Year2 = c('1','1','1','0','0','0'),
                 Year3 = c('1','1','1','1','1','0'),
                 Year4 = c('0','1','0','0','0','1'),
                 Year5 = c('0','0','1','1','0','1'),
                 Year6 = c('0','0','0','1','1','1'))
df
   Year1 Year2 Year3 Year4 Year5 Year6
1:     1     1     1     0     0     0
2:     0     1     1     1     0     0
3:     0     1     1     0     1     0
4:     0     0     1     0     1     1
5:     0     0     1     0     0     1
6:     0     0     0     1     1     1
Some individuals have sighting gaps: seen in one year (1), not seen the next year (0), but spotted again in a later year (1). In total there are 400 rows (one per individual).
What I would like to do is fill the gaps (0s between 1s) with 1s, so that the above data frame becomes:
df
   Year1 Year2 Year3 Year4 Year5 Year6
1:     1     1     1     0     0     0
2:     0     1     1     1     0     0
3:     0     1     1     1     1     0
4:     0     0     1     1     1     1
5:     0     0     1     1     1     1
6:     0     0     0     1     1     1
Zeros before the first 1 and after the last 1 should not be affected.
I have browsed many Stack Overflow questions, e.g.:
find and replace numeric sequence in r
Replace a sequence of values by group depending on preceeding values
However, I could not find a solution that works across all columns at once, on a row-by-row basis.
Thank you in advance for your advice! :)
Use max.col to find the "first" and "last" 1 in each row, and then compare those positions with the column numbers from col():
# TRUE for every cell between the first and the last 1 of its row
df[col(df) >= max.col(df, "first") & col(df) <= max.col(df, "last")] <- 1
df
#  Year1 Year2 Year3 Year4 Year5 Year6
#1     1     1     1     0     0     0
#2     0     1     1     1     0     0
#3     0     1     1     1     1     0
#4     0     0     1     1     1     1
#5     0     0     1     1     1     1
#6     0     0     0     1     1     1
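One caveat about the max.col approach above: in a row with no 1 at all, every cell ties for the maximum, so max.col(df, "first") returns 1 and max.col(df, "last") returns the last column, and the whole row gets filled with 1s. If individuals that were never seen can occur among the 400 rows, a minimal guarded sketch could look like this (fill_gaps_maxcol is a hypothetical helper name; it assumes the values are 0/1, stored either as numbers or as the strings '0'/'1'):

```r
# Hypothetical helper: fill the 0s between the first and last 1 of each row,
# but leave rows that contain no 1 at all untouched.
fill_gaps_maxcol <- function(df) {
  # numeric 0/1 matrix with the same shape as df (works for '0'/'1' strings too)
  m <- matrix(as.numeric(as.matrix(df)), nrow = nrow(df))
  first <- max.col(m, "first")           # column of the first 1 in each row
  last  <- max.col(m, "last")            # column of the last 1 in each row
  seen  <- rowSums(m == 1) > 0           # FALSE for never-sighted individuals
  inside <- col(m) >= first & col(m) <= last & seen
  df[inside] <- 1
  df
}
```

Rows that contain at least one 1 are treated exactly as in the one-liner; the extra `seen` condition only protects all-zero rows.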
We may do this by row. An efficient option is dapply from collapse: loop over the rows, find the positions of the 1s, get the sequence between the first and the last, and replace those elements with 1.
library(collapse)
# row by row: positions of the 1s -> their range -> first:last -> set those to 1
dapply(df, MARGIN = 1, FUN = function(x)
  replace(x, do.call(`:`, as.list(range(which(x == 1)))), 1))
Output:
  Year1 Year2 Year3 Year4 Year5 Year6
1     1     1     1     0     0     0
2     0     1     1     1     0     0
3     0     1     1     1     1     0
4     0     0     1     1     1     1
5     0     0     1     1     1     1
6     0     0     0     1     1     1
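If collapse is not available, the same row-wise idea can be sketched in base R with apply(). This is an illustrative adaptation, not part of the answer: fill_row and fill_gaps_apply are hypothetical names, and the length check is added because range() of an empty vector returns Inf and -Inf (with warnings), so the `:` step would fail for a row without any 1.

```r
# Hypothetical helper: fill the 0s between the first and last 1 of one row.
fill_row <- function(x) {
  pos <- which(x == 1)                 # positions of the sightings
  if (length(pos) == 0) return(x)      # never seen: leave the row as is
  x[min(pos):max(pos)] <- 1            # close every gap in between
  x
}

# Hypothetical wrapper: apply fill_row to every row of a data frame.
fill_gaps_apply <- function(df) {
  # apply() returns one column per input row; byrow = TRUE restores the layout
  m <- matrix(apply(as.matrix(df), 1, fill_row), nrow = nrow(df), byrow = TRUE)
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  names(out) <- names(df)
  out
}
```

Rebuilding the result with matrix(..., byrow = TRUE) instead of t() also copes with a single-column input, where apply() returns a plain vector rather than a matrix.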
Another option is to get the row/column indices with which(..., arr.ind = TRUE), build the sequence of columns from the first to the last 1 in each row, and then use that row/column index matrix for the assignment, which is vectorized:
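# row/column positions of every 1; within each row the columns come out in increasing order
ind <- which(df == 1, arr.ind = TRUE)
# per row, all columns from the first 1 to the last 1, as a (row, column) matrix;
# the row number is taken from the list names, so rows without any 1 do not shift the rest
m1 <- as.matrix(transform(stack(lapply(split(ind[, 2], ind[, 1]),
                                       function(x) x[1]:x[length(x)]))[2:1],
                          ind = as.integer(as.character(ind))))
df[m1] <- 1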
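A different way to express the same rule, swapped in here rather than taken from the answers: a cell lies between the first and the last 1 of its row exactly when there is a 1 at or before it and a 1 at or after it. Running maxima from the left and from the right capture both conditions. The sketch below (fill_gaps_runmax is a hypothetical name) computes them with pmax() one year column at a time, assuming 0/1 values stored as numbers or as '0'/'1' strings:

```r
# Hypothetical helper: a cell is filled when a 1 occurs at or before it
# (left running max) and at or after it (right running max) in its row.
fill_gaps_runmax <- function(df) {
  # numeric 0/1 matrix with the same shape as df
  m <- matrix(as.numeric(as.matrix(df)), nrow = nrow(df))
  k <- ncol(m)
  left <- m                              # becomes 1 from the first sighting onwards
  for (j in seq_len(k)[-1]) left[, j] <- pmax(left[, j - 1], m[, j])
  right <- m                             # stays 1 up to the last sighting
  for (j in rev(seq_len(k - 1))) right[, j] <- pmax(right[, j + 1], m[, j])
  df[left == 1 & right == 1] <- 1
  df
}
```

The loops run over the year columns (6 here) rather than over the 400 individuals, and each pmax() call handles all rows at once. Rows without any 1 stay all-zero in both running maxima and are therefore left untouched.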