I have a data.frame with absence/presence data (0/1) for a group of animals, with columns as years and rows as individuals.
My data:
df <- data.frame(Year1 = c('1','0','0','0','0','0'),
Year2 = c('1','1','1','0','0','0'),
Year3 = c('1','1','1','1','1','0'),
Year4 = c('0','1','0','0','0','1'),
Year5 = c('0','0','1','1','0','1'),
Year6 = c('0','0','0','1','1','1'))
df
Year1 Year2 Year3 Year4 Year5 Year6
1: 1 1 1 0 0 0
2: 0 1 1 1 0 0
3: 0 1 1 0 1 0
4: 0 0 1 0 1 1
5: 0 0 1 0 0 1
6: 0 0 0 1 1 1
Some individuals have sighting gaps: seen in one year (1), not seen the next year (0), then spotted again in the third year (1). In total there are 400 rows (= individuals).
What I would like to do is fill the gaps (0s between 1s) with 1s, so that the above data frame becomes:
df
Year1 Year2 Year3 Year4 Year5 Year6
1: 1 1 1 0 0 0
2: 0 1 1 1 0 0
3: 0 1 1 1 1 0
4: 0 0 1 1 1 1
5: 0 0 1 1 1 1
6: 0 0 0 1 1 1
Zeros before the first 1 and after the last 1 should not be affected.
I have browsed many stackoverflow questions, e.g.:
find and replace numeric sequence in r
Replace a sequence of values by group depending on preceeding values
However, I could not find a solution that works across all columns at once, on a row-by-row basis.
Thank you in advance for your advice! :)
Use max.col to find the "first" and "last" 1 in each row, and then compare them to the column number from col():
df[col(df) >= max.col(df, "first") & col(df) <= max.col(df, "last")] <- 1
df
# Year1 Year2 Year3 Year4 Year5 Year6
#1 1 1 1 0 0 0
#2 0 1 1 1 0 0
#3 0 1 1 1 1 0
#4 0 0 1 1 1 1
#5 0 0 1 1 1 1
#6 0 0 0 1 1 1
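One caveat worth checking on the full 400-row data: for a row with no 1 at all, every column ties for the maximum, so max.col returns the first column for "first" and the last column for "last", and the whole row would be filled. The following is a minimal sketch of a guard, using a small hypothetical numeric matrix m (not the original data) and a hypothetical helper vector has_one:

```r
# Hypothetical example data: numeric 0/1, where row 3 has no sighting at all
m <- rbind(c(0, 1, 0, 1, 0),
           c(1, 0, 0, 0, 1),
           c(0, 0, 0, 0, 0))

first   <- max.col(m, ties.method = "first")  # column of the first 1 per row
last    <- max.col(m, ties.method = "last")   # column of the last 1 per row
has_one <- rowSums(m) > 0                     # TRUE for rows containing a 1

# Fill only between the first and last 1, and only in rows that have a 1.
# The per-row vectors recycle down the columns, so they line up with col(m).
fill <- col(m) >= first & col(m) <= last & has_one
filled <- m
filled[fill] <- 1
filled
```

Without the has_one condition, the third row of this example would come back as all 1s.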
We may do this by row. An efficient option is dapply from collapse: loop over the rows, find the positions of the 1s, build the sequence between the first and last position, and replace those elements with 1.
library(collapse)
dapply(df, MARGIN = 1, FUN = function(x)
replace(x, do.call(`:`, as.list(range(which(x == 1)))), 1 ))
-output
Year1 Year2 Year3 Year4 Year5 Year6
1 1 1 1 0 0 0
2 0 1 1 1 0 0
3 0 1 1 1 1 0
4 0 0 1 1 1 1
5 0 0 1 1 1 1
6 0 0 0 1 1 1
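If collapse is not available, the same row-wise idea can be sketched in base R with apply. This is an illustrative variant rather than the answer's own code: fill_gaps is a hypothetical helper name, and the early return for rows without any 1 is an added assumption about how such rows should be treated.

```r
# Hypothetical helper: fill zeros between the first and last 1 of one row
fill_gaps <- function(x) {
  pos <- which(x == 1)
  if (length(pos) == 0) return(x)    # no sightings: leave the row unchanged
  x[pos[1]:pos[length(pos)]] <- 1    # everything from first to last 1 becomes 1
  x
}

# Hypothetical example data; the second row has no sighting
m <- rbind(c(0, 1, 0, 1, 0),
           c(0, 0, 0, 0, 0))

# apply() over rows returns one column per row, so transpose back
filled <- t(apply(m, 1, fill_gaps))
filled
```

Assuming the data frame contains only the year columns, something like df[] <- t(apply(as.matrix(df), 1, fill_gaps)) should write the result back into df.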
Another option is to get the row/column indices with which and arr.ind = TRUE, build the sequence of columns for each row, and then use the resulting row/column index matrix for the assignment, which is vectorized:
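ind <- which(df == 1, arr.ind = TRUE)  # row/column positions of every 1
# per row, expand first:last column; as.character() keeps the real row
# numbers even if some row has no 1 at all
m1 <- as.matrix(transform(stack(lapply(split(ind[,2], ind[,1]),
       function(x) x[1]:x[length(x)]))[2:1],
       ind = as.integer(as.character(ind))))
df[m1] <- 1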
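The final assignment relies on matrix indexing: a two-column index of (row, column) pairs addresses individual cells, so all gaps are filled in one step. Here is a minimal sketch of that mechanism alone, on a hypothetical 2 x 4 matrix with a hand-written index idx (both are illustrative and not taken from the data above):

```r
# Hypothetical 2 x 4 matrix of zeros
m <- matrix(0, nrow = 2, ncol = 4)

# Each row of idx is one (row, column) pair to set
idx <- cbind(row = c(1, 1, 2),
             col = c(2, 3, 4))

m[idx] <- 1   # sets cells (1,2), (1,3) and (2,4); all other cells stay 0
m
```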