EDIT: although this question has been closed, it is helpful to note that the answers provided use a very different approach (with dplyr) than the original question asked in 2012(!). These new answers may be helpful for different users.
I have a dataset of sites with the min and max years when they were operational. I want to expand this dataset so that each year a site was operational has its own row.
For example:
set.seed(42)
df <- data.frame(
site = rep(LETTERS[1:10]),
minY = sample(1980:1990, 10),
maxY = sample(2000:2010, 10)
)
df
site minY maxY
1 A 1980 2007
2 B 1984 2006
3 C 1990 2003
4 D 1988 2000
5 E 1981 2004
6 F 1983 2005
7 G 1986 2008
8 H 1989 2001
9 I 1987 2009
10 J 1985 2010
So in my final dataset, site A would have 28 rows (one for each year it was operating).
I've been trying to use the complete function, but I keep getting an error message:
complete(df,
nesting(site),
fill = list(value1 = minY, value2 = maxY))
Error in vec_is_list(replace) : object 'minY' not found
Maybe this works for you, using dplyr's summarize:
library(dplyr)
df %>%
rowwise() %>%
summarize(site, year = seq(minY, maxY, 1))
# A tibble: 210 × 2
site year
<chr> <dbl>
1 A 1980
2 A 1981
3 A 1982
4 A 1983
5 A 1984
6 A 1985
7 A 1986
8 A 1987
9 A 1988
10 A 1989
# … with 200 more rows
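A note for newer dplyr versions: since dplyr 1.1.0, returning more than one row per group from summarize() is deprecated and produces a warning, and reframe() is the suggested replacement. Below is a minimal sketch of the same idea with reframe(), assuming dplyr >= 1.1.0 is installed. The two-row data frame is only a stand-in built from the first two rows of the question's output, and the name `expanded` is made up for illustration.

```r
library(dplyr)

# Stand-in for the question's df (first two rows of its printed output)
df <- data.frame(
  site = c("A", "B"),
  minY = c(1980L, 1984L),
  maxY = c(2007L, 2006L)
)

# reframe() is allowed to return any number of rows per group,
# so each row of df expands to one row per operating year
expanded <- df %>%
  rowwise() %>%
  reframe(site, year = seq(minY, maxY))

# Number of rows produced for each site
count(expanded, site)
```

With this stand-in data, site A ends up with 28 rows (1980 through 2007), matching the count expected in the question.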
You can use tidyr::uncount() to create duplicates by a weight. In your case, adding rows according to the difference in years can be done like this:
library(tidyr)
# Repeat each row once per operating year (both end years included)
df |>
  uncount(weights = maxY - minY + 1)
If you wish to add a column of unique years, you could add it with dplyr::mutate():
library(dplyr)
library(tidyr)
df |>
  uncount(weights = maxY - minY + 1) |>
  group_by(site) |>
  # Within each site, number the duplicated rows from minY up to maxY
  mutate(unique_year = seq.default(min(minY), max(maxY)))
This will result in a data.frame with a number of rows according to the unique years between minY and maxY, as well as a column with the unique years.
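One way to sanity-check this approach is to summarize the expanded data per site and compare the row count and year range against minY and maxY. The sketch below does that on a two-row stand-in for df (taken from the first two rows of the question's output). It also builds the year column with `minY + row_number() - 1L`, which is an alternative I am assuming is equivalent to the seq.default() call here, since both number the duplicated rows within each site. The names `expanded` and `check` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Stand-in for the question's df (first two rows of its printed output)
df <- data.frame(
  site = c("A", "B"),
  minY = c(1980L, 1984L),
  maxY = c(2007L, 2006L)
)

expanded <- df |>
  uncount(weights = maxY - minY + 1) |>      # one copy of the row per year
  group_by(site) |>
  mutate(unique_year = minY + row_number() - 1L) |>  # 1st copy = minY, 2nd = minY + 1, ...
  ungroup()

# Per-site check: row count and the first/last year generated
check <- expanded |>
  group_by(site) |>
  summarize(
    n_rows     = n(),
    first_year = min(unique_year),
    last_year  = max(unique_year)
  )
check
```

If the expansion is right, n_rows equals maxY - minY + 1 for every site, and first_year and last_year match minY and maxY.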