
Use dplyr complete() to create new variable based on min/max values [duplicate]

Tags: r, dplyr

EDIT: although this question has been closed, it is helpful to note that the answers provided use a very different approach (with dplyr) than the original question asked in 2012(!). These new answers may be helpful for different users.

I have a dataset of sites with the min and max years when they were operational. I want to expand this dataset so that there is one row for each year a site was operational.

For example:

set.seed(42)
df <- data.frame(
  site = rep(LETTERS[1:10]),
  minY = sample(1980:1990, 10),
  maxY = sample(2000:2010, 10)
)
df
   site minY maxY
1     A 1980 2007
2     B 1984 2006
3     C 1990 2003
4     D 1988 2000
5     E 1981 2004
6     F 1983 2005
7     G 1986 2008
8     H 1989 2001
9     I 1987 2009
10    J 1985 2010

So in my final dataset, site A would have 28 rows (one for each year it was operating, 1980 through 2007).

I've been trying to use tidyr's complete() function, but I keep getting an error message:

complete(df,
         nesting(site),
         fill = list(value1 = minY, value2 = maxY))
Error in vec_is_list(replace) : object 'minY' not found
asked Oct 17 '25 05:10 by tnt


2 Answers

This may work for you, using dplyr's summarize() on a row-wise data frame:

library(dplyr)

df %>% 
  rowwise() %>% 
  summarize(site, year = seq(minY, maxY, 1))
# A tibble: 210 × 2
   site   year
   <chr> <dbl>
 1 A      1980
 2 A      1981
 3 A      1982
 4 A      1983
 5 A      1984
 6 A      1985
 7 A      1986
 8 A      1987
 9 A      1988
10 A      1989
# … with 200 more rows
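
In dplyr 1.1.0 and later, returning more than one row per group from summarize() is deprecated in favor of reframe(), so the same idea might be written as in the sketch below. It assumes dplyr >= 1.1.0 is installed, and it hard-codes the values from the table printed in the question (rather than calling sample()) so the result does not depend on the random seed. The name `long` is only illustrative.

```r
library(dplyr)

# Values copied from the table printed in the question.
df <- data.frame(
  site = LETTERS[1:10],
  minY = c(1980, 1984, 1990, 1988, 1981, 1983, 1986, 1989, 1987, 1985),
  maxY = c(2007, 2006, 2003, 2000, 2004, 2005, 2008, 2001, 2009, 2010)
)

# Each site forms its own group (sites are unique here), and each group
# contributes one row per year in seq(minY, maxY).
long <- df %>%
  reframe(year = seq(minY, maxY), .by = site)

nrow(long)             # total number of site-years: 210
sum(long$site == "A")  # rows for site A: 28
```

Grouping with `.by = site` relies on each site appearing only once in `df`; with repeated sites, rowwise() as in the answer above would be the safer choice.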
answered Oct 18 '25 23:10 by Andre Wildberg


You can use tidyr::uncount() to duplicate rows according to a weight. In your case, the weight is the number of years each site operated, maxY - minY + 1:

library(tidyr)

df |>
  uncount(weights = maxY - minY + 1)

If you also want a column holding the year for each row, you can add it per site with dplyr::mutate():

library(dplyr)
library(tidyr)

df |>
  uncount(weights = maxY - minY + 1) |>   # one row per operating year
  group_by(site) |>
  mutate(unique_year = seq(min(minY), max(maxY)))  # fill in the years per site

The result is a data frame with one row per site for every year from minY to maxY (inclusive), plus a unique_year column holding that year.
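
For comparison, the same "repeat each row by a weight" idea can be sketched in base R, where rep() plays the role of uncount(). This is a minimal illustration rather than part of the original answer: the values are copied from the table printed in the question, and the names `n_years` and `long` are made up for the example.

```r
# Values copied from the table printed in the question.
df <- data.frame(
  site = LETTERS[1:10],
  minY = c(1980, 1984, 1990, 1988, 1981, 1983, 1986, 1989, 1987, 1985),
  maxY = c(2007, 2006, 2003, 2000, 2004, 2005, 2008, 2001, 2009, 2010)
)

# Number of operating years per site (both end years included).
n_years <- df$maxY - df$minY + 1

long <- data.frame(
  # Repeat each site label once per operating year.
  site = rep(df$site, times = n_years),
  # Build the year sequence for each site and concatenate them in order.
  year = unlist(Map(seq, df$minY, df$maxY))
)

head(long, 3)
nrow(long)  # 210 rows in total
```

Because rep() and Map() both walk the rows of `df` in the same order, the site labels and the years line up without any grouping step.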

answered Oct 18 '25 21:10 by FactOREO


