I am working with a spreadsheet of conflict events in the United States. Each row represents a single event and has geographic and temporal information included. Conflict events tend to occur in 'waves' (relatively tight temporal groupings). I have generated an identity variable for each of these waves and would like to create a variable that measures the geographic spread of these conflict events over the course of each wave.
I wanted to do this in Excel, but unfortunately I don't have the dynamic array formulae available. Before upgrading to a new version of Excel, I want to see whether it is possible in R. The data are already sorted by region, date, and wave.
The dataset is structured as follows:
Country  Region     Date       Event    Wave
-------  ---------  ---------  -------  ------
USA      Vermont    5/1/2017   Strike   Wave 1
USA      Vermont    5/2/2017   Strike   Wave 1
USA      New Hamp.  5/3/2017   Strike   Wave 1
USA      Vermont    5/3/2017   Strike   Wave 1
USA      Maine      5/4/2017   Strike   Wave 1
USA      Washingt.  8/16/2018  Riot     Wave 2
USA      Washingt.  8/18/2018  Riot     Wave 2
USA      Oregon     8/18/2018  Protest  Wave 2
USA      Californ.  8/19/2018  Riot     Wave 2
USA      Nevada     8/20/2018  Protest  Wave 2
USA      Idaho      8/20/2018  Riot     Wave 2
I want to create a variable ("geo_disp") that records the number of regions that have experienced conflict within a given wave. Throughout the wave, I expect the number of regions to increase, and I would like the geo_disp variable to record this.
Note that when two events occur on the same day but in different locations, BOTH are recorded with the total number of regions reached by the end of that day.
Here is what I want the data to look like:
Country  Region     Date       Event    Wave    geo_disp
-------  ---------  ---------  -------  ------  --------
USA      Vermont    5/1/2017   Strike   Wave 1  1
USA      Vermont    5/2/2017   Strike   Wave 1  1
USA      New Hamp.  5/3/2017   Strike   Wave 1  2
USA      Vermont    5/3/2017   Strike   Wave 1  2
USA      Maine      5/4/2017   Strike   Wave 1  3
USA      Washingt.  8/16/2018  Riot     Wave 2  1
USA      Washingt.  8/18/2018  Riot     Wave 2  2
USA      Oregon     8/18/2018  Protest  Wave 2  2
USA      Californ.  8/19/2018  Riot     Wave 2  3
USA      Nevada     8/20/2018  Protest  Wave 2  5
USA      Idaho      8/20/2018  Riot     Wave 2  5
How can I create the geo_disp variable using R?
Thank you in advance - I greatly appreciate it.
A dplyr solution that keeps the whole data set. Within each wave, it keeps a running count of the distinct regions seen so far. The count is taken row by row, so two events on the same day can get different values: rows 7 and 10 in the output show 1 and 4, where the desired output in the question has 2 and 5.
library(dplyr)

# Within each wave, count the distinct regions seen so far, row by row
df %>%
  group_by(Wave) %>%
  mutate(geo_disp = cumsum(!duplicated(Region)))
#> # A tibble: 11 x 6
#> # Groups:   Wave [2]
#>    Country Region    Date      Event   Wave   geo_disp
#>    <chr>   <chr>     <chr>     <chr>   <chr>     <int>
#>  1 USA     Vermont   5/1/2017  Strike  Wave 1        1
#>  2 USA     Vermont   5/2/2017  Strike  Wave 1        1
#>  3 USA     New Hamp. 5/3/2017  Strike  Wave 1        2
#>  4 USA     Vermont   5/3/2017  Strike  Wave 1        2
#>  5 USA     Maine     5/4/2017  Strike  Wave 1        3
#>  6 USA     Washingt. 8/16/2018 Riot    Wave 2        1
#>  7 USA     Washingt. 8/18/2018 Riot    Wave 2        1
#>  8 USA     Oregon    8/18/2018 Protest Wave 2        2
#>  9 USA     Californ. 8/19/2018 Riot    Wave 2        3
#> 10 USA     Nevada    8/20/2018 Protest Wave 2        4
#> 11 USA     Idaho     8/20/2018 Riot    Wave 2        5
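The question also asks that events on the same day all carry that day's total (for example, both 8/20/2018 rows should show 5), which a row-by-row running count does not do on its own. One possible way to get there is a second pass that takes the maximum of the running count within each wave and date. The following is a minimal base R sketch, not part of the original answer. It assumes the rows are already sorted by date within each wave, as the question states; the names `running` and `wave_day` are hypothetical helpers introduced only for illustration.

```r
# Rebuild the example data from the question (Date kept as character;
# the grouping below only needs equal dates to compare equal).
df <- data.frame(
  Country = "USA",
  Region = c("Vermont", "Vermont", "New Hamp.", "Vermont", "Maine",
             "Washingt.", "Washingt.", "Oregon", "Californ.", "Nevada", "Idaho"),
  Date = c("5/1/2017", "5/2/2017", "5/3/2017", "5/3/2017", "5/4/2017",
           "8/16/2018", "8/18/2018", "8/18/2018", "8/19/2018",
           "8/20/2018", "8/20/2018"),
  Event = c("Strike", "Strike", "Strike", "Strike", "Strike",
            "Riot", "Riot", "Protest", "Riot", "Protest", "Riot"),
  Wave = rep(c("Wave 1", "Wave 2"), times = c(5, 6)),
  stringsAsFactors = FALSE
)

# Step 1: running count of distinct regions within each wave, row by row.
# ave() is applied to row indices so the result stays an integer vector.
running <- ave(
  seq_len(nrow(df)), df$Wave,
  FUN = function(i) cumsum(!duplicated(df$Region[i]))
)

# Step 2: give every event on the same day (within a wave) that day's
# highest running count, so same-day events share one value.
wave_day <- paste(df$Wave, df$Date, sep = "|")  # hypothetical grouping key
df$geo_disp <- ave(running, wave_day, FUN = max)

print(df)
```

On the example data this should reproduce the geo_disp column shown in the question, including the shared values for 8/18/2018 and 8/20/2018. If the dates were not already in order, they would need to be parsed (for example with `as.Date(df$Date, format = "%m/%d/%Y")`) and the rows sorted first.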