I've come across something a bit above my skill set. I'm working with IMF trade data that consists of data between country dyads. The IMF dataset consists of ' unordered duplicate' records in that each country individually reports trade data. However, due to a variety of timing, recording systems, regime type, etc., there are discrepancies between corresponding values. I'm trying to manipulate this data in two ways:
There are several discussions of identifying unordered duplicates here, here, here, and here but after a couple days of searching I have yet to see what I'm trying to do.
Here is an example of the raw data. In reality there are many more variables and several hundred thousand dyads:
reporter<-c('USA','GER','AFG','FRA','CHN')
partner<-c('AFG','CHN','USA','CAN','GER')
year<-c(2010,2010,2010,2009,2010)
import<-c(-1000,-2000,-2400,-1200,-2000)
export<-c(2500,2200,1200,2900,2100)
rep_econ1<-c(28,32,12,25,19)
imf<-data.table(reporter,partner,year,import,export,rep_econ1)
imf
reporter partner year import export rep_econ1
1: USA AFG 2010 -1000 2500 28
2: GER CHN 2010 -2000 2200 32
3: AFG USA 2010 -2400 1200 12
4: FRA CAN 2009 -1200 2900 25
5: CHN GER 2010 -2000 2100 19
The additional wrinkle is that import and export are inverses of each other between the dyads, so they need to be matched and meaned with an absolute value.
For objective 1, the resulting data.table is:
Mean
reporter partner year import export rep_econ1
USA AFG 2010 -1100 2450 28
GER CHN 2010 -2050 2100 32
AFG USA 2010 -2450 1100 12
FRA CAN 2009 -1200 2900 25
CHN GER 2010 -2100 2050 19
For objective 2:
Conditionally Assign on Higher Economic Indicator (rep_econ1)
reporter partner year import export rep_econ1
USA AFG 2010 -1000 2500 28
GER CHN 2010 -2000 2200 32
AFG USA 2010 -2500 1000 12
FRA CAN 2009 -1200 2900 25
CHN GER 2010 -2200 2000 19
It's possible not all dyads are represented twice so I included a solo record. I prefer data.table but I'll go with anything that leads me down the right path.
Thank you for your time.
Pre - Processing:
library(data.table)
# get G = reporter/partner group and N = number of rows for each group
# Thanks @eddi for simplifying
imf[, G := .GRP, by = .(year, pmin(reporter, partner), pmax(reporter, partner))]
imf[, N := .N, G]
Option 1 (means)
# for groups with 2 rows, average imports and exports
imf[N == 2
, `:=`(import = (import - rev(export))/2
, export = (export - rev(import))/2)
, by = G]
imf
# reporter partner year import export rep_econ1 G N
# 1: USA AFG 2010 -1100 2450 28 1 2
# 2: GER CHN 2010 -2050 2100 32 2 2
# 3: AFG USA 2010 -2450 1100 12 1 2
# 4: FRA CAN 2009 -1200 2900 25 3 1
# 5: CHN GER 2010 -2100 2050 19 2 2
Option 2 (highest economic indicator)
# for groups with 2 rows, choose imports and exports based on highest rep_econ1
imf[N == 2
, c('import', 'export') := {
o <- order(-rep_econ1)
import <- cbind(import, -export)[o[1], o]
.(import, export = -rev(import))}
, by = G]
imf
# reporter partner year import export rep_econ1 G N
# 1: USA AFG 2010 -1000 2500 28 1 2
# 2: GER CHN 2010 -2000 2200 32 2 2
# 3: AFG USA 2010 -2500 1000 12 1 2
# 4: FRA CAN 2009 -1200 2900 25 3 1
# 5: CHN GER 2010 -2200 2000 19 2 2
Option 2 explanation: You need to select the row with the highest economic indicator (i.e. row order(-rep_econ1)[1]) and use that for imports, but if the second row is the "trusted" one, it needs to be reversed. Otherwise you'd have the countries switched, since the second reporter's imports (now the first element of cbind(import, -export)[o[1],]) would be assigned as the first reporter's imports (because it's the first element).
Edit:
If imports and exports are both positive in the input data and need to be positive in the output data, the two calculations above can be modified as
imf[N == 2
, `:=`(import = (import + rev(export))/2
, export = (export + rev(import))/2)
, by = G]
And
imf[N == 2
, c('import', 'export') := {
o <- order(-rep_econ1)
import <- cbind(import, export)[o[1], o]
.(import, export = rev(import))}
, by = G]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With