I have the following data frame:
╔══════╦═════════╗
║ Code ║ Airline ║
╠══════╬═════════╣
║ 1 ║ AF ║
║ 1 ║ KL ║
║ 8 ║ AR ║
║ 8 ║ AZ ║
║ 8 ║ DL ║
╚══════╩═════════╝
dat <- structure(list(Code = c(1L, 1L, 8L, 8L, 8L), Airline = structure(c(1L,
5L, 2L, 3L, 4L), .Label = c("AF ", "AR ", "AZ ", "DL", "KL "
), class = "factor")), .Names = c("Code", "Airline"), class = "data.frame", row.names = c(NA,
-5L))
My goal is to find, for each airline, the other airlines that share one of its codes, i.e. the airlines whose code is also used by one or more other airlines. The expected output would be:
+--------------------+
| Airline SharedWith |
+--------------------+
| AF "KL" |
| KL "AF" |
| AR "AZ","DL" |
+--------------------+
The pseudocode in any imperative language would be:
for each code
lookup all rows in the table where the value = code
Since R is not that much list oriented, what would be the best way to achieve the expected output?
Several options using the data.table package:
1) Using strsplit, paste & operating by row:
library(data.table)
setDT(dat)[, Airline := trimws(Airline) # this step is needed to remove the leading and trailing whitespaces
][, sharedwith := paste(Airline, collapse = ','), Code
][, sharedwith := paste(unlist(strsplit(sharedwith,','))[!unlist(strsplit(sharedwith,',')) %in% Airline],
collapse = ','), 1:nrow(dat)]
which gives:
> dat
Code Airline sharedwith
1: 1 AF KL
2: 1 KL AF
3: 8 AR AZ,DL
4: 8 AZ AR,DL
5: 8 DL AR,AZ
2) Using strsplit & paste with mapply instead of by = 1:nrow(dat):
setDT(dat)[, Airline := trimws(Airline)
][, sharedwith := paste(Airline, collapse = ','), Code
][, sharedwith := mapply(function(s,a) paste(unlist(strsplit(s,','))[!unlist(strsplit(s,',')) %in% a],
collapse = ','),
sharedwith, Airline)][]
which will give you the same result.
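The per-code grouping used in options 1 and 2 can also be sketched in base R without pasting the airlines together and splitting them again. This is only a minimal sketch under some assumptions: the data frame is rebuilt here with plain character columns instead of factors, and the names `by_code` and `SharedWith` are illustrative, not part of the original answer.

```r
# Sample data from the question, with the padded labels trimmed
dat <- data.frame(Code = c(1L, 1L, 8L, 8L, 8L),
                  Airline = trimws(c("AF ", "KL ", "AR ", "AZ ", "DL")),
                  stringsAsFactors = FALSE)

# One character vector of airlines per code, named by the code
by_code <- split(dat$Airline, dat$Code)

# For each row, take the airlines of its code and drop the row's own airline
dat$SharedWith <- mapply(
  function(airline, code) {
    paste(setdiff(by_code[[as.character(code)]], airline), collapse = ",")
  },
  dat$Airline, dat$Code, USE.NAMES = FALSE
)
dat
```

Note that setdiff also removes duplicates, so an airline listed twice under the same code would appear only once in the result.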
3) Or by using the CJ function with paste (inspired by the expand.grid solution of @zx8754):
library(data.table)
setDT(dat)[, Airline := trimws(Airline)
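# name both CJ columns explicitly so the code does not depend on the auto-generated name V2
][, CJ(air = Airline, other = Airline, unique = TRUE)[air != other][, .(shared = paste(other, collapse = ',')), air],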
Code]
which gives:
Code air shared
1: 1 AF KL
2: 1 KL AF
3: 8 AR AZ,DL
4: 8 AZ AR,DL
5: 8 DL AR,AZ
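The cross-join idea behind option 3 (build every pair of airlines within a code, then drop the self-pairs) can also be sketched with a base R self-merge. This is a hedged sketch rather than part of the original answer: the data frame is rebuilt with character columns, and the names `pair_df`, `Airline.other` and `shared` are assumptions made for illustration.

```r
# Sample data from the question, with the padded labels trimmed
dat <- data.frame(Code = c(1L, 1L, 8L, 8L, 8L),
                  Airline = trimws(c("AF ", "KL ", "AR ", "AZ ", "DL")),
                  stringsAsFactors = FALSE)

# Self-join on Code: every pair of airlines that use the same code
pair_df <- merge(dat, dat, by = "Code", suffixes = c("", ".other"))

# Drop the pairs that match an airline with itself
pair_df <- pair_df[pair_df$Airline != pair_df$Airline.other, ]

# Sort so the pasted partners come out in a predictable order
pair_df <- pair_df[order(pair_df$Code, pair_df$Airline, pair_df$Airline.other), ]

# Collapse the partners of each (Code, Airline) combination into one string
shared <- aggregate(Airline.other ~ Code + Airline, data = pair_df,
                    FUN = paste, collapse = ",")
names(shared)[3] <- "SharedWith"
shared <- shared[order(shared$Code, shared$Airline), ]
shared
```

Like CJ, the merge produces a number of rows per code that grows with the square of the number of airlines, so this is best suited to codes shared by a modest number of airlines.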
A solution with dplyr & tidyr to get the desired result (inspired by @jaimedash):
library(dplyr)
library(tidyr)
dat <- dat %>% mutate(Airline = trimws(as.character(Airline)))
dat %>%
mutate(SharedWith = Airline) %>%
group_by(Code) %>%
nest(-Code, -Airline, .key = SharedWith) %>%
left_join(dat, ., by = 'Code') %>%
unnest() %>%
filter(Airline != SharedWith) %>%
group_by(Code, Airline) %>%
summarise(SharedWith = toString(SharedWith))
which gives:
Code Airline SharedWith
(int) (chr) (chr)
1 1 AF KL
2 1 KL AF
3 8 AR AZ, DL
4 8 AZ AR, DL
5 8 DL AR, AZ
An igraph approach:
library(igraph)
g <- graph_from_data_frame(dat)
# Find neighbours for select nodes
ne <- setNames(ego(g,2, nodes=as.character(dat$Airline), mindist=2), dat$Airline)
ne
#$`AF `
#+ 1/7 vertex, named:
#[1] KL
#$`KL `
#+ 1/7 vertex, named:
#[1] AF
# --- (output for the remaining airlines omitted)
# Get final format
data.frame(Airline=names(ne),
Shared=sapply(ne, function(x)
paste(V(g)$name[x], collapse=",")))
# Airline Shared
# 1 AF KL
# 2 KL AF
# 3 AR AZ,DL
# 4 AZ AR,DL
# 5 DL AR,AZ