Total revenue generated from a route in R

Question

I have a dataset with origin ("from"), destination ("to") and price as below:

from    to  price
A       B   28109
A       D   2356
A       E   4216
B       A   445789
B       D   123
D       A   45674
D       B   1979

I want to sum the price considering the return route as well. for example, A - B consists of the following data:

from    to  price
  A     B   28109
  B     A   445789

Then, take the sum of the price (28109+445789). The output will be like this:

route   total_price
A - B   473898
A - D   48030
A - E   4216
B - D   2102

I was thinking to run a for loop but my data size is very large (800k rows). Any help will be highly appreciated. Thanks a lot in advance.

IceCreamToucan · Accepted Answer

You can do this by sorting the from-to pairs, then grouping on that sorted pair and summing.

Edit: See @JasonAizkalns' answer for tidyverse equivalent

library(data.table)
setDT(df)

df[, .(total_price = sum(price))
   , by = .(route = paste(pmin(from, to), '-', pmax(from, to)))]

#    route total_price
# 1: A - B      473898
# 2: A - D       48030
# 3: A - E        4216
# 4: B - D        2102

@Frank notes that this result hides the fact that route "A - E" is not complete, in the sense that there is no row of the original data with from == 'E' and to == 'A'. He's offered a good way of capturing that info (and more), and I've added some others below.

df[, .(total_price = sum(price), complete = .N > 1)
   , by = .(route = paste(pmin(from, to), '-', pmax(from, to)))]

#    route total_price complete
# 1: A - B      473898     TRUE
# 2: A - D       48030     TRUE
# 3: A - E        4216    FALSE
# 4: B - D        2102     TRUE

df[, .(total_price = sum(price), paths_counted = .(paste(from, '-', to)))
   , by = .(route = paste(pmin(from, to), '-', pmax(from, to)))]

#    route total_price paths_counted
# 1: A - B      473898   A - B,B - A
# 2: A - D       48030   A - D,D - A
# 3: A - E        4216         A - E
# 4: B - D        2102   B - D,D - B

Data used

df <- fread('
from    to  price
A       B   28109
A       D   2356
A       E   4216
B       A   445789
B       D   123
D       A   45674
D       B   1979')

JasonAizkalns · Answer

You could do a self-join and then things are pretty straightforward:

library(tidyverse)

df <- readr::read_table("
from    to  price
A       B   28109
A       D   2356
A       E   4216
B       A   445789
B       D   123
D       A   45674
D       B   1979
")

df %>%
  inner_join(df, by = c("from" = "to")) %>%
  filter(to == from.y) %>%
  mutate(
    route = paste(from, "-", to),
    total_price = price.x + price.y
  )
#> # A tibble: 6 x 7
#>   from  to    price.x from.y price.y route total_price
#>   <chr> <chr>   <dbl> <chr>    <dbl> <chr>       <dbl>
#> 1 A     B       28109 B       445789 A - B      473898
#> 2 A     D        2356 D        45674 A - D       48030
#> 3 B     A      445789 A        28109 B - A      473898
#> 4 B     D         123 D         1979 B - D        2102
#> 5 D     A       45674 A         2356 D - A       48030
#> 6 D     B        1979 B          123 D - B        2102

^{Created on 2019-03-20 by the reprex package (v0.2.1)}

Because I like @IceCreamToucan's answer better, here's the tidyverse equivalent:

df %>%
  group_by(route = paste(pmin(from, to), "-", pmax(from, to))) %>%
  summarise(total_price = sum(price))

tmfmnk · Answer

Also one tidyverse possibility:

df %>%
 nest(from, to) %>%
 mutate(route = unlist(map(data, function(x) paste(sort(x), collapse = "_")))) %>%
 group_by(route) %>%
 summarise(total_price = sum(price))

  route total_price
  <chr>       <int>
1 A_B        473898
2 A_D         48030
3 A_E          4216
4 B_D          2102

In this case, it, first, creates a list composed of values "from" and "to" variables. Second, it sorts the elements in the list and combines them together, separated by _. Finally, it groups by the combined elements and gets the sum.

Or involving a wide-to-long transformation:

df %>%
 rowid_to_column() %>%
 gather(var, val, -c(rowid, price)) %>%
 arrange(rowid, val) %>%
 group_by(rowid) %>%
 summarise(route = paste(val, collapse = "_"),
           price = first(price)) %>%
 group_by(route) %>%
 summarise(total_price = sum(price))

For this, it, first, performs a wide-to-long data transformation, excluding the row ID and "price". Second, it arranges the data according row ID and values contained in "from" and "to". Third, it groups by row ID, combines the elements together, separated by _. Finally, it groups by this variable and gets the sum.

Total revenue generated from a route in R

Tags:

r

permutation

user3642360

3 Answers

IceCreamToucan

JasonAizkalns

tmfmnk

Recent Activity

Donate For Us

Total revenue generated from a route in R

Tags:

r

permutation

user3642360

3 Answers

IceCreamToucan

JasonAizkalns

tmfmnk

Related questions

Recent Activity

Donate For Us