Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Arel causing infinite loop on aggregation

I have trouble with using Arel to aggregate 2 columns in the same query. When I run this, the whole server freezes for a minute, before the rails dev-server crashes. I suspect an infinite loop :).

Maybe I have misunderstood the whole concept of Arel, and I would be grateful if anybody could have a look at it.

The expected result of this query is something like this: [{:user_id => 1, :sum_account_charges => 300, :sum_paid_debts => 1000},...]

a_account_charges = Table(:account_charges)
a_paid_debts = Table(:paid_debts)
a_participants = Table(:expense_accounts_users)

account_charge_sum = a_account_charges
  .where(a_account_charges[:expense_account_id].eq(id))
  .group(a_account_charges[:user_id])
  .project(a_account_charges[:user_id], a_account_charges[:cost].sum)

paid_debts_sum = a_paid_debts
 .where(a_paid_debts[:expense_account_id].eq(id))
 .group(a_paid_debts[:from_user_id])
 .project(a_paid_debts[:from_user_id], a_paid_debts[:cost].sum)

charges = a_participants
 .where(a_participants[:expense_account_id].eq(id))
 .join(account_charge_sum)
 .on(a_participants[:user_id].eq(account_charge_sum[:user_id]))
 .join(paid_debts_sum)
 .on(a_participants[:user_id].eq(paid_debts_sum[:from_user_id]))
like image 948
finpingvin Avatar asked Oct 13 '10 19:10

finpingvin


1 Answers

I'm new to arel, but after banging on this for several days and really digging, I don't think it can be done. Here's an outline of what I've done, if anyone has any additional insight it would be welcome.

First, these scripts will create the test tables and populate them with test data. I have set up 9 expense_account_users, each with a different set of charges/paid_debts, as follows: 1 charge/1 payment, 2 charges/2 payments, 2 charges/1 payment, 2 charges/0 payments, 1 charge/2 payments, 0 charges/2 payments, 1 charge/0 payments, 0 charges/1 payment, 0 charges, 0 payments.

CREATE TABLE IF NOT EXISTS `expense_accounts_users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `expense_account_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=10 ;

INSERT INTO `expense_accounts_users` (`id`, `expense_account_id`) VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1), (8, 1), (9, 1);

CREATE TABLE IF NOT EXISTS `account_charges` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `expense_account_id` int(11) DEFAULT NULL,
  `user_id` int(11) DEFAULT NULL,
  `cost` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=10 ;

INSERT INTO `account_charges` (`id`, `expense_account_id`, `user_id`, `cost`) VALUES (1, 1, 1, 1), (2, 1, 2, 1), (3, 1, 2, 2), (4, 1, 3, 1), (5, 1, 3, 2), (6, 1, 4, 1), (7, 1, 5, 1), (8, 1, 5, 2), (9, 1, 7, 1);

CREATE TABLE IF NOT EXISTS `paid_debts` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `expense_account_id` int(11) DEFAULT NULL,
  `user_id` int(11) DEFAULT NULL,
  `cost` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=10 ;

INSERT INTO `paid_debts` (`id`, `expense_account_id`, `user_id`, `cost`) VALUES (1, 1, 1, 1), (2, 1, 2, 1), (3, 1, 2, 2), (4, 1, 3, 1), (5, 1, 4, 1), (6, 1, 4, 2), (7, 1, 6, 1), (8, 1, 6, 2), (9, 1, 8, 1);

Ultimately, to get the data that you are after in one fell swoop, this is the SQL statement you'd use:

SELECT user_charges.user_id,
  user_charges.sum_cost,
  COALESCE(SUM(paid_debts.cost), 0) AS 'sum_paid'
FROM (
  SELECT expense_accounts_users.id AS 'user_id',
  COALESCE(sum(account_charges.cost), 0) AS 'sum_cost'
  FROM expense_accounts_users
  LEFT OUTER JOIN account_charges on expense_accounts_users.id = account_charges.user_id
  GROUP BY expense_accounts_users.id)
AS user_charges
LEFT OUTER JOIN paid_debts ON user_charges.user_id = paid_debts.user_id
GROUP BY user_charges.user_id

You have to do a LEFT OUTER JOIN between users and charges first so that you get a row for every user, then you have to LEFT OUTER JOIN the result to debts in order to avoid multiplying your results with the two joins inside the same construct.

(note the use of COALESCE to convert NULL values from the LEFT OUTER JOINs to zeroes - a convenience item, perhaps)

The result of this statement is this:

user_id   sum_cost  sum_paid
1         1         1
2         3         3
3         3         1
4         1         3
5         3         0
6         0         3
7         1         0
8         0         1
9         0         0

After many attempts, I found that this arel code came the closest to what we are after:

c = Arel::Table.new(:account_charges)
d = Arel::Table.new(:paid_debts)
p = Arel::Table.new(:expense_accounts_users)
user_charges = p
 .where(p[:expense_account_id].eq(1))
 .join(c, Arel::Nodes::OuterJoin)
 .on(p[:id].eq(c[:user_id]))
 .project(p[:id], c[:cost].sum.as('sum_cost'))
 .group(p[:id])
charges = user_charges
 .join(d, Arel::Nodes::OuterJoin)
 .on(p[:id].eq(d[:user_id]))
 .project(d[:cost].sum.as('sum_paid'))

Essentially, I am joining users to charges in the first construct with a LEFT OUTER JOIN, then trying to take the result of that and LEFT OUTER JOIN it back to debts. This arel code produces the following SQL statement:

SELECT `expense_accounts_users`.`id`,
  SUM(`account_charges`.`cost`) AS sum_cost,
  SUM(`paid_debts`.`cost`) AS sum_paid
FROM `expense_accounts_users`
LEFT OUTER JOIN `account_charges` ON `expense_accounts_users`.`id` = `account_charges`.`user_id`
LEFT OUTER JOIN `paid_debts` ON `expense_accounts_users`.`id` = `paid_debts`.`user_id`
WHERE `expense_accounts_users`.`expense_account_id` = 1
GROUP BY `expense_accounts_users`.`id`

Which, when run, produces this output:

id  sum_cost  sum_paid
1   1         1
2   6         6
3   3         2
4   2         3
5   3         NULL
6   NULL      3
7   1         NULL
8   NULL      1
9   NULL      NULL

Very close, but not quite. First, the lack of COALESCE gives us NULL values instead of zeroes - I'm not sure how to effect the COALESCE function call from within arel.

More importantly, though, combining the LEFT OUTER JOINs into one statement without the internal subselect is resulting in the sum_paid totals getting multiplied up in examples 2, 3 and 4 - any time there is more than one of either charge or payment and at least one of the other.

Based on some online readings, I had hoped that changing the arel slightly would solve the problem:

charges = user_charges
 .join(d, Arel::Nodes::OuterJoin)
 .on(user_charges[:id].eq(d[:user_id]))
 .project(d[:cost].sum.as('sum_paid'))

But any time I used user_charges[] in the second arel construct, I got an undefined method error for SelectManager#[]. This may be a bug, or may be right - I really can't tell.

I just don't see that arel has a way to make use of the SQL from the first construct as a queryable object in the second construct with the necessary subquery alias, as is needed to make this happen in one SQL statement.

like image 134
Yardboy Avatar answered Sep 28 '22 21:09

Yardboy