Let's say I have the following range in Excel named MyRange
:
This isn't a table by any means, it's more a collection of Variant values entered into cells. Excel makes it easy to sum these values doing =SUM(B3:D6)
which gives 25
. Let's not go into the details of type checking or anything like that and just figure that sum will easily skip values that don't make sense.
If we were translating this concept into SQL, what would be the most natural way to do this? The few approaches that came to mind are (ignore type errors for now):
MyRange
returns an array of values:
-- myRangeAsList = [1,1,1,2, ...]
SELECT SUM(elem) FROM UNNEST(myRangeAsList) AS r (elem);
MyRange
returns a table-valued function of a single column (basically the opposite of a list):
-- myRangeAsCol = (SELECT 1 UNION ALL SELECT 1 UNION ALL ...
SELECT SUM(elem) FROM myRangeAsCol as r (elem);
Or, perhaps more 'correctly', return a 3-columned table such as:
-- myRangeAsTable = (SELECT 1,1,1 UNION ALL SELECT 2,'other',2 UNION ALL ...
SELECT SUM(a+b+c) FROM SELECT a FROM myRangeAsTable (a,b,c)
Unfortunately, I think this makes things the most difficult to work with, as we now have to combine an unknown number of columns.
Perhaps returning a single column is the easiest of the above to work with, but even that takes a very simple concept -- SUM(myRange)
and converts into something that is anything but that: SELECT SUM(elem) FROM myRangeAsCol as r (elem)
.
Perhaps this could also just be rewritten to a function for convenience, for example:
Just possible direction to think
create temp function extract_values (input string)
returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
with myrangeastable as (
select '1' a, '1' b, '1' c union all
select '2', 'other', '2' union all
select 'true', '3', '3' union all
select '4', '4', '4'
)
select sum(safe_cast(value as float64)) range_sum
from myrangeastable t,
unnest(extract_values(to_json_string(t))) value
with output
Note: no columns explicitly used so should work for any sized range w/o any changes in code
Depends on specific use case, I think above can be wrapped into something more friendly for someone who knows excel to do
I'll try to pose, atomic, pure SQL principles that start with obvious items and goes to the more complicated ones. The intention is, all items can be used in any RDBS:
Items 2 and 3 in Excel and in a DB:
/*
Item 2: Table
the range in the excel is modeled as the below test_table
Item 3: Columns
id keeps the excel row number
b, c, d are the corresponding b, c, d columns of the excel
*/
create table test_table
(
id integer,
b varchar(20),
c varchar(20),
d varchar(20)
);
-- Item 3: Adding the rows in the DB
insert into test_table values (3 /* same as excel row number */ , '1', '1', '1');
insert into test_table values (4 /* same as excel row number */ , '2', 'other', '2');
insert into test_table values (5 /* same as excel row number */ , 'TRUE', '3', '3');
insert into test_table values (6 /* same as excel row number */ , '4', '4', '4');
where
condition. where
condition goes through all rows (or indexes for the sake of speed but this is beyond this answer's scope), and filters out which does not satisfy the test boolean logic in the condition. (So for example where 1 = 1
is brings all rows because the condition is always true
for all rows.sum(column_a + column_b)
(row by row summation) or sum(a) + sum(b)
(column by column summation). If we assume all the data are not null, then both gives the same output.Items 4 and 5 in Excel and in a DB:
select sum(b + c + d) -- Item 5, first option: We sum row by row
from test_table
where id between 3 and 6; -- Item 4: We simple get all rows, because for all rows above the id are between 3 and 6, if we had another row with 7, it would be filtered out
+----------------+
| sum(b + c + d) |
+----------------+
| 25 |
+----------------+
select sum(b) + sum(c) + sum(d) -- Item 5, second option: We sum column by column
from test_table
where id between 3 and 6; -- Item 4: We simple get all rows, because for all rows above the id are between 3 and 6, if we had another row with 7, it would be filtered out
+--------------------------+
| sum(b) + sum(c) + sum(d) |
+--------------------------+
| 25 |
+--------------------------+
group by
mechanics. The group by
basically groups a table according to its condition and each group behaves like a sub-table. For example if you say group by column_a
for a table, the values are grouped according to the values of the table.having
clauses, which acts same as where
but works over the columns in group by
or the functions over those columns.Items 6 and 7 in Excel and in a DB:
-- Item 6: We can have group by clause to simulate a pivot table
insert into test_table values (7 /* same as excel row */ , '4', '2', '2');
select b, sum(d), min(d), max(d), avg(d)
from test_table
where id between 3 and 7
group by b;
+------+--------+--------+--------+--------+
| b | sum(d) | min(d) | max(d) | avg(d) |
+------+--------+--------+--------+--------+
| 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 | 2 |
| TRUE | 3 | 3 | 3 | 3 |
| 4 | 6 | 2 | 4 | 3 |
+------+--------+--------+--------+--------+
Beyond this point following are the details which are not directly related with the questions purpose:
The excel file and the SQL used in this answer can be found in this github repo: https://github.com/MehmetKaplan/stackoverflow-72135212/
PS: I used SQL for more than 2 decades and then reduced using it and started to use Excel much frequently because of job changes. Each time I use Excel I still think of the DBs and "relational algebra" which is the mathematical foundation of the RDBMSs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With