Given that I've got a table with the following, very simple content:
# select * from messages;
id | verbosity
----+-----------
1 | 20
2 | 20
3 | 20
4 | 30
5 | 100
(5 rows)
I would like to select N messages, which sum of verbosity is lower than Y (for testing purposes let's say it should be 70, then correct results will be messages with id 1,2,3). It's really important to me, that solution should be database independent (it should work at least on Postgres and SQLite).
I was trying with something like:
SELECT * FROM messages GROUP BY id HAVING SUM(verbosity) < 70;
However it doesn't seem to work as expected, because it doesn't actually sum all values from verbosity column.
I would be very grateful for any hints/help.
In SQL, we use the SUM() function to add the numeric values in a column. It is an aggregate function in SQL. The aggregate function is used in conjunction with the WHERE clause to extract more information from the data.
To filter records using the aggregate function (the function SUM from earlier), use the HAVING clause. To calculate the sum of values for each group of rows, use the aggregation SUM function.
If you need to add a group of numbers in your table you can use the SUM function in SQL. This is the basic syntax: SELECT SUM(column_name) FROM table_name; The SELECT statement in SQL tells the computer to get data from the table.
SELECT TOP 1 Means Selecting the very 1st record in the result set. SELECT 1 Means return 1 as the result set.
SELECT m.id, sum(m1.verbosity) AS total
FROM messages m
JOIN messages m1 ON m1.id <= m.id
WHERE m.verbosity < 70 -- optional, to avoid pointless evaluation
GROUP BY m.id
HAVING SUM(m1.verbosity) < 70
ORDER BY total DESC
LIMIT 1;
This assumes a unique, ascending id
like you have in your example.
In modern Postgres - or generally with modern standard SQL (but not in SQLite):
WITH cte AS (
SELECT *, sum(verbosity) OVER (ORDER BY id) AS total
FROM messages
)
SELECT *
FROM cte
WHERE total < 70
ORDER BY id;
Should be faster for big tables where you only retrieve a small set.
WITH RECURSIVE cte AS (
( -- parentheses required
SELECT id, verbosity, verbosity AS total
FROM messages
ORDER BY id
LIMIT 1
)
UNION ALL
SELECT c1.id, c1.verbosity, c.total + c1.verbosity
FROM cte c
JOIN LATERAL (
SELECT *
FROM messages
WHERE id > c.id
ORDER BY id
LIMIT 1
) c1 ON c1.verbosity < 70 - c.total
WHERE c.total < 70
)
SELECT *
FROM cte
ORDER BY id;
All standard SQL, except for LIMIT
.
Strictly speaking, there is no such thing as "database-independent". There are various SQL-standards, but no RDBMS complies completely. LIMIT
works for PostgreSQL and SQLite (and some others). Use TOP 1
for SQL Server, rownum
for Oracle. Here's a comprehensive list on Wikipedia.
The SQL:2008 standard would be:
...
FETCH FIRST 1 ROWS ONLY
... which PostgreSQL supports - but hardly any other RDBMS.
The pure alternative that works with more systems would be to wrap it in a subquery and
SELECT max(total) FROM <subquery>
But that is slow and unwieldy.
db<>fiddle here
Old sqlfiddle
This will work...
select *
from messages
where id<=
(
select MAX(id) from
(
select m2.id, SUM(m1.verbosity) sv
from messages m1
inner join messages m2 on m1.id <=m2.id
group by m2.id
) v
where sv<70
)
However, you should understand that SQL is designed as a set based language, rather than an iterative one, so it designed to treat data as a set, rather than on a row by row basis.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With