SQL return rows in a "round-robin" order

Tags: sql, postgresql

I have a bunch of URLs stored in a table waiting to be scraped by a script. However, many of those URLs are from the same site. I would like to return those URLs in a "site-friendly" order (that is, try to avoid two URLs from the same site in a row) so I won't be accidentally blocked for making too many HTTP requests in a short time.

The database layout is something like this:

create table urls (
    site varchar,       -- holds e.g. www.example.com or stackoverflow.com
    url varchar unique
);

Example result:

SELECT url FROM urls ORDER BY mysterious_round_robin_function(site);

http://www.example.com/some/file
http://stackoverflow.com/questions/ask
http://use.perl.org/
http://www.example.com/some/other/file
http://stackoverflow.com/tags

I thought of something like "ORDER BY site <> @last_site DESC" but I have no idea how to go about writing something like that.

hhaamu asked Jul 21 '09 17:07


3 Answers

See this article in my blog for more detailed explanations on how it works:

  • PostgreSQL: round-robin order

With PostgreSQL 8.4 or later, which supports window functions:

SELECT  *
FROM    (
        SELECT  site, url, ROW_NUMBER() OVER (PARTITION BY site ORDER BY url) AS rn
        FROM    urls
        ) q
ORDER BY
        rn, site

With older versions:

SELECT  site,
        (
        SELECT  url
        FROM    urls ui
        WHERE   ui.site = sites.site
        ORDER BY
                url
        OFFSET  total
        LIMIT   1
        ) AS url
FROM    ( 
        SELECT  site, generate_series(0, cnt - 1) AS total
        FROM    (
                SELECT  site, COUNT(*) AS cnt
                FROM    urls
                GROUP BY
                        site
                ) s
        ) sites
ORDER BY
        total, site

This works too, though it can be less efficient.
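
To make the effect of ordering by the per-site row number concrete, here is a minimal Python sketch that mimics the ROW_NUMBER() query above in memory. The function name round_robin and the sample rows are illustrative assumptions, and Python's plain string comparison stands in for the database collation, so ties could sort differently on a real server.

```python
from collections import defaultdict


def round_robin(rows):
    """Order (site, url) pairs by (rank within site, site).

    Mirrors ROW_NUMBER() OVER (PARTITION BY site ORDER BY url)
    followed by ORDER BY rn, site.
    """
    by_site = defaultdict(list)
    for site, url in rows:
        by_site[site].append(url)

    ranked = []
    for site, urls in by_site.items():
        # rn starts at 1, like ROW_NUMBER()
        for rn, url in enumerate(sorted(urls), start=1):
            ranked.append((rn, site, url))

    ranked.sort()  # sorts by rn first, then by site
    return [url for _rn, _site, url in ranked]


# Sample rows taken from the question's example result (assumed table contents).
rows = [
    ("www.example.com", "http://www.example.com/some/file"),
    ("www.example.com", "http://www.example.com/some/other/file"),
    ("stackoverflow.com", "http://stackoverflow.com/questions/ask"),
    ("stackoverflow.com", "http://stackoverflow.com/tags"),
    ("use.perl.org", "http://use.perl.org/"),
]

ordered = round_robin(rows)
for url in ordered:
    print(url)
```

On these rows the first pass returns one URL per site (stackoverflow.com, use.perl.org, www.example.com) and the second pass returns the two remaining ones. If one site has far more URLs than the rest, the tail of the result will still contain consecutive URLs from that site.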

Quassnoi answered Nov 14 '22 11:11


I think you're overcomplicating this. Why not just use

ORDER BY NewID()

Keith Adler answered Nov 14 '22 11:11


You are asking for round-robin, but I think a simple

SELECT site, url FROM urls ORDER BY RANDOM()

will do the trick. It should work even if URLs from the same site are clustered in the database.
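
A random order spreads sites out on average but does not strictly rule out two URLs from the same site landing next to each other. The following Python sketch is one way to check how often that happens for a given result; count_adjacent_same_site is a hypothetical helper, and random.shuffle merely stands in for ORDER BY RANDOM().

```python
import random


def count_adjacent_same_site(sites):
    """Count positions where a site is immediately followed by the same site."""
    return sum(1 for prev, cur in zip(sites, sites[1:]) if prev == cur)


# Assumed sample: the site column for five queued URLs.
sites = ["www.example.com"] * 2 + ["stackoverflow.com"] * 2 + ["use.perl.org"]

random.shuffle(sites)  # stands in for ORDER BY RANDOM()
print(count_adjacent_same_site(sites))  # varies from run to run
```

Whether an occasional adjacent pair matters depends on how strict the target sites are about request rates.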

Wojciech Bederski answered Nov 14 '22 10:11