Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grouping serial posts in a user feed

I have a user feed of image posts. Each user can post single images, however, he can repeat the action often. Say, upload several images within an hour.

How do I effectively design the database table so when a user posts multiple images (one by one) within one hour — I can easily group those serial posts together, eigher on INSERT or on SELECT?


Don't suggest multi-upload form. That's not the case: I've just described the task in more common terms :)

like image 966
kolypto Avatar asked Feb 21 '23 13:02

kolypto


2 Answers

Can you store a timestamp with each post, and then select each item whose timestamp is less than some threshold from the next?

Another idea would be to store both a timestamp and a "group number" with each post. Before you store a post, execute a SELECT to look for posts that were submitted within the last n minutes. If you find one, use the same group number for the new post. If you don't then increment the group number for the new post. Then you can select by group number to find the items you want.

like image 174
Adam Liss Avatar answered Feb 23 '23 01:02

Adam Liss


I suppose the data model would look similar this:

enter image description here

Just be careful to ensure the temporal difference between posts is greater than the resolution of the TIMESTAMP (or be prepared to handle PK violations gracefully).

In a DBMS that supports analytic functions, you can then fairly easily group posts that are temporally close together. For example, Oracle query to group posts (for the given user) that fall within an hour of each other, would look like this:

SELECT T.*, SUM(DIFF) OVER (ORDER BY TIMESTAMP) GROUPING
FROM (
    SELECT
        IMAGE.*,
        CASE
            WHEN TIMESTAMP <= LAG(TIMESTAMP) OVER (ORDER BY TIMESTAMP)
                + INTERVAL '1' HOUR
            THEN 0
            ELSE 1
            END DIFF
    FROM IMAGE
    WHERE USER_ID = :user_id
) T;

The resulting GROUPING field will identify individual groups of rows whose TIMESTAMP is "close enough". This query is also quite efficient - it's just a range scan on the PK index. You can play with it in the SQL Fiddle.

Unfortunately, MySQL doesn't support analytic functions, but you should have no trouble doing essentially the same at the application level. Just SELECT ... ORDER BY TIMESTAMP, linearly traverse the results and see what's the difference between the current and the previous row.

like image 22
Branko Dimitrijevic Avatar answered Feb 23 '23 02:02

Branko Dimitrijevic