Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient MySQL database design for a Tinder like app

I'm creating an app like Tinder. Which a user can swipe right or like and swipe left or dislike another users. The problem is about storing operations of users. A table is needed for users operations like below

Person 1.   |   Person 2.    |     op
__________________________________
000001.          000007.          Dislike
000001.          000011.          Like
000001.          000053.          Dislike
000001.          000173.          Dislike

It stores operations and also uses for not showing a user more times. It's ok till now.

But the problem is if just 1000 users swipe another 1000 users that table will have 1M of rows. And if 100,000 users do that...It goes to 100M of rows! Which is very huge.

Do u guys have any idea for a structure designe which not to grow so large?

Thank you.

like image 504
Robert junior Avatar asked Mar 11 '19 13:03

Robert junior


2 Answers

There are a couple of things to consider.

Firstly, the size of a table is not hugely interesting unless you know the types of query you need to run. As others have said, tables with hundreds of millions of rows are nothing to be afraid of, and if you're querying on indexable fields, you can probably scale to billions of rows without going to exotic solutions just by buying bigger and better hardware. So, a solution where 90% of your queries are

select * from users where user_id not in (select interacted_user_id from interactions where interacting_user_id = $current_user) limit 10

My guess is this will scale to hundreds of millions of rows on your laptop, and billions of rows on a decent server. My strong recommendation is to use a simple, relational solution without partitioning or other exotic solutions until you've scaled to the point where that no longer works, and you have tuned your queries and upgraded your hardware as far as possible. This is much cheaper/easier than any other solution.

The bigger challenge will be the geospatial aspect - presumably, you want to order your results based on distance from the current user.

One way in which you might partition your data is to gather "interactions" by region. This requires some thought - you probably don't want "hard" boundaries, but rather have overlapping geographies. Each spot on the map might have a few overlapping "regions", each with its own table. The more users you have in a region, the smaller you make the overlapping circles - Manhattan might have 3 regions, Greenland might have just 1. Your query then looks at the tables for each overlapping region, and unions the users who haven't previously interacted with the current user.

like image 70
Neville Kuyt Avatar answered Sep 29 '22 12:09

Neville Kuyt


If the person 1 disliked the person 2 there is no need to show the person 1 to the person 2. Even if you show him, they will never match. Therefore your calculation 1K x 1K = 1M is a bit overestimated.

However, if you still want to have sets of likes/dislikes for both users, you might consider this terrible idea of "compressing" rows.

Imagine you have a sequence like this:

| Person 1 | Person 2 |  Op       |
| -------- | -------- | --------- |
| 0001     | 1010     |  Dislike  |
| 0001     | 1011     |  Dislike  |
| 0001     | 1012     |  Dislike  |
| 0001     | 1013     |  Dislike  |
| 0001     | 1015     |  Like     |
| 0001     | 1017     |  Dislike  |
| 0001     | 1018     |  Dislike  |
| 0001     | 1019     |  Dislike  |
| 0001     | 1021     |  Like     |

If you have ids following each other you might show them as

| Person 1 | Person 2 |  Op       | N    |
| -------- | -------- | --------- | ---- |
| 0001     | 1010     |  Dislike  | 3    |
| 0001     | 1015     |  Like     | 0    |
| 0001     | 1017     |  Dislike  | 2    |
| 0001     | 1021     |  Like     | 0    |

where N is max id in a sequence (ex. 1010 + 3 = 1013). If you define N as an unsigned tinyint then the max possible size of the sequence can be 255, that means, in theory, 255 sequential dislikes/likes can be saved as 1 record.

And the query would be something like this (imagine you are looking for id 1013):

SELECT a.* 
FROM (
    SELECT *
    FROM `table`
    WHERE person_1 = 0001
      AND person_2 >= (1013 - 255) -- 255 is a max size of a sequense 
      AND person_2 <= 1013
) a
WHERE a.person_2 <= 1013 AND a.person_2 + N >= 1013

The sub-select will limit the range of possible records then the main select will match the record if it exists. In this case, it will be

| Person 1 | Person 2 |  Op       | N    |
| -------- | -------- | --------- | ---- |
| 0001     | 1010     |  Dislike  | 3    |

But, personally, I would go with this and prefer your current solution because of its simplicity.

OR as another variant, you might compress the table this way

| Person 1 | Person 2 | Max Person 2 |  Op       |
| -------- | -------- | ------------ | --------- |
| 0001     | 1010     | 1013         |  Dislike  |
| 0001     | 1015     | 1015         |  Like     |
| 0001     | 1017     | 1019         |  Dislike  |
| 0001     | 1021     | 1021         |  Like     |
like image 26
Sergey Stativa Avatar answered Sep 29 '22 12:09

Sergey Stativa