MySQL JSON-based trending tags implementation

I am trying to identify trending tags (those with the most hits) over a time series using the MySQL JSON feature. Below is my table:

CREATE TABLE TAG_COUNTER (
    account       varchar(36) NOT NULL,
    time_id       INT NOT NULL,
    counters      JSON,
    PRIMARY KEY   (account, time_id)
);

On every web API request I receive multiple different tags per account, and based on the number of tags I build an INSERT ... ON DUPLICATE KEY UPDATE query. The example below shows an insert with two tags.

INSERT INTO `TAG_COUNTER`
  (`account`, `time_id`, `counters`)
VALUES
  ('google', '2018061023', '{"tag1": 1, "tag2": 1}')
ON DUPLICATE KEY UPDATE `counters` =
  JSON_SET(`counters`,
           '$."tag1"',
           IFNULL(JSON_EXTRACT(`counters`,
                        '$."tag1"'), 0) + 1,
           '$."tag2"',
           IFNULL(JSON_EXTRACT(`counters`,
                        '$."tag2"'), 0) + 1
  );

time_id has the format yyyyMMddHH, so each row holds one hour of aggregated counts.

Now my problem is retrieving the trending tags. The query below gives me the aggregation for tag1, but the tag names are not known before making the query.

SELECT
SUBSTRING(time_id, 1, 6) AS month,
SUM(counters->>'$.tag1')
FROM TAG_COUNTER
WHERE counters->>'$.tag1' > 0
GROUP BY month;

So I need a generic GROUP BY query, along with ORDER BY, to get the trending tags per hour/day/month.

A sample of the expected output is:

Time(hour/day/month)  Tag_name  Tag_count_value(total hits)

When I searched the web, every example used a structure like {"tag_name": "tag1", "tag_count": 1} instead of a direct {"tag1": 1}, and used tag_name in the GROUP BY.

Q1) Is it always mandatory to have a common, known JSON key in order to perform a GROUP BY?

Q2) If I have to go this way, how must my INSERT ... ON DUPLICATE KEY UPDATE query change for the new label/value JSON structure? The counter has to be created when it does not exist and incremented by one when it does.

Q3) Do I have to maintain an array of objects

[
 {"tag_name": "tag1", "tag_count": 2},
 {"tag_name": "tag2", "tag_count": 3}
]

or an object of objects, like below?

{
 {"tag_name": "tag1", "tag_count": 2},
 {"tag_name": "tag2", "tag_count": 3}
}

Which of the above JSON structures is better in terms of INSERT and retrieval of the trending count?

Q4) Can I keep the existing {"key": "value"} format instead of {"key_label": key, "value_label": "value"} and still extract trending tags? I think {"key": "value"} is very straightforward and good performance-wise.

Q5) While retrieving, I am using SUBSTRING(time_id, 1, 6) AS month. Will it be able to use an index?

Or do I need to create multiple columns such as time_hour (2018061023), time_day (20180610), and time_month (201806), and query on the specific column?

Or can I use MySQL date-time functions? Will they use an index for faster retrieval?

Please help.

asked May 11 '18 17:05

Kanagavelu Sugumar

1 Answer

I don't see a good reason to use JSON here. It's also not clear why you believe that a "NoSQL schema" within MySQL would do anything better.

What you probably need is something like this:

CREATE TABLE TAG_COUNTER (
    account       varchar(36) NOT NULL,
    time_id       INT NOT NULL,
    tag_name      varchar(50) NOT NULL,
    counter       INT UNSIGNED NOT NULL,
    PRIMARY KEY   (account, time_id, tag_name)
);

This will simplify your queries. The INSERT statement would look like:

INSERT INTO TAG_COUNTER
  (account, time_id, tag_name, counter)
VALUES
  ('google', 2018061023, 'tag1', 1),
  ('google', 2018061023, 'tag2', 1)
ON DUPLICATE KEY UPDATE counter = counter + VALUES(counter);
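
On newer servers the same upsert can be written with a row alias instead of the VALUES() function, which MySQL 8.0.20 and later report as deprecated in this position. The following is only a sketch, assuming MySQL 8.0.19 or later; the alias name new_row and the column aliases prefixed with new_ are made up for illustration.

```sql
-- Sketch, assuming MySQL 8.0.19+ (row alias syntax).
-- new_row and the new_* column aliases are hypothetical names.
INSERT INTO TAG_COUNTER
  (account, time_id, tag_name, counter)
VALUES
  ('google', 2018061023, 'tag1', 1),
  ('google', 2018061023, 'tag2', 1)
  AS new_row (new_account, new_time_id, new_tag_name, new_counter)
-- Add the incoming count to the stored count when the key already exists.
ON DUPLICATE KEY UPDATE counter = counter + new_counter;
```

Giving the alias columns distinct names keeps the unqualified counter on the right-hand side unambiguous: it can only mean the value already stored in the row.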

The SELECT statement might be something like this:

SELECT
    SUBSTRING(time_id, 1, 6) AS month,
    tag_name,
    SUM(counter) AS counter_agg
FROM TAG_COUNTER
GROUP BY month, tag_name
ORDER BY month, counter_agg DESC;
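
Regarding index use (Q5): an expression such as SUBSTRING(time_id, 1, 6) in the select list or GROUP BY does not by itself stop the primary key from being used for filtering. What matters is that the WHERE clause compares the bare columns. Below is a sketch of a daily aggregation for one account and one month; the alias day_id and the literal bounds are only illustrative, and it relies on time_id being an integer of the form yyyyMMddHH.

```sql
-- Sketch: daily trending tags for one account in June 2018.
-- time_id DIV 100 drops the two hour digits (2018061023 -> 20180610);
-- time_id DIV 10000 would give the month (201806).
SELECT
    time_id DIV 100 AS day_id,
    tag_name,
    SUM(counter) AS counter_agg
FROM TAG_COUNTER
-- Equality on account plus a range on the bare time_id column
-- matches the leading columns of PRIMARY KEY (account, time_id, tag_name).
WHERE account = 'google'
  AND time_id BETWEEN 2018060100 AND 2018063023
GROUP BY day_id, tag_name
ORDER BY day_id, counter_agg DESC;
```

Wrapping time_id in a function inside the WHERE clause (for example WHERE SUBSTRING(time_id, 1, 6) = '201806') would generally prevent a range scan on that column, so separate hour/day/month columns are not needed as long as the filter is written as a range.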

Note that I didn't try to optimize the table/schema for data size and performance. That would be a different question. But you can see that the queries are much simpler now.
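
For comparison, here is roughly what a generic aggregation over the original JSON column (the first TAG_COUNTER definition, with counters stored as {"tag1": 1, "tag2": 1}) could look like. This is only a sketch, assuming MySQL 8.0.4 or later, where JSON_TABLE is available; the aliases tc, k and total_hits are made up for illustration, and tag names containing a double quote would break the path built with CONCAT.

```sql
-- Sketch, assuming MySQL 8.0.4+ and the original JSON-based table.
SELECT
    SUBSTRING(tc.time_id, 1, 6) AS month,
    k.tag_name,
    -- Look up each key's count in the same row and sum it as a number.
    SUM(CAST(JSON_UNQUOTE(
            JSON_EXTRACT(tc.counters, CONCAT('$."', k.tag_name, '"'))
        ) AS UNSIGNED)) AS total_hits
FROM TAG_COUNTER AS tc
-- Turn the key list of each row's JSON object into one row per tag.
CROSS JOIN JSON_TABLE(
    JSON_KEYS(tc.counters),
    '$[*]' COLUMNS (tag_name VARCHAR(50) PATH '$')
) AS k
GROUP BY month, k.tag_name
ORDER BY month, total_hits DESC;
```

So a common, known JSON key is not strictly required for grouping, but every row's JSON has to be unpacked at query time, which is the work the normalized table avoids.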

answered Oct 08 '22 11:10

Paul Spiegel

Tags: json, mysql, time-series, analytics
