I am banging my head against something and I was wondering if somebody more skilled that me could help me out.
My aim is to create a comment thread that factors in a system of comment scoring.
First I'll explain where I am currently.
Say we have a comment thread on an article that looks like the example below. The number in parenthasis is the ID of that comment. ID's are assigned automatically by the database and increment chronologically with each additional comment posted. The number of dashes before the comment text reperesent the comment depth.
(01)"This is a top level comment."
(02)-"This is a second level comment. A reply to the top level comment above."
(06)-"This is also a second level comment / another reply to comment 01."
(07)--"This is a reply to comment 06."
(03)"This is a different top level comment."
(05)-"This is a reply to the comment above."
(08)--"This is a reply to that comment in turn."
(10)---"This is a deeper comment still."
(04)"This is one more top level comment."
(09)-"This is one more reply."
My first problem was storing this data in a way that means it can be returned in the correct order. If you simply store a depth field and order by depth, it'll bring back all of the top level comments first and then the second level comments etc. This isn't right, we must return the comments with the full parentage still intact.
One way to achieve this is to store the full parentage for each comment.
Comment ID | Parentage
01 | (Comment 01 has no parent because it is top level)
02 | 01- (Comment 02 was a reply to comment 01)
03 |
04 |
05 | 03-
06 | 01-
07 | 01-06- (Comment 07 has two ancestors 01 and then 06)
08 | 03-05-
09 | 04-
10 | 03-05-08-
Adding another comment record is as simple as grabbing the Parentage from the comment that you are replying to, and appending its ID to form the new parentage. For example, if I was replying to comment 10, I would take it's parentage (03-05-08-) and append its ID (10-). The database would automatically recognise it as the 11th comment, and we'd get:
Comment ID | Parentage
01 |
02 | 01-
03 |
04 |
05 | 03-
06 | 01-
07 | 01-06-
08 | 03-05-
09 | 04-
10 | 03-05-08-
11 | 03-05-08-10-
Now, when we order the comments for display, we order on a concatenation of Parentage and Comment ID which gives us:
Order by CONCAT(Parentage, ID)
Comment ID | Parentage | CONCAT(Parentage, ID)
01 | | 01-
02 | 01- | 01-02-
06 | 01- | 01-06-
07 | 01-06- | 01-06-07-
03 | | 03-
05 | 03- | 03-05-
08 | 03-05- | 03-05-08-
10 | 03-05-08- | 03-05-08-10-
11 | 03-05-08-10- | 03-05-08-10-11-
04 | | 04-
09 | 04- | 04-09-
Which produces the exact same list as first demonstrated. With Comment 11 which we later added inserted in the correct place:
(01)"This is a top level comment."
(02)-"This is a reply to the top level comment."
(06)-"This is another reply that was posted later than the first."
(07)--"This is a reply to the second level comment directly above."
(03)"This is a different top level comment."
(05)-"This is a reply to the comment above."
(08)--"This is a reply to the comment above."
(10)---"This is a deeper comment still."
(11)----"THIS COMMENT WAS ADDED IN THE EARLIER EXAMPLE."
(04)"This is one more top level comment."
(09)-"This is one more reply."
Indenting can be done by checking the length of the CONCAT string and multiplying the len(CONCAT(Parentage, ID)) by a set number of pixels. This is great, we have a system of storing comments in a way that recognises their parentage.
Now the problem:
Not all comments are equal. A system of comment scoring is needed to distinguish good comments. Let's say each comment has an upvote button.. while we want to retain parentage, if one comment has two direct replies at the same level then we want to show the one with the most upvotes first. I'll add some votes in [square brackets] below.
(01)"This is a top level comment." [6 votes]
(02)-"This is a reply to the top level comment." [2 votes]
(06)-"This is another reply that was posted later than the first." [30 votes]
(07)--"This is a reply to the second level comment directly above." [5 votes]
(03)"This is a different top level comment." [50 votes]
(05)-"This is a reply to the comment above." [4 votes]
(08)--"This is a reply to the comment above." [0 votes]
(10)---"This is a deeper comment still." [0 votes]
(11)----"THIS COMMENT WAS ADDED IN THE EARLIER EXAMPLE." [0 votes]
(04)"This is one more top level comment." [2 votes]
(09)-"This is one more reply." [0 votes]
In this example, comments (01) and (03) are both top-level but (03) has [50 votes] and (01) only has [6 votes]. (01) appears above only by virtue of the fact that it was posted earlier and therefore has been assigned a smaller ID. Likewise (02) and (06) are both replies to (01) but must be reordered to allow the one with the most votes (06) to rise to the top.
I am completely and utterly stuck in trying to achieve this.
I imagine that any ordering/reordering and indexing would be better done on a comment-vote being cast rather than on page load so that the page-load time can be as quick as possible but beyond that I have absolutely no idea!
Any ideas or light you could shed on possible avenues would really take a load off! Thanks for your help as always.
Edit: In response to @Paddy's solution,
When I run the expression offered by @Paddy below on the mock data, the first error I get is:
"The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified."
This can be remedied by adding SELECT 'top 100 percent' to the recursive member definition. Once this is done, I get the error:
'CommentTree' has more columns than were specified in the column list.
This can be resolved by adding a 'Level' column to the CommentTree specification. This then prints the data, but it returns all the top level comments first and then something resembling (but not actually matching) the correct sort order after.
The data is returned as such:
ParentId | CommentId | Comment | Vote | Level
NULL | 1 | Text here | 6 | 0
NULL | 3 | Text here | 50 | 0
NULL | 4 | Text here | 2 | 0
4 | 9 | Text here | 0 | 1
3 | 5 | Text here | 4 | 1
5 | 8 | Text here | 0 | 2
8 | 10 | Text here | 0 | 3
10 | 11 | Text here | 0 | 4
1 | 2 | Text here | 2 | 1
1 | 6 | Text here | 30 | 1
6 | 7 | Text here | 5 | 2
Have I done anything wrong or did @Paddy miss out something perhaps? Please accept my apologies, recursive functions are very new to me.
Before diving into designing and writing code, let’s break down what actually makes a good comment thread. A good comment thread will have the following characteristics: Comments are readable, and all the important data-points are visible to the user at a glance. This includes the author, number of votes, timestamp, and the content.
If a comment thread does not have any comments (empty state), we need to communicate that clearly to the user. A simple paragraph containing the line “There are no comments yet.” along with a form containing a text box and submit button can go a very long way, and should be the bare minimum when dealing with empty states.
If you give users a form to provide their inputs, you can guarantee that you will find spam coming your way, so this is obviously an issue to address when building comment threads. A great way to reduce spam is to use services like reCAPTCHA from Google.
Code below looks good for your task. It's a bit complex, but it was a challenge for me to make it in a single SELECT
. You can split it into several SELECT
with prefetch into temporary tables (for performance purposes), or keep it together.
Thank you for the question, it was interesting!
Please note that ParentID
for root nodes must be 0
, not NULL
.
DECLARE @a TABLE (
CommentID INT,
ParentID INT,
Comment VARCHAR(100),
Vote INT
)
INSERT @a
VALUES
(1, 0, '', 6),
(3, 0, '', 50),
(4, 0, '', 2),
(9, 4, '', 0),
(5, 3, '', 4),
(8, 5, '', 0),
(10, 8, '', 0),
(11, 10, '', 0),
(2, 1, '', 2),
(6, 1, '', 30),
(7, 6, '', 5)
;WITH CTE_1 (ParentId, CommentId, Comment, Vote, Level, LevelPriority, Path) -- prepare base info
AS
(
SELECT c.ParentId, c.CommentId, c.Comment, c.Vote, 0 AS Level, ROW_NUMBER() OVER(ORDER BY c.Vote DESC), CAST('/' + CAST(c.CommentId AS VARCHAR(32)) AS VARCHAR(MAX)) + '/'
FROM @a AS c
WHERE ParentId = 0
UNION ALL
SELECT c.ParentId, c.CommentId, c.Comment, c.Vote, Level + 1 AS Level, ROW_NUMBER() OVER(ORDER BY c.Vote DESC), d.Path + CAST(c.CommentId AS VARCHAR(32)) + '/'
FROM @a AS c
INNER JOIN CTE_1 AS d
ON c.ParentID = d.CommentID
),
CTE_2 (ParentId, CommentId, Comment, Vote, Level, LevelPriority, ChildCount) -- count number of children
AS
(
SELECT p.ParentId, p.CommentId, p.Comment, p.Vote, p.Level, p.LevelPriority, COUNT(*)
FROM CTE_1 AS p
INNER JOIN CTE_1 AS c
ON c.Path LIKE p.Path + '%'
GROUP BY
p.ParentId, p.CommentId, p.Comment, p.Vote, p.Level, p.LevelPriority
),
CTE_3 (ParentId, CommentId, Comment, Vote, Level, LevelPriority, OverAllPriority, ChildCount) -- calculate overall priorities
AS
(
SELECT c.ParentId, c.CommentId, c.Comment, c.Vote, c.Level, c.LevelPriority, 1 AS OverAllPriority, ChildCount
FROM CTE_2 AS c
WHERE Level = 0 AND LevelPriority = 1
UNION ALL
SELECT c.ParentId, c.CommentId, c.Comment, c.Vote, c.Level, c.LevelPriority,
CASE
WHEN c.ParentID = d.CommentID THEN d.OverAllPriority + 1
ELSE d.OverAllPriority + d.ChildCount
END,
c.ChildCount
FROM CTE_2 AS c
INNER JOIN CTE_3 AS d
ON
(c.ParentID = d.CommentID AND c.LevelPriority = 1)
OR (c.ParentID = d.ParentID AND d.LevelPriority + 1 = c.LevelPriority)
)
SELECT ParentId, CommentId, Comment, Vote
FROM CTE_3
ORDER BY OverAllPriority
In this query I do following:
SELECT
.position = 1
position = parent_position + 1
position = prev_sibling_position + prev_sibling_number_of_descendants
EDIT The same solution, but without CTE.
DECLARE @a TABLE (
CommentID INT,
ParentID INT,
Comment VARCHAR(100),
Vote INT
)
INSERT @a
VALUES
(1, 0, '', 6),
(3, 0, '', 50),
(4, 0, '', 2),
(9, 4, '', 0),
(5, 3, '', 4),
(8, 5, '', 0),
(10, 8, '', 0),
(11, 10, '', 0),
(2, 1, '', 2),
(6, 1, '', 30),
(7, 6, '', 5)
DECLARE @rows INT
DECLARE @temp_table TABLE (
CommentID INT,
ParentID INT,
Comment VARCHAR(100),
Vote INT,
LevelPriority INT,
Path VARCHAR(MAX),
ChildCount INT NULL,
OverAllPriority INT NULL
)
INSERT @temp_table (CommentID, ParentID, Comment, Vote, LevelPriority, Path)
SELECT CommentID, ParentID, Comment, Vote, ROW_NUMBER() OVER(ORDER BY Vote DESC), '/' + CAST(CommentId AS VARCHAR(32)) + '/'
FROM @a
WHERE ParentID = 0
SELECT @rows = @@ROWCOUNT
WHILE @rows > 0
BEGIN
INSERT @temp_table (CommentID, ParentID, Comment, Vote, LevelPriority, Path)
SELECT a.CommentID, a.ParentID, a.Comment, a.Vote, ROW_NUMBER() OVER(PARTITION BY a.ParentID ORDER BY a.Vote DESC), c.Path + CAST(a.CommentId AS VARCHAR(32)) + '/'
FROM @a AS a
INNER JOIN @temp_table AS c
ON a.ParentID = c.CommentID
WHERE NOT
a.CommentID IN (SELECT CommentID FROM @temp_table)
SELECT @rows = @@ROWCOUNT
END
UPDATE c
SET ChildCount = a.cnt
FROM (
SELECT p.CommentID, COUNT(*) AS cnt
FROM @temp_table AS p
INNER JOIN @temp_table AS c
ON c.Path LIKE p.Path + '%'
GROUP BY
p.CommentID
) AS a
INNER JOIN @temp_table AS c
ON a.CommentID = c.CommentID
UPDATE @temp_table
SET OverAllPriority = 1
WHERE ParentID = 0 AND LevelPriority = 1
SELECT @rows = @@ROWCOUNT
WHILE @rows > 0
BEGIN
UPDATE c
SET
OverAllPriority = CASE
WHEN c.ParentID = p.CommentID THEN p.OverAllPriority + 1
ELSE p.OverAllPriority + p.ChildCount
END
FROM @temp_table AS p
INNER JOIN @temp_table AS c
ON (c.ParentID = p.CommentID AND c.LevelPriority = 1)
OR (p.ParentID = c.ParentID AND p.LevelPriority + 1 = c.LevelPriority)
WHERE
c.OverAllPriority IS NULL
AND p.OverAllPriority IS NOT NULL
SELECT @rows = @@ROWCOUNT
END
SELECT * FROM @temp_table
ORDER BY OverAllPriority
Although not directly related to your question, my advice would be to change to the Nested Set Model. I know it is a lot of rework but sooner or later you'll realise it is the best choice :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With