
per-paragraph commenting system

I'm very interested in the emerging trend of comments-per-paragraph systems (also called "annotation systems"), such as the ones implemented by medium.com and qz.com, and I'm looking at the idea of developing one of my own.

Question: they seem to be implemented mainly in JavaScript that runs through the text's HTML paragraphs, each uniquely identified by an id attribute (or, in the case of Medium, a name attribute). Does this mean their CMS actually stores each paragraph as a separate entry in the database? That seems overly complex to me, but otherwise, how do they handle the fact that a paragraph can be deleted, edited, or moved around in the overall text? How would the unique id be preserved if the author changes the paragraph? How is that unique id logically structured (post_id + position_in_post)?

Thank you for your insights...

pixeline asked Oct 20 '13 12:10



2 Answers

I can't speak to the Medium side, but as one of the developers for Quartz, I can give some insight into how qz.com annotations work.

The annotations code is custom PHP and is independent of the CMS used for publishing articles (WordPress VIP). We do indeed store a reference to each paragraph as a row in the database, in order to track any updates to the article content. We call this an annotation thread, and when a user saves an annotation, the threadId is stored along with the annotation.

We do not store a unique id on the WordPress side for each paragraph. Instead, we store the paragraph's relative position in the article (nodeIndex "3" and nodeSelector "p" == the third p-tag in the content body for a given article), and the JavaScript determines where exactly to place the annotation block. We went this route to avoid heavier customizations on the WordPress side, though depending on your CMS it may be easier to address this directly in the CMS code and add unique ids to the HTML before sending it to the client.
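To make the positional addressing concrete, here is a minimal sketch of resolving a (nodeSelector, nodeIndex) pair against an article body. It is an illustration only, not the qz.com code: the `ParagraphCollector` class and `find_node` function are hypothetical names, the index is assumed to be 1-based (so "3" means the third p-tag), and the selector is assumed to be a bare tag name.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every element with a given tag name, in document order."""

    def __init__(self, selector="p"):
        super().__init__()
        self.selector = selector  # assumption: a bare tag name, not a full CSS selector
        self.nodes = []           # text content of each matching element
        self._depth = 0           # > 0 while the parser is inside a matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.selector:
            self._depth += 1
            if self._depth == 1:
                self.nodes.append("")

    def handle_endtag(self, tag):
        if tag == self.selector and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.nodes[-1] += data


def find_node(html, node_selector, node_index):
    """Return the text of the node_index-th (1-based) matching element, or None."""
    collector = ParagraphCollector(node_selector)
    collector.feed(html)
    collector.close()
    if 1 <= node_index <= len(collector.nodes):
        return collector.nodes[node_index - 1].strip()
    return None


article = "<div><p>First.</p><h2>Title</h2><p>Second.</p><p>Third <em>one</em>.</p></div>"
print(find_node(article, "p", 3))  # the third p-tag, regardless of other tags in between
```

Note that the address says nothing about the paragraph's content, which is why the stored paragraph text is also needed when the article changes.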

Every time an update to an article is published, each paragraph in the updated article is compared against what was previously stored with the annotation threads for that article. If the position and paragraph text do not match up, the code attempts to find the paragraph that is the closest match and updates the row for that thread. New threads are created, and threads whose paragraph no longer exists are deleted, where appropriate. All of this is handled server side whenever changes to an article are published.
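The update step described above could be sketched roughly as follows. This is a guess at the general shape, not the actual Quartz implementation: the `reconcile` function, the dictionary keys, and the 0.6 similarity cutoff are all assumptions, and `difflib.SequenceMatcher` stands in for whatever "closest match" measure is really used.

```python
from difflib import SequenceMatcher

MIN_SIMILARITY = 0.6  # assumed cutoff; below this a paragraph counts as "gone"


def reconcile(threads, new_paragraphs):
    """Match stored threads against the paragraphs of a newly published revision.

    threads: list of dicts with 'thread_id', 'node_index' (1-based) and 'text'.
    Returns (kept_threads, deleted_thread_ids, node_indices_needing_new_threads).
    """
    unclaimed = set(range(len(new_paragraphs)))
    kept, pending = [], []

    # Pass 1: threads whose position and text are both unchanged.
    for t in threads:
        i = t["node_index"] - 1
        if i in unclaimed and new_paragraphs[i] == t["text"]:
            unclaimed.discard(i)
            kept.append(dict(t))
        else:
            pending.append(t)

    # Pass 2: move each remaining thread to its closest unclaimed paragraph.
    deleted = []
    for t in pending:
        best_i, best_score = None, MIN_SIMILARITY
        for i in sorted(unclaimed):
            score = SequenceMatcher(None, t["text"], new_paragraphs[i]).ratio()
            if score > best_score:  # must beat the cutoff and any earlier candidate
                best_i, best_score = i, score
        if best_i is None:
            deleted.append(t["thread_id"])
        else:
            unclaimed.discard(best_i)
            kept.append({"thread_id": t["thread_id"],
                         "node_index": best_i + 1,
                         "text": new_paragraphs[best_i]})

    # Paragraphs nobody claimed are new and need fresh threads.
    return kept, deleted, sorted(i + 1 for i in unclaimed)


threads = [
    {"thread_id": 1, "node_index": 1, "text": "The quick brown fox jumps over the lazy dog."},
    {"thread_id": 2, "node_index": 2, "text": "Annotations are stored per paragraph."},
    {"thread_id": 3, "node_index": 3, "text": "Obsolete footnote."},
]
new_paragraphs = [
    "A brand new introduction with different wording.",
    "The quick brown fox jumps over the lazy dog.",
    "Annotations are now stored per paragraph.",
]
kept, deleted, new_indices = reconcile(threads, new_paragraphs)
print(kept, deleted, new_indices)
```

In this example thread 1 follows its paragraph from position 1 to 2, thread 2 follows its edited paragraph to position 3, thread 3 is deleted, and position 1 needs a new thread. The matching here is greedy in thread order; a production version might prefer a globally optimal assignment.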

A couple of alternative implementations that are also worth looking at are Gawker's Kinja text annotations (currently in use on Jalopnik) and the word-for-word annotations of rapgenius.com.

Sam Williams answered Oct 02 '22 02:10


(disclaimer: I'm a factlink dev.)

I work for a company trying to allow per-paragraph (or per-phrase) commenting on arbitrary sites. Essentially, you've got two choices to identify the anchor of a comment.

  1. Remember the structure of the page (e.g. some path from a root to a paragraph), and place comments at the same position next time.
  2. Identify the content of the paragraph and place comments near identical or similar content next time.

Both systems have their downsides, but you pretty much need to go with option 2 if you want a robust system. Structural identification is fragile in the face of changing structure. Even irrelevant changes, such as theming or the precise HTML tags used, can significantly alter the "path". When that happens, you really can't fix it unless you inspect the content, i.e. option (2).

Sam describes what comes down to a server-side content-based approach in his answer. Purely client-side content-based matching is what Factlink and (IIRC) Hypothesis use. Most browsers support non-standard but fast substring search in page content using either window.find or TextRange.findText. Alternatively, you could walk the DOM, which is slower but gives you the flexibility to implement (e.g.) fuzzy matching.
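The content-based idea can be sketched independently of the DOM: store the quoted text with the comment, try an exact substring search first, and fall back to fuzzy matching if the text was edited. The sketch below operates on the page's plain text and is not Factlink's or Hypothesis's actual algorithm; `locate_anchor`, the word-window strategy, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def locate_anchor(quote, page_text, min_similarity=0.8):
    """Return (start, end) offsets of the best match for quote in page_text, or None."""
    # Fast path: the quoted text still appears verbatim.
    start = page_text.find(quote)
    if start != -1:
        return start, start + len(quote)

    # Slow path: slide a window with the same word count over the page
    # and keep the window most similar to the stored quote.
    words = page_text.split(" ")
    size = len(quote.split(" "))
    best, best_score = None, min_similarity
    offset = 0  # character offset of words[i] within page_text
    for i in range(len(words)):
        window = " ".join(words[i:i + size])
        score = SequenceMatcher(None, quote, window).ratio()
        if score > best_score:
            best, best_score = (offset, offset + len(window)), score
        offset += len(words[i]) + 1  # +1 for the space consumed by split
    return best


page = ("Comments can be attached to any paragraph. "
        "Structural paths break easily when themes change.")

# The author changed "quickly" to "easily" after the comment was saved.
span = locate_anchor("Structural paths break quickly when themes change.", page)
print(page[span[0]:span[1]])
```

In a browser the same two steps would map onto a substring search followed by a DOM text walk; the window scan is quadratic in the worst case, so a real implementation would likely narrow the candidates first.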

It may seem like client-side matching is overkill or complex, but really, it's simpler: it's a very robust way to decouple your content-management from your commenting. Neither is really simple, so decoupling those concerns can be a win.

Eamon Nerbonne answered Oct 02 '22 03:10