
Why does SO HTML structure break browser "reader modes"?

Background

Browser "reader mode" reformats web pages to make them more accessible and readable according to individual users' needs (spacing, contrast, font, etc.).

While every browser implements reader mode differently, in general, they all nail article-style sites, like Medium, New York Times, Lifehacker, etc.

Problem / Question

But on Stack Overflow, almost all reader modes break in various ways, often displaying only the question and/or the first answer; the rest of the content is simply missing.

What, specifically, about the HTML structure of SO/SE pages confuses browser reader modes?

Put another way, how could the HTML structure of the page be changed to allow browser reader mode to correctly parse and display all the question/answer content?

Progress so far

From comparing various browsers,* mobile and desktop, it seems that there is some kind of simple heuristic that reader mode uses to determine what content to display, like, "show only the element with the most text, and hide all other content." Or, "show only the element with the most text, but weighted toward elements closer to the top of the page."

*Tried so far (mobile and desktop): Firefox, Safari, and Chrome (note that Chrome only has this natively on mobile, and calls it "simplified view"). Access to reader mode on Firefox and Safari is in the URL bar; access in Chrome is at the bottom of the screen, when supported.

But it's also possible that there is some scanning of element tags/classes/ids looking for a semantic indication of what the important content is.

From poking around in browser DevTools, I've noticed that there appear to be two wrapper <div>s that contain all the Q&A content, while the question and each answer also get their own <div>. This baffles me, because often it's the question and the first answer that get displayed, while I would expect one or the other -- or for the reader to detect the wrapper and display all the content.

While all browsers implement this differently, since they can all handle article-style sites with no problem, cutting out non-content elements, I'm looking for a tweak to the structure or semantics of SO/SE pages that would similarly induce browser reader modes to capture all the content.

Accessibility

SE recently widened its line spacing across all SE/SO sites to promote accessibility (e.g. for readers with dyslexia). My question here is the technical sister-question to my post on meta, in which I suggest that SE sites better support reader views. (Note that I am not for or against the formatting change; I'm just interested in investigating a way to support user-determined formatting through reader modes.) I'm hoping that this question will serve as the technical-investigation corollary to that post, and will yield some actionable information about what modifications could be made to support browser reader modes.

ultraGentle asked Sep 02 '20 14:09



1 Answer

The source code for Firefox's reader view (Mozilla's Readability library) is publicly available, so we can use it to make some best guesses as to why Stack Overflow doesn't quite work. I would imagine other reader view implementations use similar criteria.

Criteria used by reader view that conflict with Stack Overflow

A few things jump out as to why Stack Overflow doesn't work with a reader view.

It uses criteria designed to remove comments from a page.

There are a few criteria, such as length of text (300+ characters earns the maximum points as a likely candidate), use of commas, and class names, all designed to remove nodes that look like comments.
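To make the length-and-commas idea concrete, here is a minimal sketch of that kind of text scoring. It is loosely modelled on the criteria described above, not copied from the reader view source: the function name `scoreParagraphText`, the 25-character cutoff, and the exact point values are assumptions for illustration.

```javascript
// Simplified sketch of text-based scoring (assumed constants, hypothetical name).
function scoreParagraphText(text) {
  const trimmed = text.trim();
  // Assumption: very short snippets are not scored at all.
  if (trimmed.length < 25) return 0;
  let score = 1; // base point for any scorable paragraph
  score += (trimmed.match(/,/g) || []).length; // one point per comma
  score += Math.min(Math.floor(trimmed.length / 100), 3); // length bonus, capped at 300+ characters
  return score;
}

// Made-up sample texts: a very short comment, a comma-free comment, and a long paragraph.
const shortComment = "Thanks, works!";
const mediumComment = "This fails on Safari for me and I do not know why";
const longParagraph =
  "Reader mode reformats pages, strips navigation, and applies user styles, ".repeat(5);

console.log(scoreParagraphText(shortComment));
console.log(scoreParagraphText(mediumComment));
console.log(scoreParagraphText(longParagraph));
```

Under these assumed rules, a terse comment earns almost nothing, while a long, comma-rich paragraph scores many times higher, which is why a page made mostly of short posts and comments produces few strong candidates.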

As reader mode is designed to read blog posts, this is obviously favourable, but on Stack Overflow it is likely to remove large sections of the page as candidates, because they contain lots of low-scoring nodes (I mean... who uses commas in comments anyway? They use valuable characters of your character count! hehe).

Reader View doesn't use semantics as an indicator

This one surprised me, but it doesn't look for semantic elements such as <main>, <article> etc. when making decisions.

This wouldn't help Stack Overflow as it stands as they don't use them either but it would have been my immediate thought on how to fix this.

Class names don't help Stack Overflow

However, it does look for class names that indicate whether an element is likely to be relevant.

Negative class names, which I believe include partial matches (such as "main-comments" for "comment"), are listed below. A match is likely to result in a node being removed from the list of likely candidates:

unlikelyCandidates: /-ad-|ai2html|banner|breadcrumbs|combx|comment|community|cover-wrap|disqus|extra|footer|gdpr|header|legends|menu|related|remark|replies|rss|shoutbox|sidebar|skyscraper|social|sponsor|supplemental|ad-break|agegate|pagination|pager|popup|yom-remote/i,

Within that list is "comment", so the comments on a post are likely to score poorly (each comment has the class "comment"). The word "share" does not appear in the pattern quoted above, but the reader view source also penalises it through a separate pattern of negative class names, so the actual answer sections are likely to score poorly as well, because the "share edit follow close flag" section contains a class of "share".
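A quick way to see how partial matching behaves is to run the quoted pattern against a few class names. In the sketch below the regular expression is copied from the line quoted above; the helper `looksUnlikely` and the sample class names are hypothetical and are not taken from Stack Overflow's markup.

```javascript
// Pattern copied verbatim from the unlikelyCandidates line quoted above.
const unlikelyCandidates = /-ad-|ai2html|banner|breadcrumbs|combx|comment|community|cover-wrap|disqus|extra|footer|gdpr|header|legends|menu|related|remark|replies|rss|shoutbox|sidebar|skyscraper|social|sponsor|supplemental|ad-break|agegate|pagination|pager|popup|yom-remote/i;

// Hypothetical helper: true when a class or id string matches the pattern.
function looksUnlikely(classOrId) {
  return unlikelyCandidates.test(classOrId);
}

// Illustrative class names only.
const samples = ["comment", "main-comments", "Comment-Body", "answer-body"];
for (const name of samples) {
  console.log(name, looksUnlikely(name));
}
```

Because the pattern is unanchored and case-insensitive, any class name that merely contains "comment" matches. This only tests the single pattern quoted above; the real script combines it with other checks, so a match here does not by itself mean a node is dropped.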

Stack Overflow could change these class names and that would possibly improve the chances of a full page being rendered in the reader view, but it is a hack and probably not very robust!

Comments and short answers are the killer

As I mentioned, the length of the text in an element and the use of commas are criteria for determining whether an element is a candidate for "the main text" on a page.

Comments, short answers etc. on Stack Overflow will always conflict with this scoring mechanism and so that poses a major problem for Stack Overflow to be able to do anything about reader view compatibility.

With that in mind......

What is the solution?

As far as making a site compatible with reader view goes, this is not something you should pursue with any vigour; it will lead to poor decisions.

Trying to adjust Stack Overflow to meet the criteria of reader views would result in hacks at best and introducing accessibility issues at worst!

The problem here is that Q&A sites do not behave well with reader views. Try Quora in reader view; it doesn't work either.

To illustrate my way of thinking on this, the original question could be changed to "What could Firefox do to its reader view to make it compatible with Q&A sites?" That would then open up discussion about the criteria it uses to work out what content to show, as the problem lies with the reader view implementation more than with Stack Overflow / Quora (not that Stack Overflow / Quora etc. are perfect by any means!).

Instead, I can think of a couple of solutions that would provide the accessibility features I believe you are using reader view for:

Fix it yourself

As this all seems to have stemmed from the line height changes, you could create a plugin or bookmarklet that fixes all the styles on a page.
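As a rough idea of what such a bookmarklet could look like, the sketch below builds a `javascript:` URL that injects a style override when clicked. The helper name `buildStyleBookmarklet`, the `.post-body` selector, and the 1.3 line height are all placeholders, not values taken from Stack Overflow.

```javascript
// Hypothetical helper: builds a bookmarklet URL that injects a <style> element.
function buildStyleBookmarklet(selector, lineHeight) {
  const css = `${selector} { line-height: ${lineHeight} !important; }`;
  // This payload runs in the page when the bookmark is clicked.
  const payload =
    "(function(){var s=document.createElement('style');" +
    "s.textContent=" + JSON.stringify(css) + ";" +
    "document.head.appendChild(s);})();";
  return "javascript:" + encodeURIComponent(payload);
}

// ".post-body" is a placeholder selector; inspect the page to find the real one.
const bookmarklet = buildStyleBookmarklet(".post-body", 1.3);
console.log(bookmarklet);
```

Saving the printed URL as a bookmark and clicking it on a page would apply the override for that page load only, so it would need to be clicked again after navigating.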

As the readability source code is available, you could easily adjust it to account for Stack Overflow's specific design (give an extra weighting of +200 points to <div id="mainbar">, as that is the container we want to display).

Then just adjust the bookmarklet to point to your own server with the modified readability script and voila, working solution.
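The weighting tweak could be sketched roughly as follows. This is not the readability script's real API: `SITE_BONUSES`, `applySiteBonus`, `pickTopCandidate`, and the sample scores are invented for illustration, and only the +200 bonus for the `mainbar` id comes from the suggestion above.

```javascript
// Hypothetical sketch; none of these names exist in the readability script.
const SITE_BONUSES = new Map([["mainbar", 200]]); // element id -> extra points

// Returns a copy of the candidate with any site-specific bonus added.
function applySiteBonus(candidate) {
  const bonus = SITE_BONUSES.get(candidate.id) || 0;
  return { ...candidate, score: candidate.score + bonus };
}

// Picks the highest-scoring candidate after bonuses are applied.
function pickTopCandidate(candidates) {
  return candidates
    .map(applySiteBonus)
    .reduce((best, c) => (c.score > best.score ? c : best));
}

// Made-up scores standing in for whatever the normal heuristics produced.
const top = pickTopCandidate([
  { id: "answer-1", score: 40 },
  { id: "mainbar", score: 5 },
  { id: "sidebar", score: 12 },
]);
console.log(top.id, top.score);
```

With the bonus in place, the wrapper wins even though a single answer scored higher on text alone, which is the effect the modified script would be after.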

If you decide to do this, share it with the community; it could get you some nice reputation and would look great on a CV, your social media profiles, etc.

Get Stack Overflow to fix it!

A better option would be to ask Stack Overflow to implement accessibility settings. Stack Overflow should try to be a leader in the field and I am sure a request for accessibility features would be well received and eventually added to the development roadmap.

Ask for an accessibility settings screen or drop-down. A good starting point would be something like the User Interface Options component. Click "+ show preferences" and you will see you can adjust a load of things on the site.

This way you could fix the line height (which appears to be the original starting point of this as you use that to adjust the design to your liking). They could also then implement a simplified view similar to reader view ("focused mode").

As an additional benefit of this approach the editor could still be usable and you could view the output in a format that is easy for you to read.

This seems like the best solution and, for a site like SO, it sits at the low end in terms of technical complexity and design considerations.

Final irony of the reader view criteria system

At the time of writing, if you view this page in reader view in Firefox, you will see the question title followed by my answer as the main body text. Your whole question body gets removed from the page. I think that sums up the problem perfectly!

[Screenshot: the page in Firefox reader mode, showing the original title but with my answer as the main body text]

Graham Ritchie answered Nov 16 '22 23:11
