Goal
Our users work in Google Docs. The text they write will be read to them as they type using text-to-speech. It should work across as many platforms and browsers as possible.
Our solution
Google Apps Script seems to fit: it works on all desktop browsers and some mobile browsers.
This works
We have a text-to-speech module which works great, so that is no problem. We are currently using a sidebar. The sidebar can play audio using the HTML5 audio tag, which works without any problems.
The Problem
The problem is actually getting the text from the Google Docs document. I have so far not been able to find any way to access the document text directly from the sidebar. What we have been doing instead is:
It takes a second or more from the time the user has typed text in Google Docs to the time when the change is synchronized to the Google Docs cloud.
We have timed the different steps. The text-to-speech is fast, and the HTML5 audio is no problem either.
The time sink is getting the text changes. It currently takes 1-3 seconds, which is way too long for our use case.
Question
Can we access the text in Google Docs faster? Maybe directly instead of going through Google's cloud?
UPDATE 2017-02-15: It appears this currently isn't possible. What is possible is to do it with a Chrome extension that parses the Google Docs page and extracts the text from the HTML+JS. This is rather difficult, but possible.
If a browser plugin is an appropriate way to deliver the feature, it should be possible to listen to changes that Google Docs makes to the DOM when it updates the page content.
    // This div contains all of the page content and not much else, in my rudimentary testing.
    var pageRoot = document.getElementsByClassName('kix-appview-editor')[0].firstChild;

    var observer = new MutationObserver(handleNewChanges);
    observer.observe(pageRoot, {
      subtree: true,
      childList: true,
      attributes: false,
    });

    // Later, you can stop observing
    observer.disconnect();
Your handleNewChanges function will be called any time the content of the DOM changes, with a list of changes. The changes are pretty messy, but you can use someNode.innerText to get the actual content. By observing the changes and keeping some document state, you should be able to determine when the sorts of changes that you care about happen.
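As a rough illustration of what such a callback could look like, here is a minimal sketch. It assumes the interesting changes arrive as childList records whose added nodes expose their visible text through innerText (or textContent for plain text nodes); the actual DOM shape Google Docs produces is undocumented and may differ. makeChangeHandler, collectAddedText and the speak callback are hypothetical names standing in for your own code.

```javascript
// Collect the text of every node added in one batch of MutationRecords.
// Assumption: added nodes expose their visible text via innerText or
// textContent. Google Docs' real DOM structure may not match this.
function collectAddedText(mutations) {
  var pieces = [];
  mutations.forEach(function (mutation) {
    if (mutation.type !== 'childList') {
      return; // ignore attribute and characterData records
    }
    Array.prototype.forEach.call(mutation.addedNodes, function (node) {
      // Elements have innerText; bare text nodes only have textContent.
      var text = node.innerText !== undefined ? node.innerText : node.textContent;
      if (text && text.trim() !== '') {
        pieces.push(text);
      }
    });
  });
  return pieces;
}

// Build a MutationObserver callback that forwards added text to `speak`,
// a hypothetical hook into the existing text-to-speech module.
function makeChangeHandler(speak) {
  return function handleNewChanges(mutations) {
    collectAddedText(mutations).forEach(function (text) {
      speak(text);
    });
  };
}
```

The handler returned by makeChangeHandler can be passed to new MutationObserver(...) in place of handleNewChanges above. Keeping the text extraction in its own function makes it possible to try it against hand-built records without a live document.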
This seems like a good fit for your use case, because:
No remote servers are needed. The data flow would look more like this, entirely within the browser tab:
    ---------------                       ----------
    | Google Docs | <=   fetch doc    <=  | Your   |
    | Document    | =>  DOM changes   =>  | Module |
    ---------------                       ----------
The updates are synchronised with the document visually updating, which feels like the natural thing to trigger this.
The amount of bookkeeping that you need to do to parse each DOM change can probably be constant (that is, without looping over the document content). This would mean that the overhead that the observing adds is constant, so it should scale to any sized document.
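One way the bookkeeping could work is sketched below, under some assumptions: the state remembers the last text seen for each changed block (keyed by something stable, such as the paragraph's DOM node) and compares only that block's old and new text. The work is then proportional to the size of the changed block rather than the whole document. insertedText and createDocumentState are hypothetical helpers, and the common prefix/suffix comparison is a simple heuristic, not how Google Docs tracks edits.

```javascript
// Return the text present in newText but not in oldText, assuming a single
// contiguous insertion. Works by trimming the common prefix and suffix.
function insertedText(oldText, newText) {
  var start = 0;
  var maxStart = Math.min(oldText.length, newText.length);
  while (start < maxStart && oldText[start] === newText[start]) {
    start++;
  }
  var oldEnd = oldText.length;
  var newEnd = newText.length;
  while (oldEnd > start && newEnd > start &&
         oldText[oldEnd - 1] === newText[newEnd - 1]) {
    oldEnd--;
    newEnd--;
  }
  return newText.slice(start, newEnd);
}

// Remember the last known text per block. `key` can be any stable handle
// for the block, for example its DOM node (an assumption of this sketch).
function createDocumentState() {
  var lastText = new Map();
  return {
    // Record the block's new text and return what was typed since last time.
    update: function (key, newText) {
      var oldText = lastText.has(key) ? lastText.get(key) : '';
      lastText.set(key, newText);
      return insertedText(oldText, newText);
    }
  };
}
```

Calling state.update(paragraphNode, paragraphNode.innerText) from the observer callback would then yield only the newly typed characters, which is what the text-to-speech module needs. Pure deletions return an empty string, so they can be skipped.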
As you've figured out, a browser extension is a good solution, and it might be easier than you think: Chrome's extension APIs are well documented, and building an extension is very similar to building a web page with HTML and JavaScript.
There's even an extension API for TTS that can integrate with custom TTS engines:
Use the chrome.ttsEngine API to implement a text-to-speech (TTS) engine using an extension. If your extension registers using this API, it will receive events containing an utterance to be spoken and other parameters when any extension or Chrome App uses the tts API to generate speech. Your extension can then use any available web technology to synthesize and output the speech, and send events back to the calling function to report the status.
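A minimal sketch of registering such an engine might look like the following. It assumes the extension manifest declares the ttsEngine permission and a tts_engine section listing at least one voice, and it uses a hypothetical synthesize(utterance, options, done) function as a stand-in for your existing text-to-speech module. Only chrome.ttsEngine.onSpeak and chrome.ttsEngine.onStop come from the Chrome API; the rest is illustrative.

```javascript
// Build an onSpeak listener around a hypothetical `synthesize` function that
// plays the utterance and calls `done` when playback has finished.
function createSpeakListener(synthesize) {
  return function (utterance, options, sendTtsEvent) {
    // Tell the caller that speech has started.
    sendTtsEvent({ type: 'start', charIndex: 0 });
    synthesize(utterance, options, function onDone() {
      // Tell the caller that the whole utterance has been spoken.
      sendTtsEvent({ type: 'end', charIndex: utterance.length });
    });
  };
}

// Only register when running inside an extension that has the API available.
if (typeof chrome !== 'undefined' && chrome.ttsEngine) {
  chrome.ttsEngine.onSpeak.addListener(
    createSpeakListener(function (utterance, options, done) {
      // Placeholder: hand `utterance` to your own TTS module here.
      done();
    })
  );
  chrome.ttsEngine.onStop.addListener(function () {
    // Placeholder: stop any audio that is currently playing.
  });
}
```

Whether a full TTS engine is needed is a design choice: if the speech is only ever triggered by your own extension, calling your module directly from the content script may be simpler.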