Google docs - Access text changes realtime

Tags:

javascript

google-docs

google-apps-script

google-drive-realtime-api

Goal

Our users work in Google Docs. The text they write will be read to them as they type using text-to-speech. It should work across as many platforms and browsers as possible.

Our solution

Google Apps Script seems to fit: it works on all desktop browsers and some mobile browsers.

This works

We have a text-to-speech module which works great, so that is no problem. We are currently using a sidebar. The sidebar can play audio using the HTML5 audio tag, which works without any problems.

The Problem

The problem is actually getting the text from the Google Docs document. So far I have not been able to find any way to access the document text directly from the sidebar. What we have been doing instead is:

1. The sidebar polls our Google Apps Script, running on Google's cloud, every x milliseconds.
2. Our Google Apps Script then accesses the synchronized document in the cloud.
3. If it finds any changes, it sends them back to the sidebar.
4. The sidebar plays the audio using the HTML5 audio tag and our text-to-speech module.

[Image: https://i.stack.imgur.com/KkAr4.png]

It takes a second or more from the time the user types text in Google Docs to the time the change is synchronized to the Google Docs cloud.

We have timed the different steps. The text-to-speech is fast, and the HTML5 audio is no problem either.

The time sink is getting the text changes. It currently takes 1-3 seconds, which is way too long for our use case.
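For illustration, the polling loop described above can be sketched roughly as follows. This is a minimal sketch, not our actual code: `newlyAppendedText`, `createPoller`, `fetchText` and `onChange` are hypothetical names, and it assumes the user mostly appends text at the end of the document. In a sidebar, `fetchText` could wrap `google.script.run.withSuccessHandler(...)` calling a server-side function that returns `DocumentApp.getActiveDocument().getBody().getText()`.

```javascript
// Hypothetical helper: returns the text added since the previous poll,
// assuming the user only appended at the end of the document.
function newlyAppendedText(previousText, currentText) {
  if (currentText.startsWith(previousText)) {
    return currentText.slice(previousText.length);
  }
  // The edit happened somewhere else: fall back to the whole text.
  return currentText;
}

// Creates a function that performs one poll each time it is called.
// fetchText(callback) is injected so the sketch does not depend on a
// particular transport; onChange receives the newly detected text.
function createPoller(fetchText, onChange) {
  let previousText = '';
  return function pollOnce() {
    fetchText(function (currentText) {
      if (currentText !== previousText) {
        onChange(newlyAppendedText(previousText, currentText));
        previousText = currentText;
      }
    });
  };
}
```

The returned function would be driven by something like `setInterval(pollOnce, 500)`, so each change waits for the cloud sync, then for the next poll, then for the round trip back to the sidebar.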

Question

Can we access the text in the Google Docs document faster? Maybe directly, instead of going through Google's cloud?

UPDATE 2017-02-15: It appears this currently isn't possible from Apps Script. What is possible is to do it with a Chrome extension that parses the Google Docs page and extracts the text from the HTML+JS. This is rather difficult, but possible.

asked Jan 16 '17 13:01

Mr. Java Wolf

2 Answers

If a browser plugin is an appropriate way to deliver the feature, it should be possible to listen to changes that Google Docs makes to the DOM when it updates the page content.

    // This div contains all of the page content and not much else, in my rudimentary testing.
    var pageRoot = document.getElementsByClassName('kix-appview-editor')[0].firstChild;

    var observer = new MutationObserver(handleNewChanges);
    observer.observe(pageRoot, {
      subtree: true,
      childList: true,
      attributes: false,
    });

    // Later, you can stop observing
    observer.disconnect();

Your handleNewChanges function will be called any time the content of the DOM changes, with a list of changes. The changes are pretty messy, but

- inconsequential changes (like the user selecting some text) can be filtered by looking at the added and removed nodes,
- you can walk up the DOM tree to find the location of the changes in the document, and
- you can use someNode.innerText to get the actual content.

By observing the changes and keeping some document state, you should be able to determine when the sorts of changes that you care about happen.
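As a rough sketch of what such a handler could look like: the code below pulls the text out of the nodes added in a batch of mutation records and ignores records that add nothing readable. `extractAddedText` and `makeChangeHandler` are hypothetical names, and the sketch assumes the editor still renders the document text as DOM nodes under the observed root, which I have only tested in a rudimentary way.

```javascript
// Collects the text of nodes added in a batch of MutationRecord-like objects.
// Only childList records are considered; whitespace-only additions are skipped.
function extractAddedText(mutations) {
  const pieces = [];
  for (const mutation of mutations) {
    if (mutation.type !== 'childList') continue;
    for (const node of Array.from(mutation.addedNodes)) {
      // Element nodes expose innerText; text nodes only have textContent.
      const text = node.innerText || node.textContent || '';
      if (text.trim() !== '') pieces.push(text);
    }
  }
  return pieces;
}

// Builds a MutationObserver callback. onText is a hypothetical hook for
// whatever should happen with the text, e.g. passing it to a TTS module.
function makeChangeHandler(onText) {
  return function handleNewChanges(mutations) {
    const pieces = extractAddedText(mutations);
    if (pieces.length > 0) onText(pieces.join(' '));
  };
}
```

You would then pass `makeChangeHandler(yourSpeakFunction)` to `new MutationObserver(...)` in place of `handleNewChanges`. Real edits will need more filtering than this, since the editor may re-create nodes whose text has not changed.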

This seems like a good fit for your use case, because

- No remote servers are needed. The data flow would look more like this, entirely within the browser tab:

      ---------------                   ----------
      | Google Docs | <=  fetch doc  <= |  Your  |
      |  Document   | => DOM changes => | Module |
      ---------------                   ----------

- The updates are synchronised with the document visually updating, which feels like the natural thing to trigger this.
- The amount of bookkeeping that you need to do to parse each DOM change can probably be constant (that is, without looping over the document content). This would mean that the overhead that the observing adds is constant, so it should scale to any sized document.
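To make the bookkeeping point concrete, here is a minimal sketch of per-paragraph state. It walks up from a changed node to its enclosing paragraph and compares that one paragraph's text with what was last seen, so the work per change depends on the tree depth and the paragraph size rather than on the whole document. The names `findAncestorWithClass` and `createParagraphTracker` are hypothetical, and the class name `kix-paragraphrenderer` used in the usage note is an assumption about Google Docs' markup that you should verify in the inspector.

```javascript
// Walks up from a changed node to the nearest ancestor (or the node itself)
// whose className contains the given class. Returns null if none is found.
function findAncestorWithClass(node, className) {
  let current = node;
  while (current) {
    const classes = typeof current.className === 'string'
      ? current.className.split(/\s+/)
      : [];
    if (classes.includes(className)) return current;
    current = current.parentNode;
  }
  return null;
}

// Remembers the last known text of each paragraph node. A WeakMap is used
// so that state for paragraphs removed from the page can be garbage collected.
function createParagraphTracker() {
  const lastText = new WeakMap();
  // Returns the paragraph's current text if it changed, otherwise null.
  return function update(paragraph) {
    const text = paragraph.innerText || paragraph.textContent || '';
    if (lastText.get(paragraph) === text) return null;
    lastText.set(paragraph, text);
    return text;
  };
}
```

In the observer callback you could call `findAncestorWithClass(mutation.target, 'kix-paragraphrenderer')` and, if it returns a paragraph, pass it to the tracker; a non-null result is text that actually changed.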

answered Oct 11 '22 11:10

BudgieInWA

As you've figured out, a browser extension is a good solution, and it might be easier than you think: Chrome's extension APIs are well documented, and building an extension is very similar to building a web page with HTML and JavaScript.

There's even an extension API for TTS that can integrate with custom TTS engines:

Use the chrome.ttsEngine API to implement a text-to-speech(TTS) engine using an extension. If your extension registers using this API, it will receive events containing an utterance to be spoken and other parameters when any extension or Chrome App uses the tts API to generate speech. Your extension can then use any available web technology to synthesize and output the speech, and send events back to the calling function to report the status.
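A minimal sketch of how an engine might hook into that API is below. `chrome.ttsEngine.onSpeak`, `chrome.ttsEngine.onStop` and the `sendTtsEvent` callback come from the Chrome API; `createSpeakHandler`, `synthesize` and `placeholderSynthesize` are hypothetical names standing in for your own TTS module, which I am assuming can return a Promise that resolves when playback finishes. The extension's manifest would also need the `ttsEngine` permission and a `tts_engine` section declaring its voices.

```javascript
// Builds an onSpeak listener around a synthesize(utterance, options) function.
// synthesize is assumed to return a Promise that resolves when playback ends.
function createSpeakHandler(synthesize) {
  return function onSpeak(utterance, options, sendTtsEvent) {
    // Tell the caller that speech has started.
    sendTtsEvent({ type: 'start', charIndex: 0 });
    return synthesize(utterance, options).then(
      function () {
        sendTtsEvent({ type: 'end', charIndex: utterance.length });
      },
      function (error) {
        sendTtsEvent({ type: 'error', errorMessage: String(error) });
      }
    );
  };
}

// Placeholder for a real TTS module; it "finishes" immediately.
function placeholderSynthesize(utterance, options) {
  return Promise.resolve();
}

// Register only when running inside an extension that exposes the API.
if (typeof chrome !== 'undefined' && chrome.ttsEngine) {
  chrome.ttsEngine.onSpeak.addListener(createSpeakHandler(placeholderSynthesize));
  chrome.ttsEngine.onStop.addListener(function () {
    // Stop any audio that is currently playing (left to the real module).
  });
}
```

Whether this is worth it depends on whether you want other extensions and apps to be able to use your voice; for speaking text only inside your own extension, calling your TTS module directly is simpler.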

answered Oct 11 '22 13:10

aglensmith
