
Does Google Apps Script have something like getElementById?

I am going to use Google Apps Script to fetch the programme list from a radio station's website. How can I select a specific element in the page by its id, so that I can extract the programmes it contains?

benleung asked May 22 '13 14:05


3 Answers

Edit, Dec 2013: Google has deprecated the old Xml service, replacing it with XmlService. The script in this answer has been updated to use the new service. The new service requires well-formed, standards-compliant XML and HTML, whereas the old one tolerated problems such as missing closing tags.


Have a look at the Tutorial: Parsing an XML Document. (As of Dec 2013, this tutorial is still online, although the Xml service is deprecated.) Starting from that foundation, you can use the XML parsing in Script Services to navigate the page. Here's a small script operating on your example:

function getProgrammeList() {
  var txt = '<html> <body> <div> <div> <div id="here">hello world!!</div> </div> </div> </body> </html>';

  // Put the received xml response into XMLdocument format.
  // Note: this uses the old, deprecated Xml service (lenient parsing).
  var doc = Xml.parse(txt, true);

  Logger.log(doc.html.body.div.div.div.id + " = "
            + doc.html.body.div.div.div.Text);    /// here = hello world!!

  debugger;  // Pause in debugger - examine content of doc
}

To get the real page, start with this:

var url = 'http://blah.blah/whatever?querystring=foobar';
var txt = UrlFetchApp.fetch(url).getContentText();
....

If you look at the documentation for getElements, you'll see that it supports retrieving specific tags, for example "div". It only finds direct children of a specific element, though; it doesn't explore the entire XML document. You should be able to write a function that traverses the document, examining the id of each div element until it finds your programme list.

var programmeList = getDivById(doc, "here");

Edit - I couldn't help myself...

Here's a utility function that will do just that.

/**
 * Find a <div> tag with the given id.
 * <pre>
 * Example: getDivById( html, 'tagVal' ) will find
 * 
 *          <div id="tagVal">
 * </pre>
 *
 * @param {Element|Document}
 *                     element     XML document or element to start search at.
 * @param {String}     id      HTML <div> id to find.
 *
 * @return {Element}           First matching element (in doc order) or null.
 */
function getDivById( element, id ) {
  // Call utility function to do the work.
  return getElementByVal( element, 'div', 'id', id );
}

/**
 * !Now updated for XmlService!
 *
 * Traverse the given Xml Document or Element looking for a match.
 * Note: 'class' is stripped during parsing and cannot be used for
 * searching, I don't know why.
 * <pre>
 * Example: getElementByVal( body, 'input', 'value', 'Go' ); will find
 * 
 *          <input type="submit" name="btn" value="Go" id="btn" class="submit buttonGradient" />
 * </pre>
 *
 * @param {Element|Document}
 *                     element     XML document or element to start search at.
 * @param {String}     elementType XML element type, e.g. 'div' for <div>
 * @param {String}     attr        Attribute or Property to compare.
 * @param {String}     val         Search value to locate
 *
 * @return {Element}               First matching element (in doc order) or null.
 */
function getElementByVal( element, elementType, attr, val ) {
  // Get all descendants, in document order
  var descendants = element.getDescendants();
  for (var i = 0; i < descendants.length; i++) {
    var node = descendants[i];
    // We'll only examine ELEMENTs (skip text, comments, etc.)
    if (node.getType() == XmlService.ContentTypes.ELEMENT) {
      var candidate = node.asElement();
      if (candidate.getName() === elementType) {
        // getAttribute() returns null when the attribute is absent
        var attribute = candidate.getAttribute(attr);
        if (attribute !== null && attribute.getValue() === val) {
          return candidate;
        }
      }
    }
  }
  // No matches in document
  return null;
}

Applying this to your example, we get this:

function getProgrammeList() {
  // XmlService needs well-formed markup, so every tag is closed here
  var txt = '<html> <body> <div> <div> <div id="here">hello world!!</div> </div> </div> </body> </html>';

  // Parse the received xml response into an XML document
  var doc = XmlService.parse(txt);

  var found = getDivById(doc.getRootElement(), 'here');
  Logger.log(found.getAttribute('id').getValue()
             + " = "
             + found.getValue());    /// here = hello world!!
}
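
Because XmlService only accepts well-formed markup, parsing a real page can fail. Here is a minimal sketch of guarding against that, assuming XmlService.parse() raises an exception on malformed input. The name parseRootOrNull is hypothetical and not part of the utilities above.

```javascript
/**
 * Hypothetical helper: parse markup with XmlService and return the root
 * element, or null when the markup cannot be parsed.
 *
 * @param {String} markup  XML or XHTML text, e.g. from UrlFetchApp.
 * @return {Element}       Root element, or null if parsing failed.
 */
function parseRootOrNull(markup) {
  try {
    return XmlService.parse(markup).getRootElement();
  } catch (e) {
    // Assumption: malformed markup (e.g. an unclosed tag) ends up here.
    return null;
  }
}
```

A caller could then skip the search when parseRootOrNull(txt) returns null, rather than letting the whole script abort.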

Note: See this answer for a practical example of the use of these utilities.

Mogsdad answered Oct 10 '22 03:10


Someone has made an example here where the following custom functions are available for cut & paste use:

  • getElementById()
  • getElementsByClassName()
  • getElementsByTagName()

Then you can do something like this

function doGet() {
  var html = UrlFetchApp.fetch('http://en.wikipedia.org/wiki/Document_Object_Model').getContentText();
  var doc = XmlService.parse(html);
  var root = doc.getRootElement();
  // getElementsByClassName() is one of the custom functions listed above
  var menu = getElementsByClassName(root, 'menu-classname')[0];
  return menu;
}
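
The linked helpers are not reproduced here. As a rough idea of how one of them could work, the following sketches a getElementsByTagName() built on XmlService's Element.getChildren() and Element.getName(). It is an assumption about the approach, not the code from the linked example.

```javascript
/**
 * Sketch: collect every descendant of `element` whose tag name matches.
 *
 * @param {Element} element  XmlService element to start the search at.
 * @param {String}  tagName  Tag name to look for, e.g. 'div'.
 * @return {Element[]}       Matching elements, in document order.
 */
function getElementsByTagName(element, tagName) {
  var found = [];
  var children = element.getChildren();
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.getName() === tagName) {
      found.push(child);
    }
    // Recurse so that matches nested at any depth are included.
    found = found.concat(getElementsByTagName(child, tagName));
  }
  return found;
}
```

In a script it would be called with the root element, for example getElementsByTagName(doc.getRootElement(), 'div').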
AlexG answered Oct 10 '22 03:10


I'm going to assume that you are referring to UrlFetchApp's fetch() method. In that case the answer is no, at least not in the sense you are thinking of.

If you look at the return type of fetch() in the documentation, you'll see that it returns an HTTPResponse. That object has a few methods, but most of them return the data as a string. The good news is that you can still use most of the traditional JS String methods documented here, such as search() and match(). Depending on your project, those may be enough to find the data you are looking for in the response.
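
As an illustration of the string-based approach, here is a minimal sketch that uses match() with a regular expression to pull out the text of an element with a given id. The helper name extractTextById is hypothetical, and the pattern assumes the element contains plain text only (no nested tags), so it is fragile compared with real parsing.

```javascript
/**
 * Hypothetical helper: return the text inside the first element whose id
 * attribute equals `id`, or null if no such element is found.
 * Assumes the element contains plain text only (no nested tags).
 */
function extractTextById(html, id) {
  // Escape characters that have a special meaning in a regular expression.
  var safeId = id.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match <tag ... id="ID" ...>text</tag>; [^<]* stops at any nested tag.
  var pattern = new RegExp(
      '<(\\w+)[^>]*\\sid=["\']' + safeId + '["\'][^>]*>([^<]*)</\\1>', 'i');
  var match = html.match(pattern);
  return match ? match[2].trim() : null;
}

// In Apps Script, html would come from UrlFetchApp.fetch(url).getContentText().
var sample = '<html><body><div><div id="here">hello world!!</div></div></body></html>';
var text = extractTextById(sample, 'here');
```

For anything beyond simple, flat elements, parsing the page is likely to be more reliable than pattern matching.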

Greg answered Oct 10 '22 03:10