Returning HTML body from Nightmare.js

I'm currently working on some scraping with cheerio and Nightmare. I use both rather than cheerio alone because I have to interact with the site (selecting options, clicking buttons) to reach the part I want to scrape, and Nightmare is very good at scripting those steps.

Right now I use Nightmare to navigate until the information I need is displayed. Then, inside evaluate(), I'm trying to return the current HTML so I can pass it to cheerio for scraping. The problem is that I don't know how to retrieve the HTML from the document object. Is there a property on document that returns the full body?

Here is what I'm trying to do:

var Nightmare = require('nightmare');
var cheerio = require('cheerio');
var express = require('express');
var fs = require('fs');
var request = require('request');

var nightmare = Nightmare({ show: true });
var app = express();

var urlWeb = "url";
var selectCity = "#ddl_city";

nightmare
  .goto(urlWeb)
  .wait(selectCity)
  .select(selectCity, '19')
  .wait(6000)
  .select('#ddl_theater', '12')
  .wait(1000)
  .click('#btn_enter')
  .wait('#aspnetForm')
  .evaluate(function () {
    // here is where I want to return the html body
    return document.html;
  })
  .end() // close the Electron window once evaluate() has run
  .then(function (body) {
    // loading html body to cheerio
    var $ = cheerio.load(body);
    console.log(body);
  })
  .catch(function (error) {
    console.error('Scraping failed:', error);
  });
asked Sep 25 '16 at 20:09 by Jose Bernhardt

1 Answer

Returning this from the evaluate() callback worked:

document.body.innerHTML
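Note that `document.body.innerHTML` gives only the markup inside `<body>`; if the whole page is wanted (including `<head>`), `document.documentElement.outerHTML` returns the `<html>` element and everything in it. Either string can be passed to `cheerio.load()`. Below is a minimal sketch, assuming the same selectors and wait times as the question. The helper names `getPageHtml` and `scrapeShowtimes` are hypothetical, and the Nightmare instance is passed in rather than created inside the helper.

```javascript
// Runs inside the page: Nightmare serializes this function and executes it
// in the browser, so it may only use browser globals such as `document`.
function getPageHtml() {
  // documentElement is the <html> element; its outerHTML includes <head> and <body>.
  // Return document.body.innerHTML instead if only the body contents are needed.
  return document.documentElement.outerHTML;
}

// Hypothetical helper wrapping the navigation steps from the question.
// `nightmare` is an instance created with require('nightmare')({ show: true }).
function scrapeShowtimes(nightmare, url) {
  return nightmare
    .goto(url)
    .wait('#ddl_city')
    .select('#ddl_city', '19')
    .wait(6000)
    .select('#ddl_theater', '12')
    .wait(1000)
    .click('#btn_enter')
    .wait('#aspnetForm')
    .evaluate(getPageHtml) // the chain resolves with the HTML string
    .end(); // close the Electron window once the HTML has been read
}

// Usage (requires the nightmare and cheerio packages to be installed):
// const Nightmare = require('nightmare');
// const cheerio = require('cheerio');
// scrapeShowtimes(Nightmare({ show: true }), 'url')
//   .then(html => {
//     const $ = cheerio.load(html);
//     console.log($('title').text());
//   })
//   .catch(err => console.error('Scraping failed:', err));
```

Passing the Nightmare instance in keeps the chain reusable and lets it be exercised with a stub object. Calling `.end()` before `.then()` follows the pattern shown in the Nightmare README; without it the Electron process stays open after the scrape finishes.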
answered Oct 18 '22 at 23:10 by Jose Bernhardt
