Getting the page title from a scraped webpage [closed]

Question

var http = require('http');
var urlOpts = {host: 'www.nodejs.org', path: '/', port: '80'};
http.get(urlOpts, function (response) {
    response.on('data', function (chunk) {
        var str = chunk.toString();
        var re = new RegExp("(<\s*title[^>]*>(.+?)<\s*/\s*title)\>", "g");
        console.log(str.match(re));
    });
});

Output

user@dev ~ $ node app.js
[ 'node.js' ]
null
null

I only need to get the title.

bdukes · Accepted Answer

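I would suggest using RegExp.exec instead of String.match. With the g flag, String.match returns only the full matches and discards the capture groups, whereas exec returns the capture groups as well, so the title text is available as match[2]. You can also define the regular expression using the literal syntax, and only once: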

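var http = require('http');
var urlOpts = {host: 'www.nodejs.org', path: '/', port: '80'};
// Defined once, outside the callback. No "g" flag: a global regex keeps its
// lastIndex between exec() calls, which could skip a match in a later chunk.
var re = /(<\s*title[^>]*>(.+?)<\s*\/\s*title)>/i;
http.get(urlOpts, function (response) {
    response.on('data', function (chunk) {
        var str = chunk.toString();
        var match = re.exec(str);
        // match[2] is the text captured between <title> and </title>
        if (match && match[2]) {
            console.log(match[2]);
        }
    });
});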

The code also assumes that the title arrives completely in one chunk rather than being split between two. It would be safer to accumulate the chunks and search the combined text, in case the title is split across them. You may also want to stop looking for the title once you've found it.
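A minimal sketch of that approach, assuming a hypothetical helper named createTitleFinder (it is not part of Node.js or of either answer): it appends each chunk to a buffer, searches the combined text with a non-global regex, and returns the cached title once it has been found. The chunks are hard-coded to simulate a title split across chunk boundaries, so the example runs without a network connection.

```javascript
// Hypothetical helper (not from either answer): collects response chunks until a
// complete <title> element has arrived, then ignores everything after it.
function createTitleFinder() {
  var buffer = '';
  var title = null;
  // No "g" flag, so exec() carries no lastIndex state between calls.
  // [\s\S] also lets the title text span line breaks.
  var re = /<\s*title[^>]*>([\s\S]*?)<\s*\/\s*title\s*>/i;

  return {
    // Feed one chunk (string or Buffer); returns the title once found, else null.
    push: function (chunk) {
      if (title !== null) {
        return title; // already found: stop looking
      }
      buffer += chunk.toString();
      var match = re.exec(buffer);
      if (match) {
        title = match[1].trim();
        buffer = ''; // the accumulated text is no longer needed
      }
      return title;
    }
  };
}

// Simulated chunks in which the title is split across chunk boundaries.
var chunks = ['<html><head><ti', 'tle>Node.js</ti', 'tle></head><body>'];
var finder = createTitleFinder();
chunks.forEach(function (chunk) {
  var title = finder.push(chunk);
  if (title !== null) {
    console.log(title); // prints "Node.js" once, after the third chunk
  }
});
```

To wire this into the http.get callback, you would create one finder per response, call finder.push(chunk) from the 'data' handler, and call response.destroy() once it returns a title so the rest of the page is not downloaded. Calling response.setEncoding('utf8') first makes the chunks arrive as strings, which avoids garbling a multi-byte character that happens to be split between two Buffers. The buffer is not capped, so for pages without a title you might also want to give up after some size limit.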

gradosevic · Answer

Try this:

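var re = new RegExp("<title>(.*?)</title>", "i"); // no "g" flag, so match() returns the capture groups
var match = str.match(re);
if (match) console.log(match[1]); // match is null for chunks without a title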
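Both answers hinge on how String.match and RegExp.exec treat the g flag, which is also why the question's code prints an array of whole matches, and null for chunks that contain no title. A small standalone check, using a hard-coded HTML string (an assumption for illustration) in place of a response chunk:

```javascript
// Sample markup; in the question this text would come from a response chunk.
var html = '<html><head><title>node.js</title></head></html>';

// With the "g" flag, String.prototype.match returns every full match and drops capture groups.
var globalRe = /<title>(.*?)<\/title>/gi;
console.log(html.match(globalRe)); // [ '<title>node.js</title>' ]

// Without "g", match() (like exec()) returns the full match followed by the capture groups.
var singleRe = /<title>(.*?)<\/title>/i;
console.log(html.match(singleRe)[1]); // node.js

// A reused "g" regex keeps lastIndex between exec() calls, even on different strings.
globalRe.lastIndex = 0; // start from the beginning
console.log(globalRe.exec(html)[1]); // node.js
console.log(globalRe.lastIndex); // 34, just past '</title>'
console.log(globalRe.exec('<title>second</title>')); // null: the search starts at index 34
```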


Tags: node.js

user1777212

2 Answers

bdukes

gradosevic

