
Parsing script tag content with nodejs and cheerio

I want to extract the sources array from the configuration object passed to jwplayer("vplayer").setup, using cheerio or some other module.

<HTML>
<HEAD>
    <link rel="stylesheet" type="text/css" href="http://thevideos.tv/css/main.css">
    <script language="JavaScript" type="text/javascript" CHARSET="UTF-8"
            src="http://thevideos.tv/js/jquery.min.js"></script>
</HEAD>
<BODY topmargin=0 leftmargin=0 style="background:transparent;">

<table cellpadding=0 cellspacing=0>
    <tr>
        <td valign=top>
            <div style="position:relative;width:728px;height:410px;">
                <div id="play_limit_box">
                    <a href="http://thevideos.tv/premium.html" target="_blank">Upgrade you account</a> to watch videos
                    with no limits!
                </div>

                <span id='vplayer'><img src="http://192.99.62.187/i/01/00077/u0mqgq67qz76.jpg"
                                        style="width:728px;height:410px;"></span>    
            </div>
        </td>
    </tr>
</table>


<script type='text/javascript'>    jwplayer("vplayer").setup({
    sources: [{
        file: "http://192.99.62.187/kj2vyrxjey6vtaw52apz4kuggj6xfcc27pjizr5rhnrcgv73id7wwhzxlqda/v.mp4",
        label: "240p"
    }, {
        file: "http://192.99.62.187/kj2vyrxjey6vtaw52apz4kuggj6xfcc27pjizr5rhfbsgv73id76twjcd2ha/v.mp4",
        label: "360p"
    }]
});
</script>

<script>
    var sid = 90446;
    var wid = 115535;
</script>

</BODY>
</HTML>

Can it be done using cheerio? If not, what should I use instead, and how?

Thanks in advance :)

Shafayat Alam asked Mar 12 '23 11:03



1 Answer

You can use cheerio to retrieve the contents of the script tag, but you have to parse that content yourself. This should work for you, assuming the relevant script tag is always served in the form you showed:

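const cheerio = require('cheerio');

// `html` is assumed to hold the page markup shown in the question.
const $ = cheerio.load(html);

// Take the first text node of each <script> directly under <body>
// and keep the one that mentions jwplayer.
const textNode = $('body > script').map((i, x) => x.children[0])
                                   .filter((i, x) => x && /jwplayer/.test(x.data)).get(0);

let sources = [];
if (textNode) {
    // Drop newlines and quote the bare keys so the array literal becomes valid JSON.
    const scriptText = textNode.data.replace(/\r?\n|\r/g, "")
                                    .replace(/file:/g, '"file":')
                                    .replace(/label:/g, '"label":');
    const match = /sources:(.*)}\);/.exec(scriptText);
    if (match) sources = JSON.parse(match[1]);
}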
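
If the script block ever carries keys other than file and label, or if sources is not the last property before the closing "});", the fixed replacements and the greedy regex above may stop matching. Below is a minimal sketch of a slightly more tolerant variant. It assumes you already have the script text (for example from textNode.data). The helper name extractSources and the example.com URLs are made up for illustration and are not part of cheerio or jwplayer. It also assumes double-quoted string values, so treat it as a starting point rather than a general JavaScript parser.

```javascript
// Hypothetical helper (not part of cheerio or jwplayer): pulls the `sources`
// array out of the text of an inline <script> like the one in the question.
function extractSources(scriptText) {
    const start = scriptText.indexOf('sources:');
    if (start === -1) return null;

    // Walk from the first "[" to its matching "]", skipping string contents.
    const open = scriptText.indexOf('[', start);
    if (open === -1) return null;
    let depth = 0, inString = false, end = -1;
    for (let i = open; i < scriptText.length; i++) {
        const ch = scriptText[i];
        if (inString) {
            if (ch === '\\') i++;                 // skip the escaped character
            else if (ch === '"') inString = false;
        } else if (ch === '"') inString = true;
        else if (ch === '[') depth++;
        else if (ch === ']' && --depth === 0) { end = i; break; }
    }
    if (end === -1) return null;

    // Quote bare keys that directly follow "{" or ",". Assumes string values
    // do not themselves contain a `, key:` pattern.
    const literal = scriptText.slice(open, end + 1)
        .replace(/([{,]\s*)([A-Za-z_$][\w$]*)\s*:/g, '$1"$2":');
    return JSON.parse(literal);
}

// Made-up sample input mirroring the shape of the script in the question.
const sample = `jwplayer("vplayer").setup({
    sources: [{
        file: "http://example.com/a/v.mp4",
        label: "240p"
    }, {
        file: "http://example.com/b/v.mp4",
        label: "360p"
    }]
});`;

console.log(extractSources(sample).map(s => s.label));
```

Matching brackets instead of relying on a trailing "});" means extra properties after sources should not break the extraction, and the generic key-quoting regex avoids listing every key by hand.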
cviejo answered Mar 20 '23 15:03
