Fetching URL data with cURL returns garbled symbols instead of text

Tags: php, curl

I sometimes have a problem fetching URL data with cURL, especially when the website's content is in another language such as Arabic. My cURL function is:

function file_get_contents_curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

    $data = curl_exec($ch);
    $info = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);

    //checking mime types
    if(strstr($info,'text/html')) {
        curl_close($ch);
        return $data;
    } else {
        return false;
    }
}

And this is how I am getting the data:

$html =  file_get_contents_curl($checkurl);
    $grid ='';
    if($html)
    {
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        $nodes = $doc->getElementsByTagName('title');
        @$title = $nodes->item(0)->nodeValue;
        @$metas = $doc->getElementsByTagName('meta');
        for ($i = 0; $i < $metas->length; $i++)
        {
            $meta = $metas->item($i);
            if($meta->getAttribute('name') == 'description')
                $description = $meta->getAttribute('content');
        }

I get all the data correctly from some Arabic websites, such as http://www.emaratalyoum.com/multimedia/videos/2012-04-08-1.474873, but when I give it this YouTube URL, http://www.youtube.com/watch?v=Eyxljw31TtU&feature=g-logo&context=G2c4f841FOAAAAAAAFAA, it shows symbols. What setting do I need to change so that the title and description are shown exactly as they appear on the page?

asked Apr 12 '12 05:04 by Sohail Anwar


2 Answers

Introduction

Handling Arabic text can be tricky, but there are some basic steps you need to ensure:

  • Your document must output UTF-8
  • Your DOMDocument must read the input as UTF-8
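
As a rough sketch of the second point, one way to make the parse independent of whatever encoding DOMDocument guesses is to convert every non-ASCII character to a numeric entity before calling loadHTML. The helper name extractTitleUtf8 and the sample markup are made up for illustration, and the sketch assumes the mbstring extension is available and that the input really is UTF-8.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumes $html is UTF-8 and that the mbstring extension is loaded.
function extractTitleUtf8(string $html): ?string
{
    // Rewrite every character from U+0080 upwards as a numeric entity
    // (for example &#1576;), so the parser only ever sees ASCII bytes.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $asciiHtml = mb_encode_numericentity($html, $map, 'UTF-8');

    $doc = new DOMDocument();
    @$doc->loadHTML($asciiHtml); // @ hides warnings about malformed markup

    $node = $doc->getElementsByTagName('title')->item(0);
    // DOM string properties are returned as UTF-8.
    return $node !== null ? $node->nodeValue : null;
}

// Made-up sample page; in a web context, also send
// header('Content-Type: text/html; charset=UTF-8') before echoing.
$sample = '<html><head><title>بنات سعوديات</title></head><body></body></html>';
echo extractTitleUtf8($sample), "\n";
```

With this approach the site-specific utf8_decode step may become unnecessary, but that has not been checked against the two URLs in the question.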

Problem

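YouTube already serves its information as UTF-8, but the retrieval process then applies UTF-8 encoding a second time. I am not sure why this occurs; a likely cause is that DOMDocument::loadHTML falls back to ISO-8859-1 when it does not find a charset declaration it recognizes. Either way, a simple utf8_decode fixes the issue.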
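
The double encoding described above can be reproduced without any network access. The sketch below is an illustration only: it assumes the mbstring extension, uses a made-up sample string, and swaps utf8_encode/utf8_decode for the equivalent mb_convert_encoding calls because the utf8_* functions are deprecated as of PHP 8.2.

```php
<?php
// Illustration only; the variable names and sample string are hypothetical.
$original = 'بنات'; // 4 Arabic letters, 8 bytes of UTF-8

// Misread the UTF-8 bytes as ISO-8859-1 and encode them to UTF-8 again.
// This is the same transformation utf8_encode() performs.
$doubleEncoded = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Undo one layer, which is what utf8_decode() does in the answer's code.
$repaired = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');

echo strlen($original), ' bytes before, ', strlen($doubleEncoded), " bytes after\n";
var_dump($repaired === $original);
```

Every byte of the Arabic text is above 0x7F, so each one becomes a two-byte sequence after the second encoding pass, which is what shows up in the browser as rows of symbols.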

Example

header('Content-Type: text/html; charset=UTF-8');
echo displayMeta("http://www.emaratalyoum.com/multimedia/videos/2012-04-08-1.474873");
echo displayMeta("http://www.youtube.com/watch?v=Eyxljw31TtU&feature=g-logo&context=G2c4f841FOAAAAAAAFAA"); 

Output

emaratalyoum.com

التقطت عدسات الكاميرا حارس مرمى ريال مدريد إيكر كاسياس في موقف محرج قبل لحظات من بداية مباراة النادي الملكي مع أبويل القبرصي في ذهاب دور الثمانية لدوري أبطال أوروبا.ففي النفق المؤدي إلى الملعب، قام كاسياس بوضع إصبعه في أنفه، وبعدها قام بمسح يده في وجه أحد

youtube.com

بنات سعوديات: أريد "شايب يدللني ولا شاب يعللني"

Function Used

displayMeta

function displayMeta($checkurl) {
    $html = file_get_contents_curl($checkurl);
    if (!$html) {
        return null; // fetch failed or the response was not text/html
    }
    $doc = new DOMDocument("1.0", "UTF-8");
    @$doc->loadHTML($html); // @ suppresses warnings about malformed markup
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->getAttribute('name') == 'description') {
            $description = $meta->getAttribute('content');
            // YouTube text comes back double-encoded, so undo one layer of UTF-8
            if (stripos((string) parse_url($checkurl, PHP_URL_HOST), "youtube") !== false) {
                return utf8_decode($description);
            }
            return $description;
        }
    }
    return null; // no description meta tag found
}

file_get_contents_curl

function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    // Disables certificate verification, as in the question
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

    $data = curl_exec($ch);
    $info = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch); // release the handle on every path, not only on success

    // checking mime types: only return the body for HTML responses
    if ($data !== false && strstr($info, 'text/html')) {
        return $data;
    }
    return false;
}
answered Sep 30 '22 21:09 by Baba


I believe this will work: apply utf8_decode() to the attribute value.

function file_get_contents_curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

    $data = curl_exec($ch);
    $info = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch); // close the handle whether or not the type matches

    //checking mime types
    if (strstr($info, 'text/html')) {
        return $data;
    } else {
        return false;
    }
}

$html = file_get_contents_curl($checkurl);
if ($html)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $nodes = $doc->getElementsByTagName('title');
    @$title = $nodes->item(0)->nodeValue;
    $metas = $doc->getElementsByTagName('meta');
    for ($i = 0; $i < $metas->length; $i++)
    {
        $meta = $metas->item($i);
        if ($meta->getAttribute('name') == 'description') {
            // undo the extra layer of UTF-8 encoding on the description
            $description = utf8_decode($meta->getAttribute('content'));
        }
    }
}
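
One caveat with decoding every attribute unconditionally: text that was parsed correctly, such as the emaratalyoum.com description, would be turned into question marks. A possible guard is sketched below. The helper name undoDoubleUtf8 is hypothetical, the mbstring extension is assumed, and the round-trip check is only a heuristic, so it can misjudge short strings that happen to look double-encoded.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Removes one layer of UTF-8 encoding only when doing so is lossless
// and the result is itself valid UTF-8.
function undoDoubleUtf8(string $text): string
{
    // Same conversion utf8_decode() performs (deprecated since PHP 8.2).
    $candidate = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

    // If re-encoding does not give back the input, characters were lost
    // (replaced by "?"), so the text was not double-encoded.
    $roundTrip = mb_convert_encoding($candidate, 'UTF-8', 'ISO-8859-1');
    if ($roundTrip !== $text) {
        return $text;
    }

    // Accept the decoded bytes only if they form valid UTF-8.
    return mb_check_encoding($candidate, 'UTF-8') ? $candidate : $text;
}

// Simulate a double-encoded description, then repair it.
$correct = 'شايب يدللني';
$broken  = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');
var_dump(undoDoubleUtf8($broken) === $correct);  // repaired
var_dump(undoDoubleUtf8($correct) === $correct); // left untouched
```

In the loop above, $description = undoDoubleUtf8($meta->getAttribute('content')); could then replace the direct utf8_decode call.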
answered Sep 30 '22 20:09 by Dinesh