
Fastest way to retrieve a <title> in PHP

Tags: html, php, parsing



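<?php
    function page_title($url) {
        // file_get_contents() returns false on failure.
        $fp = file_get_contents($url);
        if (!$fp) {
            return null;
        }

        // [^>]* allows attributes on the tag.
        // Flags: s = dot matches newlines, i = case-insensitive, U = ungreedy.
        $res = preg_match("/<title[^>]*>(.*)<\/title>/siU", $fp, $title_matches);
        if (!$res) {
            return null;
        }

        // Clean up title: remove EOLs and excessive whitespace.
        $title = preg_replace('/\s+/', ' ', $title_matches[1]);
        $title = trim($title);
        return $title;
    }
?>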

Gave 'er a whirl on the following input:

print page_title("http://www.google.com/");

Output: Google

Hopefully general enough for your usage. If you need something more powerful, it might not hurt to invest a bit of time into researching HTML parsers.

EDIT: Added a bit of error checking. Kind of rushed the first version out, sorry.


You can get it without regular expressions:

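$title = '';
$dom = new DOMDocument();

// Real-world pages are rarely valid HTML; collect libxml's parse
// warnings internally instead of printing them.
libxml_use_internal_errors(true);

// $urlpage holds the URL (or file path) of the page to inspect.
if ($dom->loadHTMLFile($urlpage)) {
    $list = $dom->getElementsByTagName("title");
    if ($list->length > 0) {
        $title = $list->item(0)->textContent;
    }
}
libxml_clear_errors();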

Or make the simple regex function slightly more bulletproof:

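function page_title($url) {

    $page = file_get_contents($url);

    // file_get_contents() returns false on failure.
    if ($page === false) return null;

    $matches = array();

    // [^>]* tolerates attributes on the tag; "i" ignores case and
    // "s" lets the title span several lines.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $page, $matches)) {
        return trim($matches[1]);
    } else {
        return null;
    }
}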


echo page_title('http://google.com');

I'm also doing a bookmarking system and found that since PHP 5 you can use stream_get_line to load the remote page only until the closing title tag (instead of loading the whole file), then get rid of what's before the opening title tag with explode (instead of a regex).

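function page_title($url) {
  $title = false;
  if ($handle = fopen($url, "r"))  {
    // Read only up to the closing title tag; the delimiter itself is not returned.
    $string = stream_get_line($handle, 0, "</title>");
    fclose($handle);
    if ($string !== false) {
      // Drop everything before the opening tag; index 1 is absent when there is no "<title".
      $parts = explode("<title", $string, 2);
      if (isset($parts[1])) {
        // Skip any attributes on the tag; the limit of 2 keeps a ">" inside the title text intact.
        $rest = explode(">", $parts[1], 2);
        if (isset($rest[1])) {
          $title = trim($rest[1]);
        }
      }
    }
  }
  return $title;
}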

The last explode is thanks to PlugTrade's answer, which reminded me that title tags can have attributes.
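
One caveat with the stream_get_line approach: the PHP manual describes a length of 0 as the default socket chunk size (8192 bytes), so a title that sits further into the page may be missed. Below is a minimal sketch of the same stop-early idea with an explicit cap. The name title_from_stream, the 4096-byte chunk size and the 64 KB cap are assumptions for illustration, not part of the answer, and it swaps explode for a regex so that attributes and letter case are handled in one step.

```php
<?php
// Hypothetical helper: reads from an already-open stream in small chunks and
// stops as soon as the closing title tag has been seen (or the cap is hit).
function title_from_stream($handle, int $maxBytes = 65536): ?string
{
    $buffer = '';
    while (!feof($handle) && strlen($buffer) < $maxBytes) {
        $chunk = fread($handle, 4096);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing more to read
        }
        $buffer .= $chunk;
        if (stripos($buffer, '</title>') !== false) {
            break; // the whole title is buffered; skip the rest of the page
        }
    }

    // \b[^>]* skips attributes; "s" lets the title span lines, "i" ignores case.
    if (preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $buffer, $m)) {
        // Collapse runs of whitespace into single spaces.
        return trim(preg_replace('/\s+/', ' ', $m[1]));
    }
    return null;
}
```

With a remote page this would be used as $h = fopen($url, "r"); $title = title_from_stream($h); fclose($h); and it returns null when no title shows up within the cap.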


Regex?

Use cURL to get the $htmlSource variable's contents.

preg_match('/<title[^>]*>(.*)<\/title>/isU', $htmlSource, $titleMatches);

print_r($titleMatches);

See what you have in that array.
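
The cURL step is not spelled out above, so here is a minimal sketch of it. The helper names fetch_html and extract_title, the redirect limit and the 10-second timeout are assumptions chosen for illustration; the pattern is a variant of the one in this answer that also tolerates attributes and multi-line titles.

```php
<?php
// Hypothetical helper: fetches a page body with cURL, or returns null on failure.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects such as http -> https
        CURLOPT_MAXREDIRS      => 5,    // arbitrary limit
        CURLOPT_TIMEOUT        => 10,   // seconds; an arbitrary choice
    ]);
    $body = curl_exec($ch);
    // The handle is released when $ch goes out of scope.
    return is_string($body) ? $body : null;
}

// Hypothetical helper: returns the raw title text, or null when none is found.
function extract_title(string $htmlSource): ?string
{
    if (preg_match('/<title\b[^>]*>(.*)<\/title>/isU', $htmlSource, $titleMatches)) {
        return $titleMatches[1];
    }
    return null;
}
```

Usage would look like $htmlSource = fetch_html("http://www.google.com/"); followed by extract_title($htmlSource) once $htmlSource has been checked for null.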

Most people say that for traversing HTML you should use a parser, though, as regexes can be unreliable.
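
To make the reliability point concrete, here is a small sketch comparing the two approaches on a made-up page that has an old title left behind in an HTML comment. The function names and the sample markup are hypothetical, and the DOM version assumes PHP's dom extension is available.

```php
<?php
// Regex approach: takes the first textual match, wherever it appears.
function title_by_regex(string $html): ?string
{
    return preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $html, $m) ? trim($m[1]) : null;
}

// Parser approach: only a real <title> element in the parsed tree counts.
function title_by_dom(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    if (!$loaded) {
        return null;
    }
    $list = $dom->getElementsByTagName('title');
    return $list->length > 0 ? trim($list->item(0)->textContent) : null;
}

// Made-up sample page: a stale title inside a comment, then the real one.
$sample = '<html><head><!-- <title>Old title</title> --><title>Real title</title></head><body></body></html>';
```

Here title_by_regex($sample) picks up "Old title" from the comment, while title_by_dom($sample) ignores the comment and returns "Real title".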

The other answers provide more detail :)