
Regular expression to get page title

Tags: regex, php

There are lots of answers to this question, but not a single complete one:

Using a single regular expression, how do you extract the page title from <title>Page title</title>?

There are several other ways title tags can be written, such as:

<TITLE>Page title</TITLE>

<title>
 Page title</title>
<title>
 Page title
</title>

<title lang="en-US">Page title</title>

...or any combination of the above.

And it can be on its own line or in between other tags:

<head>
  <title>Page title</title>
</head>

<head><title>Page title</title></head>

Thanks in advance for any help.

UPDATE: So, the regex approach might not be the best solution to this. Which PHP-based HTML parser could handle all of these scenarios, whether the HTML is well formed or not?

UPDATE 2: sp00m's regex (https://stackoverflow.com/a/13510307/1844607) seems to be working in all cases. I'll get back to this if needed.

Jari asked Nov 22 '12 10:11


1 Answer

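Use an HTML parser instead. But if you must use a regex, this pattern works when applied with the case-insensitive (i) and dot-matches-newline (s) modifiers: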

<title[^>]*>(.*?)</title>
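
To see how the pattern copes with the variants listed in the question, here is a minimal PHP sketch. The helper name extractTitle is hypothetical and not part of the original answer, and the whitespace cleanup after the match is an added assumption about what "the title" should look like.

```php
<?php
// Hypothetical helper (not from the original answer): returns the cleaned-up
// title text, or null when no <title> element is found.
function extractTitle(string $html): ?string
{
    // i: match <TITLE> as well as <title>
    // s: let "." match newlines so titles spread over several lines are captured
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $matches) !== 1) {
        return null;
    }

    // Assumption: collapse runs of whitespace and trim the surrounding line breaks.
    return trim((string) preg_replace('~\s+~', ' ', $matches[1]));
}

// The variants from the question.
$samples = [
    "<TITLE>Page title</TITLE>",
    "<title>\n Page title\n</title>",
    '<title lang="en-US">Page title</title>',
    "<head><title>Page title</title></head>",
];

foreach ($samples as $sample) {
    echo extractTitle($sample), PHP_EOL;
}
```

The lazy quantifier (.*?) stops at the first closing tag, so only the first title is returned if a page contains more than one. The pattern can still be fooled by markup inside comments or scripts, which is the usual argument for a parser.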

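
For the parser route, one option is PHP's bundled DOM extension. The sketch below assumes that extension is enabled; the function name extractTitleWithDom is hypothetical, and the original answer does not name a specific parser.

```php
<?php
// Hypothetical helper: reads the first <title> element via DOMDocument.
// Assumes the DOM extension (ext-dom) is available.
function extractTitleWithDom(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect libxml warnings internally so malformed markup does not emit notices.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $loaded = $document->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return null;
    }

    $titles = $document->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null;
    }

    // Assumption: collapse runs of whitespace and trim, as a browser would when displaying it.
    return trim((string) preg_replace('~\s+~', ' ', $titles->item(0)->textContent));
}

echo extractTitleWithDom("<head><TITLE lang=\"en-US\">\n Page title\n</TITLE></head>"), PHP_EOL;
```

The parser normalizes tag case and tolerates missing or unclosed surrounding tags, so no modifiers are needed. Note that loadHTML() assumes ISO-8859-1 unless the document declares a charset, so non-ASCII titles may need extra handling.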

sp00m answered Oct 09 '22 00:10