Using regex to get title

Question

I'm not sure how I'd select a title with regex. I've tried

match(/<title>(.*) .*<\/title>/)[1]

but that doesn't match anything.

This is the response body I'm trying to select from.

I'm trying to select "title I need to select."

ndnenkov · Accepted Answer

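The reason it doesn't work is the itemprop="name" attribute: your pattern expects a literal <title>, so an opening tag that carries attributes never matches. To fix this, allow for attributes in the opening tag: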

# copy-paste from the page you provided
html = '<!doctype html>
<html lang="en" itemscope itemtype="https://schema.org/WebPage">
<head>
<meta charset="utf-8"><meta name="referrer" content="always" />
<title itemprop="name">title I need to select.</title>
<meta itemprop="description" name="description" content='

html.match(/<title.*?>(.*)<\/title>/)[1] # => "title I need to select."

.*? basically means "match as many characters as needed, but no more".
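
To make the difference concrete, here is a small sketch using only core Ruby. The html string is a one-line stand-in for the page above, and two_tags is a made-up input added purely to show greedy versus lazy matching; neither comes from the original page.

```ruby
# One-line stand-in for the <title> line of the page (illustrative input).
html = '<title itemprop="name">title I need to select.</title>'

# The pattern from the question needs a literal "<title>", so the
# itemprop attribute prevents any match and String#match returns nil.
strict_match = html.match(/<title>(.*) .*<\/title>/)

# Letting a lazy .*? consume the attributes makes the pattern match.
title = html.match(/<title.*?>(.*)<\/title>/)[1]

# Greedy vs lazy on a hypothetical string with two tags on one line.
two_tags = '<b>one</b><b>two</b>'
greedy_hit = two_tags[/<b>(.*)<\/b>/, 1]   # runs on to the last </b>
lazy_hit   = two_tags[/<b>(.*?)<\/b>/, 1]  # stops at the first </b>

p strict_match
p title
p greedy_hit
p lazy_hit
```

Because match returns nil when nothing matches, calling [1] on the result of the question's pattern would raise NoMethodError. The String#[] form used for two_tags returns nil instead, which can be easier to handle.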

However, as others have pointed out, regexes are not ideal for HTML parsing. Instead, you could use a popular Ruby gem for that purpose - Nokogiri:

require 'nokogiri'

page = Nokogiri.parse(html)
page.css('title').text # => "title I need to select."

Note that it can handle even malformed HTML, as is the case here.

jeremy04 · Answer

If you're looking for a much more robust XML/HTML parser, try Nokogiri, which supports XPath.

This post explains why: "Use xPath or Regex?"

require "nokogiri"
string = "<title itemprop=\"name\">title I need to select.</title>"
html_doc = Nokogiri::HTML(string)
html_doc.xpath("//title").first.text

Tags: regex, ruby, ruby-on-rails, match

user3579614
