
Use xPath or Regex?

Tags: regex, xpath

Both methods below serve the same purpose: scan the content of the post and determine whether at least one img tag has an alt attribute containing the keyword being tested for.

I'm new to xPath and would prefer to use it, depending on how expensive that approach is compared to the regex version.

Method #1 uses preg_match

function image_alt_text_has_keyword($post)
{
    $theKeyword = trim(wpe_getKeyword($post));
    $theContent = $post->post_content;
    $myArrayVar = array();
    // Capture the value of every double-quoted alt attribute on an img tag
    preg_match_all('/<img\s[^>]*alt=\"([^\"]*)\"[^>]*>/siU', $theContent, $myArrayVar);
    foreach ($myArrayVar[1] as $theValue)
    {
        if (keyword_in_content($theKeyword, $theValue)) return true;
    }
    return false;
}

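function keyword_in_content($theKeyword, $theContent)
{
    // preg_quote keeps regex metacharacters in the keyword from altering the pattern
    return preg_match('/\b' . preg_quote($theKeyword, '/') . '\b/i', $theContent) === 1;
}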

Method #2 uses xPath

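function keyword_in_img_alt()
{
    global $post;
    $keyword = trim(strtolower(wpe_getKeyword($post)));
    $dom = new DOMDocument;
    $dom->loadHTML(strtolower($post->post_content));
    $xPath = new DOMXPath($dom);
    // Count img elements whose alt contains the keyword; true if there is at least one.
    // Note: a keyword containing a double quote would break this expression.
    return $xPath->evaluate('count(//img[contains(@alt, "'.$keyword.'")])') > 0;
}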
Scott B asked Oct 30 '10 17:10




2 Answers

If you are parsing XML you should use XPath as it was designed exactly for this purpose. XML / XHTML is not a regular language and cannot be parsed correctly by regular expressions. You may be able to write a regular expression which works some of the time but there will be special cases where it will fail.
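
As a rough illustration of one such special case, the sketch below runs the pattern from Method #1 and a DOM/XPath lookup against the same markup, where the alt attribute is single-quoted. The helper name `img_alt_contains` and the sample HTML are made up for this example and are not from the question; the keyword is compared in PHP rather than spliced into the XPath string.

```php
<?php
// Hypothetical helper (not from the question): true if any <img> alt contains $keyword.
function img_alt_contains(string $html, string $keyword): bool
{
    if ($keyword === '') {
        return false;
    }

    $dom = new DOMDocument();
    // Real-world HTML is rarely perfectly valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Select every alt attribute that sits on an img element.
    foreach ($xpath->query('//img/@alt') as $alt) {
        // Case-insensitive substring check done in PHP, so no quoting issues in XPath.
        if (stripos($alt->nodeValue, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

// Sample markup (an assumption for this sketch): single-quoted alt attribute.
$html = "<p><img title=\"a > b\" src='cat.png' alt='Sleepy Cat'></p>";

// The pattern from Method #1 only recognises double-quoted alt attributes.
$regexHit = preg_match_all('/<img\s[^>]*alt=\"([^\"]*)\"[^>]*>/siU', $html, $m) > 0;
$domHit = img_alt_contains($html, 'cat');

var_dump($regexHit); // bool(false)
var_dump($domHit);   // bool(true)
```

The regex could be extended to handle single quotes, unquoted values, and so on, but each extension is another special case that the parser already covers.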

Mark Byers answered Oct 11 '22 15:10


Using RegEx for selecting nodes in an XML document is as appropriate as using it for finding if a given number is a prime.

The fact that this is possible doesn't make it even a bit appropriate.

What is more, XPath 2.0 has RegEx support, while RegEx does not have XPath support. Therefore, if both are needed, it is probably best to use XPath 2.0.
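
For the PHP code in the question specifically, DOMXPath is built on libxml2 and implements XPath 1.0, so the XPath 2.0 `matches()` function is not available there. One possible workaround, sketched below, is to expose `preg_match` to the XPath expression through `registerPhpFunctions`. The helper name `count_img_alt_word_matches` and the sample markup are assumptions made for this example, and the sketch simply rejects keywords containing a double quote, because XPath 1.0 string literals cannot escape them.

```php
<?php
// Hypothetical helper: count <img> elements whose alt contains $keyword as a whole word.
function count_img_alt_word_matches(string $html, string $keyword): int
{
    if (strpos($keyword, '"') !== false) {
        // XPath 1.0 has no escape syntax inside a double-quoted literal.
        throw new InvalidArgumentException('Keyword must not contain a double quote.');
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Make php:functionString() callable from XPath, restricted to preg_match only.
    $xpath->registerNamespace('php', 'http://php.net/xpath');
    $xpath->registerPhpFunctions('preg_match');

    $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
    // preg_match returns an integer, and a bare number in a predicate is read as a
    // position test, so the result is compared with 1 explicitly.
    $expr = 'count(//img[php:functionString("preg_match", "' . $pattern . '", @alt) = 1])';

    return (int) $xpath->evaluate($expr);
}

$sample = '<img alt="Sleepy cat"><img alt="concatenate"><img src="x.png">';
var_dump(count_img_alt_word_matches($sample, 'cat')); // int(1)
```

Only the first image counts: "concatenate" contains "cat" but not as a whole word, and the third image has no alt attribute at all.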

Dimitre Novatchev answered Oct 11 '22 13:10