
Regular expression: convert any text to a URL-friendly string

I need a PHP regular expression that removes anything that is not a letter or a digit (0 to 9), replaces spaces with hyphens, and converts the result to lowercase. There must be only one hyphen between words: no -- or ---, etc.

For example:

Example: The quick brown fox jumped Result: the-quick-brown-fox-jumped

Example: The quick brown fox jumped! Result: the-quick-brown-fox-jumped

Example: The quick brown fox - jumped! Result: the-quick-brown-fox-jumped

Example: The quick ~`!@#$%^ &*()_+= ------- brown {}|][ :"'; <>?.,/ fox - jumped! Result: the-quick-brown-fox-jumped

Example: The quick 1234567890 ~`!@#$%^ &*()_+= ------- brown {}|][ :"'; <>?.,/ fox - jumped! Result: the-quick-1234567890-brown-fox-jumped


Does anybody have an idea for the regular expression?

Thanks!

Paul asked Oct 29 '10 12:10



2 Answers

Since you seem to want every sequence of non-alphanumeric characters replaced by a single hyphen, you can use this:

$str = preg_replace('/[^a-zA-Z0-9]+/', '-', $str);

But this can leave leading or trailing hyphens, which can be removed with trim:

$str = trim($str, '-');
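
To see why the trim step matters, here is a small sketch that uses one of the question's example strings. The variable names are only for illustration.

```php
<?php
// Input taken from the question's examples; it ends in a non-alphanumeric character.
$input = 'The quick brown fox jumped!';

// Each run of non-alphanumeric characters becomes one hyphen,
// so the trailing "!" leaves a hyphen at the end.
$replaced = preg_replace('/[^a-zA-Z0-9]+/', '-', $input);
echo $replaced, "\n"; // The-quick-brown-fox-jumped-

// trim() with a character list strips hyphens from both ends only;
// hyphens between words are untouched.
$trimmed = trim($replaced, '-');
echo $trimmed, "\n"; // The-quick-brown-fox-jumped
```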

And to convert the result into lowercase, use strtolower:

$str = strtolower($str);

So all together:

$str = strtolower($str);
$str = preg_replace('/[^a-z0-9]+/', '-', $str);
$str = trim($str, '-');

Or in a compact one-liner:

$str = strtolower(trim(preg_replace('/[^a-zA-Z0-9]+/', '-', $str), '-'));
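
As a quick check, the one-liner can be wrapped in a function and run against the examples from the question. This is only a sketch: the name slugify is made up for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// slugify() is a hypothetical wrapper name; the body is the one-liner from the answer.
function slugify(string $str): string
{
    return strtolower(trim(preg_replace('/[^a-zA-Z0-9]+/', '-', $str), '-'));
}

// Inputs and expected results copied from the question.
$cases = [
    'The quick brown fox jumped'   => 'the-quick-brown-fox-jumped',
    'The quick brown fox jumped!'  => 'the-quick-brown-fox-jumped',
    'The quick brown fox - jumped!' => 'the-quick-brown-fox-jumped',
    'The quick ~`!@#$%^ &*()_+= ------- brown {}|][ :"\'; <>?.,/ fox - jumped!'
        => 'the-quick-brown-fox-jumped',
    'The quick 1234567890 ~`!@#$%^ &*()_+= ------- brown {}|][ :"\'; <>?.,/ fox - jumped!'
        => 'the-quick-1234567890-brown-fox-jumped',
];

foreach ($cases as $input => $expected) {
    $actual = slugify($input);
    // Print a status marker followed by the generated slug.
    echo ($actual === $expected ? 'ok   ' : 'FAIL '), $actual, "\n";
}
```

Note that this pattern only knows ASCII letters and digits, so accented characters are treated like punctuation and turned into hyphens.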
Gumbo answered Sep 18 '22 06:09


I was just working on something similar and came up with this little piece of code. It also handles accented Latin characters.

This is the sample string:

$str = 'El veloz murciélago hindú comía fe<!>&@#$%&!"#%&?¡?*liz cardillo y kiwi. La cigüeña ¨^;.-|°¬tocaba el saxofón detrás del palenque de paja';

First I convert the string with htmlentities, so that accented characters become named entities that are easy to match later.

$friendlyURL = htmlentities($str, ENT_COMPAT, "UTF-8", false);

Then I replace accented Latin characters with their corresponding ASCII characters (á becomes a, Ü becomes U, and so on):

$friendlyURL = preg_replace('/&([a-z]{1,2})(?:acute|circ|lig|grave|ring|tilde|uml|cedil|caron);/i','\1',$friendlyURL);
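
To make these two steps concrete, the sketch below prints the intermediate values for a single word. The variable names are illustrative, and the "\u{...}" escapes assume PHP 7 or later; they are used only so the example does not depend on the encoding of the source file.

```php
<?php
// Same pattern as in the answer: capture the base letter(s) before the accent name.
$pattern = '/&([a-z]{1,2})(?:acute|circ|lig|grave|ring|tilde|uml|cedil|caron);/i';

// "\u{e9}" is the character é.
$word = "murci\u{e9}lago";

// Step 1: é becomes the named entity &eacute;
$entities = htmlentities($word, ENT_COMPAT, 'UTF-8', false);
echo $entities, "\n"; // murci&eacute;lago

// Step 2: the entity is replaced by the captured base letter.
$ascii = preg_replace($pattern, '\1', $entities);
echo $ascii, "\n"; // murcielago

// An entity whose name does not end in one of the listed suffixes is left alone:
// ø ("\u{f8}") is &oslash;, so it passes through this step unchanged.
$other = preg_replace($pattern, '\1', htmlentities("S\u{f8}ren", ENT_COMPAT, 'UTF-8', false));
echo $other, "\n"; // S&oslash;ren
```

Characters that are not transliterated here are decoded back in the next step and later replaced by a hyphen like any other non-alphanumeric character.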

Then I convert any remaining HTML entities back to characters, again for easier use later.

$friendlyURL = html_entity_decode($friendlyURL,ENT_COMPAT, "UTF-8");

Next I replace all non-alphanumeric characters (except existing hyphens) with hyphens.

$friendlyURL = preg_replace('/[^a-z0-9-]+/i', '-', $friendlyURL);

I remove extra hyphens inside the string:

$friendlyURL = preg_replace('/-+/', '-', $friendlyURL);
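
As a side note, these two replacements appear to be equivalent to a single one that treats the hyphen like any other non-alphanumeric character. The sketch below compares both; the function names twoStep and oneStep are made up for this comparison.

```php
<?php
// The answer's two steps: keep existing hyphens, then collapse runs of them.
function twoStep(string $s): string
{
    $s = preg_replace('/[^a-z0-9-]+/i', '-', $s);
    return preg_replace('/-+/', '-', $s);
}

// Assumed alternative: drop the hyphen from the character class so that
// a whole run of non-alphanumeric characters becomes one hyphen at once.
function oneStep(string $s): string
{
    return preg_replace('/[^a-z0-9]+/i', '-', $s);
}

$sample = 'fox - jumped ------- over!';
echo twoStep($sample), "\n"; // fox-jumped-over-
echo oneStep($sample), "\n"; // fox-jumped-over-
```

Either way, a trailing hyphen can remain, which is why the trim step is still needed.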

I remove leading and trailing hyphens:

$friendlyURL = trim($friendlyURL, '-');

And finally I convert everything to lowercase:

$friendlyURL = strtolower($friendlyURL);

All together:

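function friendlyUrl($str = '') {
    // Turn accented characters into named entities (é becomes &eacute;).
    $friendlyURL = htmlentities($str, ENT_COMPAT, "UTF-8", false);
    // Keep only the base letter(s) of each accented-character entity.
    $friendlyURL = preg_replace('/&([a-z]{1,2})(?:acute|circ|lig|grave|ring|tilde|uml|cedil|caron);/i', '\1', $friendlyURL);
    // Decode whatever entities are left back into characters.
    $friendlyURL = html_entity_decode($friendlyURL, ENT_COMPAT, "UTF-8");
    // Replace runs of non-alphanumeric characters with a hyphen, then collapse hyphens.
    $friendlyURL = preg_replace('/[^a-z0-9-]+/i', '-', $friendlyURL);
    $friendlyURL = preg_replace('/-+/', '-', $friendlyURL);
    // Strip leading and trailing hyphens and lowercase the result.
    $friendlyURL = trim($friendlyURL, '-');
    $friendlyURL = strtolower($friendlyURL);
    return $friendlyURL;
}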

Test:

$str = 'El veloz murciélago hindú comía fe<!>&@#$%&!"#%&-?¡?*-liz cardillo y kiwi. La cigüeña ¨^`;.-|°¬tocaba el saxofón detrás del palenque de paja';

echo friendlyUrl($str);

Outcome:

el-veloz-murcielago-hindu-comia-fe-liz-cardillo-y-kiwi-la-ciguena-tocaba-el-saxofon-detras-del-palenque-de-paja

I guess Gumbo's answer fits your problem better, and it's shorter, but I thought this would be useful for others.

Cheers, Adriana

Adri V. answered Sep 18 '22 06:09