
preg_match to capture string part after a special character

I have a text file with strings, and I need to split each string and capture each part of it.

The string is like:

Joao.Martins.G2R71.Pedro.Feliz.sno

Where: NAME of the 1st player (first name only, or first name + surname); G = game (can be 2 or 02 or any other number below 99); R = result (in this example the home team wins 7x1); then NAME of the 2nd player. The last 3 characters are the game type (snooker in this example).

But the string can also be:

Joao Martins |2x71| Pedro Feliz.poo

I'm no regex expert (sadly). I have already searched lots of questions here without finding a solution, and reading the answers to other questions didn't help either (mainly because I never seem to understand regex).

I already have this:

preg_match("/\[(|^|]+)\]/",$string,$result);
echo $result[1] . "<br />";

But this only gives me everything between the | | without separating the parts, and it ignores everything else.

Can you guys help me with a solution for both cases? I'm as usual completely lost here!

Thanks in advance!

Afonso Gomes asked Dec 22 '22 02:12



1 Answer

explode way:

You don't have to use complex regexp, you may use simple explode.

$parts = explode( '.', $string);

$parts now has either 2 elements or 6, so you can do:

if( count( $parts) == 6){
   // Dotted format: Joao.Martins.G2R71.Pedro.Feliz.sno
   list( $firstName1, $surName1, $score, $firstName2, $surName2, $gameType) = $parts;
} elseif( count( $parts) == 2) {
   // Pipe format: "Joao Martins |2x71| Pedro Feliz.poo"
   $gameType = $parts[1];
   list( $firstName1, $surName1, $score, $firstName2, $surName2) = explode( ' ', $parts[0]);
} else {
   echo "Cannot parse";
}

And now parsing $score (the G2R71 or |2x71| part) :)

// $first is the game number, $second the result digits (71 for 7x1)
if( preg_match( '~^\|(\d+)x(\d+)\|$~', $score, $m)){
   $first = $m[1];
   $second = $m[2];
} elseif( preg_match( '~^G(\d+)R(\d+)$~', $score, $m)){
   $first = $m[1];
   $second = $m[2];
} else {
   echo "Cannot parse!";
}
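
Putting the two steps together, here is a minimal sketch of the explode way as one function. The name parseWithExplode and the keys of the returned array are made up for illustration, and it assumes every player has exactly a first name and a surname:

```php
<?php
// Hypothetical helper (not part of the answer above): explode() does the
// overall split, preg_match() is only used for the score token.
function parseWithExplode(string $line): ?array
{
    $parts = explode('.', $line);

    if (count($parts) == 6) {
        // Dotted format: Joao.Martins.G2R71.Pedro.Feliz.sno
        list($firstName1, $surName1, $score, $firstName2, $surName2, $gameType) = $parts;
    } elseif (count($parts) == 2) {
        // Pipe format: "Joao Martins |2x71| Pedro Feliz.poo"
        $gameType = $parts[1];
        $tokens = explode(' ', $parts[0]);
        if (count($tokens) != 5) {
            return null; // assumption: first name + surname for both players
        }
        list($firstName1, $surName1, $score, $firstName2, $surName2) = $tokens;
    } else {
        return null;
    }

    // The score token is either G<game>R<result> or |<game>x<result>|
    if (!preg_match('~^G(\d+)R(\d+)$~', $score, $m)
        && !preg_match('~^\|(\d+)x(\d+)\|$~', $score, $m)) {
        return null;
    }

    return [
        'player1' => "$firstName1 $surName1",
        'game'    => $m[1],
        'result'  => $m[2],
        'player2' => "$firstName2 $surName2",
        'type'    => $gameType,
    ];
}

print_r(parseWithExplode('Joao.Martins.G2R71.Pedro.Feliz.sno'));
print_r(parseWithExplode('Joao Martins |2x71| Pedro Feliz.poo'));
```

Returning null instead of echoing "Cannot parse" lets the caller decide what to do with bad lines.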

preg_match way:

The second regexp is intentionally different, so you can see how to write a regexp that will "eat" the whole name no matter whether it has 2, 3 or 5 parts, and so you get used to *? (the greedy killer, i.e. a lazy quantifier).

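$match = array();
if( preg_match( '~^(\w+)\.(\w+)\.G(\d+)R(\d+)\.(\w+)\.(\w+)\.(\w+)$~', $string, $match)){
  // First way: $match[1..2] = player 1, [3] = game, [4] = result,
  // [5..6] = player 2, [7] = game type
} elseif (preg_match( '~^([^\|]+)\|(\d+)x(\d+)\|(.*?)\.(\w+)$~', $string, $match)){
  // Second way: $match[1] = player 1, [2] and [3] = the two numbers,
  // [4] = player 2, [5] = game type; trim() the names, they keep the
  // spaces around the | |
} else {
  // Failed to parse
}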
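
To show what ends up in $match, here is a rough sketch that wraps both patterns in one function and returns the same fields for either format. The function name parseWithRegex and the array keys are my own, not anything PHP provides:

```php
<?php
// Hypothetical helper: try the dotted pattern first, then the pipe pattern.
function parseWithRegex(string $line): ?array
{
    // First way: Joao.Martins.G2R71.Pedro.Feliz.sno
    if (preg_match('~^(\w+)\.(\w+)\.G(\d+)R(\d+)\.(\w+)\.(\w+)\.(\w+)$~', $line, $m)) {
        return [
            'player1' => $m[1] . ' ' . $m[2],
            'game'    => $m[3],
            'result'  => $m[4],
            'player2' => $m[5] . ' ' . $m[6],
            'type'    => $m[7],
        ];
    }

    // Second way: "Joao Martins |2x71| Pedro Feliz.poo"
    if (preg_match('~^([^\|]+)\|(\d+)x(\d+)\|(.*?)\.(\w+)$~', $line, $m)) {
        return [
            'player1' => trim($m[1]), // captured with a trailing space
            'game'    => $m[2],
            'result'  => $m[3],
            'player2' => trim($m[4]), // captured with a leading space
            'type'    => $m[5],
        ];
    }

    return null; // failed to parse
}

print_r(parseWithRegex('Joao Martins |2x71| Pedro Feliz.poo'));
```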

Edit (more than 2 names)

If a player may have more than 2 names (like Armin Van Buuren), go with a regexp like this:

~^([\w.]+)\.G(\d+)R(\d+)\.([\w.]+)\.(\w+)$~

This will match names such as Albert.Einstein and Armin.Van.Buuren. Because \w also covers digits and the underscore, a name like Gerold.The.3rd matches as well.

Writing the class as [\w\d.] instead, as in ~^([\w\d.]+)\.G(\d+)R(\d+)\.([\w\d.]+)\.(\w+)$~, is equivalent, since \d is already covered by \w. The \.G(\d+)R(\d+)\. part is quite strict, so you would have to make up a really crazy name containing something like G3R01 (like "3l1t33 kid Gerold") to get it parsed wrong.

Oh and one more thing, don't forget to $name = strtr( $name, '.', ' ') :)
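
A small sketch of the multi-name pattern plus the strtr() cleanup, for the dotted format only. The function name parseDottedLine and the array keys are hypothetical:

```php
<?php
// Hypothetical helper: names may have any number of dot-separated parts.
function parseDottedLine(string $line): ?array
{
    if (!preg_match('~^([\w.]+)\.G(\d+)R(\d+)\.([\w.]+)\.(\w+)$~', $line, $m)) {
        return null;
    }

    return [
        'player1' => strtr($m[1], '.', ' '), // Armin.Van.Buuren -> Armin Van Buuren
        'game'    => $m[2],
        'result'  => $m[3],
        'player2' => strtr($m[4], '.', ' '),
        'type'    => $m[5],
    ];
}

print_r(parseDottedLine('Armin.Van.Buuren.G02R31.Albert.Einstein.sno'));
```

Note that the game number stays a string here ("02" is kept as written); cast it with (int) if you need a number.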

RegExp explained

  • ~ ~ - regexp delimiters; they start and finish the regexp: ~regexp~. It can be practically any non-alphanumeric character, e.g. /regexp/ or (regexp)
  • ^ and $ - meta characters; ^ is the start of the string/line, $ is the end of the string/line
  • \w is the escape sequence for any word character, the same as [a-zA-Z0-9_]
  • ([\w.]+) - captures a subpattern/match group that contains characters from [a-zA-Z0-9_.] at least once. + is called a quantifier
  • +? - a ? after another quantifier is called the greedy killer (a lazy quantifier) and means "take as little as possible". On the string ababa, (\w+)a captures abab, (\w+?)a captures ab and (\w*?)a captures an empty string :)
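
If you want to see the greedy vs. lazy difference for yourself, this short snippet runs the three patterns from the last bullet on ababa and also checks that \w accepts digits and underscores. The variable names are just for illustration:

```php
<?php
// Greedy: \w+ grabs as much as it can, then backs off until "a" can match.
preg_match('~(\w+)a~', 'ababa', $greedy);
var_dump($greedy[1]); // string(4) "abab"

// Lazy: \w+? takes as little as possible (but at least one character).
preg_match('~(\w+?)a~', 'ababa', $lazy);
var_dump($lazy[1]); // string(2) "ab"

// Lazy with *: zero characters are enough, so the group is empty.
preg_match('~(\w*?)a~', 'ababa', $empty);
var_dump($empty[1]); // string(0) ""

// \w matches letters, digits and the underscore.
$wordOnly = preg_match('~^\w+$~', '3rd_place');
var_dump($wordOnly); // int(1)
```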
Vyktor answered Dec 24 '22 02:12
