I have a text file with strings, and for each string I need to split it and capture each part.
The string is like:
Joao.Martins.G2R71.Pedro.Feliz.sno
Where: NAME of 1st player (first name only, or first name + surname); G = game (can be 2 or 02 or any other number below 99); R = result (in this example the home team wins 7x1); then NAME of 2nd player. The last 3 chars are the game type (in this example snooker).
But the string can also be:
Joao Martins |2x71| Pedro Feliz.poo
I'm no Regex expert (sadly) and already searched lots of questions here without finding a solution or for that matter even getting help just by reading the answers to other questions (mainly because I never seem to understand this)
I already have this:
preg_match("/\[(|^|]+)\]/",$string,$result);
echo $result[1] . "<br />";
But this only gives me the whole thing between the | | without separating its parts, and it ignores everything else.
Can you guys help me with a solution for both cases? I'm as usual completely lost here!
Thanks in advance!
explode way: You don't have to use a complex regexp; a simple explode will do.
$parts = explode( '.', $string);
$parts now has either 6 elements (first format) or 2 (second format), so you can do:
if( count( $parts) == 6){
    // Joao.Martins.G2R71.Pedro.Feliz.sno
    list( $firstName1, $surName1, $score, $firstName2, $surName2, $gameType) = $parts;
} elseif( count( $parts) == 2) {
    // Joao Martins |2x71| Pedro Feliz.poo
    $gameType = $parts[1];
    list( $firstName1, $surName1, $score, $firstName2, $surName2) = explode( ' ', $parts[0]);
} else {
    echo "Cannot parse";
}
And now parsing $score (the G2R71 or |2x71| part) :)
if( preg_match( '~^\|(\d+)x(\d+)\|$~', $score, $m)){
    $first = $m[1];  // game number
    $second = $m[2]; // result
} elseif( preg_match( '~^G(\d+)R(\d+)$~', $score, $m)){
    $first = $m[1];  // game number
    $second = $m[2]; // result
} else {
    echo "Cannot parse!";
}
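If you want the explode way as one reusable piece, here is a minimal sketch. The function name parseGameExplode() and the keys of the returned array are made up for illustration, and it assumes every player has exactly a first name and a surname:

```php
<?php
// Hypothetical helper (not from the original answer): explode-based parsing.
// Returns null when the line has neither of the two expected shapes.
function parseGameExplode(string $line): ?array
{
    $parts = explode('.', $line);

    if (count($parts) === 6) {
        // Joao.Martins.G2R71.Pedro.Feliz.sno
        [$first1, $sur1, $score, $first2, $sur2, $type] = $parts;
    } elseif (count($parts) === 2) {
        // Joao Martins |2x71| Pedro Feliz.poo
        $type  = $parts[1];
        $words = explode(' ', $parts[0]);
        if (count($words) !== 5) {
            return null; // assumption: exactly two name parts per player
        }
        [$first1, $sur1, $score, $first2, $sur2] = $words;
    } else {
        return null;
    }

    // The score is either |2x71| or G2R71.
    if (!preg_match('~^\|(\d+)x(\d+)\|$~', $score, $m)
        && !preg_match('~^G(\d+)R(\d+)$~', $score, $m)) {
        return null;
    }

    return [
        'player1' => "$first1 $sur1",
        'game'    => $m[1],
        'result'  => $m[2],
        'player2' => "$first2 $sur2",
        'type'    => $type,
    ];
}
```

Calling parseGameExplode('Joao Martins |2x71| Pedro Feliz.poo') should give you both names, the game number, the result and the game type in one array.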
preg_match way: The second regexp is intentionally different, so you can see how to write a regexp that will "eat" the whole name no matter whether it has 2, 3 or 5 parts, and so you get used to *? (the greedy killer).
$match = array();
if( preg_match( '~^(\w+)\.(\w+)\.G(\d+)R(\d+)\.(\w+)\.(\w+)\.(\w+)$~', $text, $match)){
// First way
} elseif (preg_match( '~^([^\|]+)\|(\d+)x(\d+)\|(.*?)\.(\w+)$~', $text, $match)){
// Second way
} else {
// Failed to parse
}
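To show which capture group ends up where, here is a rough sketch that fills in the two branches. The wrapper parseGameRegex() and the array keys are hypothetical names; the two patterns are the ones above. In the second format the names are captured together with the spaces around the | | part, hence the trim():

```php
<?php
// Hypothetical wrapper around the two patterns from the answer.
function parseGameRegex(string $text): ?array
{
    // First way: Joao.Martins.G2R71.Pedro.Feliz.sno
    if (preg_match('~^(\w+)\.(\w+)\.G(\d+)R(\d+)\.(\w+)\.(\w+)\.(\w+)$~', $text, $m)) {
        return [
            'player1' => $m[1] . ' ' . $m[2],
            'game'    => $m[3],
            'result'  => $m[4],
            'player2' => $m[5] . ' ' . $m[6],
            'type'    => $m[7],
        ];
    }

    // Second way: Joao Martins |2x71| Pedro Feliz.poo
    if (preg_match('~^([^|]+)\|(\d+)x(\d+)\|(.*?)\.(\w+)$~', $text, $m)) {
        return [
            'player1' => trim($m[1]), // captured as "Joao Martins " (trailing space)
            'game'    => $m[2],
            'result'  => $m[3],
            'player2' => trim($m[4]), // captured as " Pedro Feliz" (leading space)
            'type'    => $m[5],
        ];
    }

    return null; // failed to parse
}
```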
And if a player may have more than 2 names (like Armin Van Buuren) you should go with a regexp like this:
~^([\w.]+)\.G(\d+)R(\d+)\.([\w.]+)\.(\w+)$~
This will match names such as Albert.Einstein and Armin.Van.Buuren. Note that \w also matches digits and the underscore, so names like Gerold.The.3rd match as well; writing [\w\d.] instead of [\w.] would be redundant.
The \.G(\d+)R(\d+)\. part is quite strict, so you would have to make up a really crazy name containing something like .G3R01. (say, a "3l1t33 kid Gerold") for the string to be parsed wrong.
Oh and one more thing, don't forget to $name = strtr( $name, '.', ' ') :)
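Putting the multi-name pattern and the strtr() call together could look roughly like this. parseDottedGame() is a made-up name, and the sample string in the comment is only an illustration:

```php
<?php
// Hypothetical helper: multi-part names in the dotted format,
// e.g. Armin.Van.Buuren.G12R31.Albert.Einstein.sno
function parseDottedGame(string $text): ?array
{
    $pattern = '~^([\w.]+)\.G(\d+)R(\d+)\.([\w.]+)\.(\w+)$~';
    if (!preg_match($pattern, $text, $m)) {
        return null;
    }

    return [
        // strtr() with three arguments swaps single characters: '.' becomes ' '
        'player1' => strtr($m[1], '.', ' '),
        'game'    => $m[2],
        'result'  => $m[3],
        'player2' => strtr($m[4], '.', ' '),
        'type'    => $m[5],
    ];
}
```

The last ([\w.]+) is greedy, but it has to give back the final ".sno" so that \.(\w+)$ can still match, which is why the game type does not end up inside the second name.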
~~ - regexp delimiter; starts and ends the regexp: ~regexp~. It can be practically anything: /regexp/, (regexp)
^ and $ - meta characters; ^ is start of string/line, $ is end of string/line
\w - escape sequence for any word character, the same as [a-zA-Z0-9_]
([\w.]+) - captures a subpattern/match group that contains [a-zA-Z0-9_.] at least once; + is called a quantifier
+? - a ? after another quantifier is called the greedy killer and means "take as little as possible": on the string ababa, (\w+)a would normally capture abab, (\w+?)a would capture ab, and (\w*?)a would capture the empty string :)
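If you want to see the greedy killer in action, this small sketch runs the three patterns on ababa. firstGroup() is a throwaway helper written just for this demo:

```php
<?php
// Throwaway helper: returns what capture group 1 matched, or null on no match.
function firstGroup(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) ? $m[1] : null;
}

$greedy = firstGroup('~(\w+)a~', 'ababa');  // greedy: grabs as much as possible
$lazy   = firstGroup('~(\w+?)a~', 'ababa'); // lazy: at least one char, as few as possible
$empty  = firstGroup('~(\w*?)a~', 'ababa'); // lazy with *: zero chars is enough

var_dump($greedy, $lazy, $empty); // "abab", "ab", ""
```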