
How to Separate a string (Arabic string)

Tags: string, regex, php

I have a combined string that I want to split into its parts.

My pattern (the text is in Arabic script, so it reads from right to left):

str3[str2](str1)

Example 1

For the input:

string = تَ) [ ع . ] (مص م .) راست کردن ، معتدل کردن)

I want the output:

$str1='(تَ)';
$str2='[ ع . ]';
$str3='مص م .) راست کردن ، معتدل کردن)';

Example 2

For the input:

string = اِ تَ) (مص ل .) = اباته : شب را در جایی گذراندن)

I want the output:

$str1='(اِ تَ)';
$str2='';
$str3='مص ل .) = اباته : شب را در جایی گذراندن)';

Example 3

For the input:

string = [ ع . ] (مص م .) راست کردن ، معتدل کردن

I want the output:

$str1='';
$str2='[ ع . ]';
$str3='(مص م .) راست کردن ، معتدل کردن';

How can I do that?

asked May 24 '15 at 14:05 by Shafizadeh




1 Answer

As I mentioned in the comments, the character that appears to come first (the rightmost one) is not the opening parenthesis it is supposed to be; in the stored string that parenthesis is actually the last character. The text only looks correct when rendered, and this hidden error causes the confusion. The following code corrects the error and outputs the desired strings.

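<?php
// Input strings exactly as posted: the opening "(" of the first part
// is stored as a ")" at the very end of each string.
$arrStr = [
'تَ) [ ع . ] (مص م .) راست کردن ، معتدل کردن)',
'اِ تَ) (مص ل .) = اباته : شب را در جایی گذراندن)',
];
echo "<body style='direction: rtl !important;'>";
foreach ($arrStr as $str) {
    // Group 1: text up to the first ")"; group 2: optional "[...]"; group 3: the rest.
    if (!preg_match('~(.*?\))(?:\s)(\[.*?\])?(?:\s*?)(.*)~', $str, $matches)) {
        continue; // skip strings that do not fit the pattern
    }
    // Restore the missing opening parenthesis of the first part.
    $matches[1] = "(" . $matches[1];
    // Drop the stray ")" at the end of the third part.
    $matches[3] = trim(substr($matches[3], 0, -1));
    echo "<pre>";
    for ($i = 1; $i <= 3; $i++) {
        echo "$i: {$matches[$i]}<br />";
    }
    echo "</pre><hr>";
}
echo "</body>";
?>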

The output (note that the entries are in correct RTL order and display correctly in an RTL environment; they do not merely look correct in an LTR environment):

1: (تَ)
2: [ ع . ]
3: (مص م .) راست کردن ، معتدل کردن
_____________________________________________
1: (اِ تَ)
2: 
3: (مص ل .) = اباته : شب را در جایی گذراندن
_____________________________________________
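
The hidden error described at the top of this answer can be checked by looking at the stored characters rather than at the rendered text. The snippet below is only a small sketch; `describeEnds` is a hypothetical helper name introduced here for illustration. It compares single bytes, which is safe because `(` and `)` are ASCII and never occur inside a multi-byte UTF-8 character.

```php
<?php
// Hypothetical helper: reports whether the string, in its stored (logical)
// order, starts with "(" and whether it ends with ")".
function describeEnds(string $str): array
{
    if ($str === '') {
        return [false, false];
    }
    return [$str[0] === '(', substr($str, -1) === ')'];
}

// First example string from the question, as posted.
$str = 'تَ) [ ع . ] (مص م .) راست کردن ، معتدل کردن)';
[$startsWithOpen, $endsWithClose] = describeEnds($str);

echo 'starts with "(": ', var_export($startsWithOpen, true), "\n";
echo 'ends with ")": ', var_export($endsWithClose, true), "\n";
```

If the first check is false and the second is true, the string has the problem described above: the parenthesis that looks like the opening one on screen is stored at the end.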


P.S.: So here is your new scenario: the first part, enclosed in (), is optional; the second part, enclosed in [], is also optional; the third part is mandatory. According to your examples above, the third part may itself start with a parenthesized group. Because of this, consider an input such as B (A): there is NO way to tell whether it consists of the optional first part (A) followed by the mandatory third part B, or whether it has neither optional part and the whole string is the third part. If that is not a concern, you may use ~(.*?\)\s)?(\[.*?\]\s)?(.*)~ as the regular expression.
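
The scenario in the P.S. can be wrapped in a small function. This is a minimal sketch under stated assumptions, not the pattern given above: `splitEntry` is a hypothetical name, the first group is tightened so that it cannot contain brackets or nested parentheses (otherwise a leading [...] could be absorbed into the first part), and the trailing ")" is removed only when the third part has more closing than opening parentheses. The ambiguity described above remains: an input that begins with a parenthesized group and has no other parts will have that group reported as the first part.

```php
<?php
// Hypothetical helper (an assumption, not part of the original answer):
// returns [part1, part2, part3] for one entry.
function splitEntry(string $str): array
{
    // Part 1: optional "(", text without parentheses/brackets, then ")".
    // Part 2: optional "[...]".
    // Part 3: everything that is left.
    $pattern = '~^\s*(\(?[^()\[\]]*\))?\s*(\[[^\]]*\])?\s*(.*)$~us';
    if (!preg_match($pattern, $str, $m)) {
        // E.g. invalid UTF-8: treat the whole input as the third part.
        return ['', '', trim($str)];
    }
    $str1 = trim($m[1]);
    $str2 = trim($m[2]);
    $str3 = trim($m[3]);

    // Restore a missing opening parenthesis on the first part.
    if ($str1 !== '' && $str1[0] !== '(') {
        $str1 = '(' . $str1;
    }
    // Drop a stray trailing ")" only if the parentheses are unbalanced.
    if (substr($str3, -1) === ')'
        && substr_count($str3, ')') > substr_count($str3, '(')) {
        $str3 = rtrim(substr($str3, 0, -1));
    }
    return [$str1, $str2, $str3];
}

// Usage with the third example from the question (no first part).
print_r(splitEntry('[ ع . ] (مص م .) راست کردن ، معتدل کردن'));
```

For the third example the first element should come back empty, with the bracketed part second and the rest third; the first two examples go through the same function and additionally get their parentheses repaired.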
answered Sep 26 '22 at 00:09 by someOne