I have a website module that collects tweets from Twitter and splits them into words to store in a database. However, because the tweets often contain Turkish characters [ıöüğşçİÖÜĞŞÇ], my module does not split the words correctly.
For example, the phrase Aynı labda çalıştığım is split into Ayn, labda and alıştığım, but it should be split into Aynı, labda and çalıştığım.
Here's my code which does the job:
preg_match_all('/(\A|\b)[A-Z\Ç\Ö\Ş\İ\Ğ\Ü]?[a-z\ç\ö\ş\ı\ğ\ü]+(\Z|\b)/u', $text,$a);
What do you think is wrong here?
Important note: I know I could split the text on the space character, but that is not what I need. I need to match exactly these letters, and I don't want any digits or special characters such as [,.!@#$^&*123456780].
I need a regular expression that will split this: kısa isimleri ile "Vic" ve "Wick" vardı.
into this:
kısa
isimleri
ile
Vic
ve
Wick
vardı
More examples:
We're @test would become:
We
re
test
Föö bär, we're @test to0 ÅÄÖ - 123 ok? kthxbai? is currently split into this:
b
r
we
re
test
ok
kthxbai
but I want it to be:
Föö
bär
we
re
test
ÅÄÖ
ok
kthxbai
I would take a look at mb_split().
$str = 'We\'re @test Aynı labda çalıştığım';
var_dump(\mb_split('\s', $str));
Gives me:
array
  0 => string 'We're' (length=5)
  1 => string '@test' (length=5)
  2 => string 'Aynı' (length=5)
  3 => string 'labda' (length=5)
  4 => string 'çalıştığım' (length=16)
This expression would give you the desired result (according to your examples):
/(?<!\pL|\pN)\pL+(?!\pL|\pN)/u
\pL matches any Unicode letter and \pN matches any Unicode number. The lookarounds make sure a match is neither preceded nor followed by another letter or a number, which completely excludes words that contain any numbers (such as to0).
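To see what the lookarounds change, here is a small sketch that compares the pattern with and without them. The sample string and the variable names are only for illustration; it assumes the input is valid UTF-8.

```php
<?php
// Illustrative input: "to0" mixes letters with a digit, "ok" is letters only.
$sample = 'to0 ok';

// Without lookarounds, the letter run inside "to0" is still matched.
preg_match_all('/\pL+/u', $sample, $plain);

// With lookarounds, a run that touches a letter or digit on either side is rejected.
preg_match_all('/(?<!\pL|\pN)\pL+(?!\pL|\pN)/u', $sample, $guarded);

print_r($plain[0]);   // contains "to" and "ok"
print_r($guarded[0]); // contains only "ok"
```

The lookarounds test for letters as well as numbers so that the engine cannot backtrack to a shorter run such as "t" and accept it.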
Example:
$str = "Aynı, labda - çalıştığım? \"quote\". Föö bär, we're @test to0 ÅÄÖ - 123 ok? kthxbai?";
preg_match_all('/(?<!\pL|\pN)\pL+(?!\pL|\pN)/u', $str, $m);
print_r($m);
Output:
Array
(
    [0] => Array
        (
            [0] => Aynı
            [1] => labda
            [2] => çalıştığım
            [3] => quote
            [4] => Föö
            [5] => bär
            [6] => we
            [7] => re
            [8] => test
            [9] => ÅÄÖ
            [10] => ok
            [11] => kthxbai
        )

)
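If it helps, the same pattern can be wrapped in a small function and checked against the other examples from the question. This is only a sketch: extract_words is a hypothetical name, not something from the answer above, and it assumes the text is valid UTF-8 with precomposed characters (combining marks are not matched by \pL).

```php
<?php
// Hypothetical helper: returns the letter-only words found in $text and
// skips any token that contains a digit.
function extract_words(string $text): array
{
    // preg_match_all() returns false on failure, e.g. when $text is not valid UTF-8.
    if (preg_match_all('/(?<!\pL|\pN)\pL+(?!\pL|\pN)/u', $text, $matches) === false) {
        return [];
    }
    return $matches[0];
}

// The two remaining sample inputs from the question.
$turkish = extract_words('kısa isimleri ile "Vic" ve "Wick" vardı.');
$english = extract_words("We're @test");

print_r($turkish); // kısa, isimleri, ile, Vic, ve, Wick, vardı
print_r($english); // We, re, test
```

Returning an empty array on failure is just one possible choice; depending on the caller, it may be better to inspect preg_last_error() and report the problem instead.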