
Javascript: RegExp Turkish Character Issue

I'm working on a JavaScript search function. The RegExp is built from '\\b('+word+')\\b' with the flags 'ig', to select the given word and get its position, but it only matches words made of English characters. It doesn't match words that contain Turkish characters.

Working script: https://jsfiddle.net/kv4jftcz/2/

Non-working script: https://jsfiddle.net/kv4jftcz/3/

ATES asked Apr 05 '16 16:04


2 Answers

Ideally you would use a Unicode-aware regex, but JavaScript's \b only understands ASCII word characters :( so to solve this problem you have to redefine \b yourself. \ba roughly means [^\w]a, so for Turkish characters:

[^\wığüşöçĞÜŞÖÇİ] is the way to go.

[^\wığüşöçĞÜŞÖÇİ](türkçe)[^\wığüşöçĞÜŞÖÇİ]

can be used, but this version won't find türkçe in the text below, because the word sits at the very start and there is no character before it to match:

türkçe dili destekliyorum

To solve that problem you can add ^ and $ as alternatives:

(?:^|[^\wığüşöçĞÜŞÖÇİ])(türkçe)(?:[^\wığüşöçĞÜŞÖÇİ]|$)

That's it.
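To see the difference between the two patterns, here is a small sketch that runs both against the sample sentence. The variable names (boundary, naive, anchored) are only for illustration and do not come from the original fiddle.

```javascript
// Everything that is NOT a word character or a Turkish letter counts as a boundary.
const boundary = '[^\\wığüşöçĞÜŞÖÇİ]';

// Naive version: requires a real character on both sides of the word.
const naive = new RegExp(boundary + '(türkçe)' + boundary);

// Anchored version: the start or end of the string also counts as a boundary.
const anchored = new RegExp('(?:^|' + boundary + ')(türkçe)(?:' + boundary + '|$)');

const sample = 'türkçe dili destekliyorum';
const naiveFound = naive.test(sample);       // false, nothing precedes the word
const anchoredFound = anchored.test(sample); // true, ^ stands in for the missing character

console.log(naiveFound, anchoredFound);
```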

Note: this regex also consumes the character before and after the word, so you need to put them back. Capture them with (^|[^\wığüşöçĞÜŞÖÇİ])(türkçe)([^\wığüşöçĞÜŞÖÇİ]|$) and replace with $1<span class="match">$2</span>$3.

Note: lookahead and lookbehind would be cleaner here, but unfortunately JavaScript didn't support lookbehind when this was written.
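If lookbehind is available (it was standardized in ES2018, so this assumes a recent browser or Node.js), the boundary characters no longer need to be captured and put back. A minimal sketch, using Unicode property escapes instead of the hand-written Turkish list. wholeWordRegex is a hypothetical helper name, and the word is assumed to contain no regex special characters:

```javascript
// Build a whole-word matcher that treats any Unicode letter, digit or
// underscore as a "word character". Requires the 'u' flag for \p{...}.
function wholeWordRegex(word) {
  const wordChar = '[\\p{L}\\p{N}_]';
  return new RegExp('(?<!' + wordChar + ')' + word + '(?!' + wordChar + ')', 'gu');
}

const sentence = 'türkçe dili, türkçeler, türkçe';

// Collect the start index of every whole-word occurrence.
// The middle occurrence is skipped because it is part of "türkçeler".
const found = [...sentence.matchAll(wholeWordRegex('türkçe'))].map(function (m) {
  return m.index;
});

console.log(found);
```

Since the lookarounds consume nothing, the replacement string can simply be <span class="match">$&</span>.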

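	var word = 'İpsum';
	// Note the doubled backslash: inside a string literal, '\w' would collapse to a plain 'w'.
	var rgx = new RegExp('(^|[^\\wığüşöçĞÜŞÖÇİ])(' + word + ')([^\\wığüşöçĞÜŞÖÇİ]|$)', 'ig');

	// Wrap every match inside the text nodes of <p> in a highlight span,
	// putting the consumed boundary characters ($1 and $3) back.
	$('p, p *').contents().filter(function() {
	  return this.nodeType === 3; // text nodes only
	}).each(function() {
	  $(this).replaceWith($(this).text().replace(rgx, "$1<span class='match'>$2</span>$3"));
	});

	// Collect the vertical position of each highlighted match.
	var positions = $('.match').map(function() {
	  return this.getBoundingClientRect().top;
	}).get();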

	div {
	  font-size: 50px;
	}
	span.match {
	  background: gold;
	}

	<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

	<body>
	  <p>Lorem İpsum dolor sit amet, consectetur adipisicing elit. Aut voluptatum, provident saepe. Culpa animi sint, itaque iure error hic qui blanditiis perspiciatis adipisci, libero quia veritatis dignissimos quasi id cumque!</p>
	</body>
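Because the pattern consumes the separator after each match, two occurrences separated by a single space (türkçe türkçe) would not both be found by a plain global replace. If you only need the character positions, a DOM-free sketch like the following might help. findWordPositions is a hypothetical helper, and it assumes the word is non-empty and contains no regex special characters:

```javascript
function findWordPositions(text, word) {
  if (!word) return []; // an empty word would never advance the search

  const boundary = '[^\\wığüşöçĞÜŞÖÇİ]';
  const rgx = new RegExp('(^|' + boundary + ')(' + word + ')(?:' + boundary + '|$)', 'ig');
  const positions = [];
  let m;

  while ((m = rgx.exec(text)) !== null) {
    // m.index points at the consumed leading boundary, so skip past it.
    const start = m.index + m[1].length;
    positions.push(start);
    // Resume right after the word, so the trailing separator can act as
    // the leading boundary of the next match.
    rgx.lastIndex = start + m[2].length;
  }
  return positions;
}

const adjacent = findWordPositions('türkçe türkçe dili', 'türkçe');
const embedded = findWordPositions('türkçeler', 'türkçe');

console.log(adjacent, embedded);
```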

Note: you can't search for text containing regex special characters with this (like [hi] or spe.cial). You must escape them first.
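One common way to do the escaping is a small helper that puts a backslash in front of every regex metacharacter. This is a sketch, not the code from the fiddle. escapeRegExp and buildWordRegex are hypothetical names:

```javascript
// Prefix every regex metacharacter with a backslash so it is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same boundary pattern as above, but with the search word escaped first.
function buildWordRegex(word) {
  const boundary = '[^\\wığüşöçĞÜŞÖÇİ]';
  return new RegExp('(^|' + boundary + ')(' + escapeRegExp(word) + ')(' + boundary + '|$)', 'ig');
}

const escapedDot = escapeRegExp('spe.cial');
const escapedBrackets = escapeRegExp('[hi]');

// With escaping, the dot only matches a literal dot.
const matchesLiteral = buildWordRegex('spe.cial').test('some spe.cial characters');
const matchesOther = buildWordRegex('spe.cial').test('some spexcial characters');

console.log(escapedDot, escapedBrackets, matchesLiteral, matchesOther);
```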

co3moz answered Nov 18 '22 20:11


Regular expressions in JavaScript do not support Unicode out of the box, which is required for the Turkish characters that you are using (although ES6 may change this).

Additionally, boundary rules (such as the \b anchors that you are using in your expression) are generally not going to be supported for non-ASCII characters, so that could contribute to this issue as well. If you remove the boundary anchors, the expression appears to work as expected when tested in RegexPal.
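A quick way to see the boundary problem is to test \b against a word that starts with a non-ASCII letter. The sample text is the one from the question's page, and the variable names are only illustrative:

```javascript
const lorem = 'Lorem İpsum dolor sit amet';

// ASCII-only word: \b works as expected.
const asciiWord = /\b(Lorem)\b/i.test(lorem);

// İ is not a \w character, so there is no boundary between the space and İ.
const turkishWord = /\b(İpsum)\b/i.test(lorem);

// Worse, \b sees a boundary between İ and p, so a fragment of the word matches.
const fragment = /\b(psum)\b/i.test(lorem);

console.log(asciiWord, turkishWord, fragment); // true false true
```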

You could potentially use a plug-in like XRegExp to add some support for handling Unicode characters as well.

A better alternative might be the UnicodeJS library, which appears to add this missing functionality and might be worth trying.

Rion Williams answered Nov 18 '22 18:11