
Surround Hebrew and English text in div

I am trying to wrap each run of Hebrew or English text in a paragraph in its own span tag. For example, "so היי all whats up אתכם?" should become:

[span]so[/span][span]היי[/span][span]all whats up[/span][span]אתכם[/span]

I have been trying with a regexp, but it just removes the Hebrew words and joins the English words in one span.

var str = 'so היי all whats up אתכם?'
var match= str.match(/(\b[a-z]+\b)/ig);
var replace = match.join().replace(match.join(),'<span>'+match.join()+'</span>')
roude asked Jul 03 '15 08:07



2 Answers

Previous answers here did not account for the whole-word requirement. It is difficult to achieve because the \b word boundary only recognizes ASCII word characters (letters, digits and underscore), so it never matches next to a Hebrew letter. Hebrew letters can only be matched with a character class using \u notation.
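To illustrate the \b limitation, here is a minimal sketch. The variable names are only for illustration and are not part of the solution below.

```javascript
// \b is defined relative to \w, which covers only [A-Za-z0-9_] in JavaScript,
// so Hebrew letters never produce a word boundary on their own.
var hebrewOnly = 'היי';
var mixed = 'so היי all';

var boundaryHits = hebrewOnly.match(/\b/g);            // null: no boundary anywhere
var naiveWords = mixed.match(/\b[\u0590-\u05FF]+\b/g); // null: \b fails next to Hebrew
var rangeWords = mixed.match(/[\u0590-\u05FF]+/g);     // ['היי']: the range alone works

console.log(boundaryHits, naiveWords, rangeWords);
```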

I suggest using a capturing group and a negative look-ahead to capture a whole Hebrew word: (^|[^\u0590-\u05FF])([\u0590-\u05FF]+)(?![\u0590-\u05FF]). The first group makes sure there is a non-Hebrew symbol or the start of the string before the Hebrew word (add \s to the second character class if several space-separated Hebrew words should be wrapped together). Use \b[a-z\s]+\b to match a sequence of whole English words separated by spaces.

If you plan to insert the <span> tags into a sentence around whole words, here is a snippet that may help:

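// JavaScript
var str = 'so היי all whats up אתכם?';
//var str = 'so, היי, all whats up אתכם?';
// Wrap runs of whole English words first, dropping the surrounding whitespace.
var result = str.replace(/\s*(\b[a-z\s]+\b)\s*/ig, '<span>$1</span>');
// Then wrap each Hebrew word, keeping the non-Hebrew character captured before it.
result = result.replace(/(^|[^\u0590-\u05FF])([\u0590-\u05FF]+)(?![\u0590-\u05FF])/g, '$1<span>$2</span>');
document.getElementById("r").innerHTML = result;

/* CSS */
span {
    background:#FFCCCC;
    border:1px solid #0000FF;
}

<!-- HTML -->
<div width="645" id="r"/>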

Result:

<span>so</span><span>היי</span><span>all whats up</span><span>אתכם</span>?
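The two replace calls above depend on their order: if the Hebrew pass ran first, the English pattern would later match the word "span" inside the inserted tags. A possible single-pass variant is sketched below. The helper name wrapRuns is hypothetical, and unlike the \b-based pattern it does not reject letters glued to digits (for example, User234 would have "User" wrapped).

```javascript
// Hypothetical helper, not part of the original answer: wrap every run of
// Hebrew words or English words in a span using one replace() call.
function wrapRuns(text) {
    // Group 1 is either a run of space-separated Hebrew words
    // or a run of space-separated English words.
    // The \s* on both sides drops whitespace between adjacent spans.
    var re = /\s*([\u0590-\u05FF]+(?:\s+[\u0590-\u05FF]+)*|[a-z]+(?:\s+[a-z]+)*)\s*/gi;
    return text.replace(re, '<span>$1</span>');
}

console.log(wrapRuns('so היי all whats up אתכם?'));
```

Because the input is scanned only once, the inserted tags are never re-examined by a later pattern.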

If you do not need any punctuation or alphanumeric entities in your output, just concatenated whole English and Hebrew words, then use

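// JavaScript
var str = 'היי, User234, so 222היי all whats up אתכם?';
var re = /(^|[^\u0590-\u05FF])([\u0590-\u05FF]+)(?![\u0590-\u05FF])|(\b[a-z\s]+\b)/ig;
var res = [];
var m; // declared explicitly to avoid an implicit global
while ((m = re.exec(str)) !== null) {
    // Guard against an infinite loop on zero-length matches.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    if (m[1] !== undefined) {
        // Hebrew alternative matched: the word is in group 2.
        res.push('<span>' + m[2].trim() + '</span>');
    } else {
        // English alternative matched: the run of words is in group 3.
        res.push('<span>' + m[3].trim() + '</span>');
    }
}
document.getElementById("r").innerHTML = res.join("");

/* CSS */
span {
    background:#FFCCCC;
    border:1px solid #0000FF;
}

<!-- HTML -->
<div width="645" id="r"/>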

Result:

<span>היי</span><span>so</span><span>היי</span><span>all whats up</span><span>אתכם</span>
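As an alternative that is not used in the code above, engines supporting ES2018 offer Unicode property escapes and look-behind, which allow a Unicode-aware replacement for \b. The sketch below assumes such an engine. The helper name collectWholeRuns is hypothetical. Note one difference in behavior: digits count as part of a word here, so the היי in 222היי is skipped rather than extracted.

```javascript
// Hypothetical helper: collect whole Hebrew or English runs, ES2018+ only.
function collectWholeRuns(text) {
    var edge = '[\\p{L}\\p{N}_]'; // any character that would continue a word
    var hebrew = '\\p{Script=Hebrew}+(?:\\s+\\p{Script=Hebrew}+)*';
    var english = '[A-Za-z]+(?:\\s+[A-Za-z]+)*';
    // The look-behind and look-ahead act as Unicode-aware word boundaries.
    var re = new RegExp('(?<!' + edge + ')(?:' + hebrew + '|' + english + ')(?!' + edge + ')', 'gu');
    return text.match(re) || [];
}

var runs = collectWholeRuns('היי, User234, so 222היי all whats up אתכם?');
console.log(runs); // ['היי', 'so', 'all whats up', 'אתכם']

var html = runs.map(function (run) {
    return '<span>' + run + '</span>';
}).join('');
```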
Wiktor Stribiżew answered Nov 08 '22 10:11



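I think the regex you want is something like [^a-z\u0591-\u05F4\s] (a ^ negates a character class only when it is the first character, so it should not be repeated before each range). I'm not entirely sure how you want to handle spaces.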
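Inside a character class only a leading ^ negates; any later ^ is a literal caret. A minimal sketch with shortened patterns (the variable names and the sample string are made up for illustration):

```javascript
var withExtraCarets = /[^a-z^\s]/gi; // excludes letters, whitespace and the literal '^'
var leadingCaretOnly = /[^a-z\s]/gi; // excludes letters and whitespace only

var sample = 'a^b?';
var keptCaret = sample.replace(withExtraCarets, '');   // 'a^b': the '^' survives
var strippedAll = sample.replace(leadingCaretOnly, ''); // 'ab'

console.log(keptCaret, strippedAll);
```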

My solution

Copy str to a new var finalStr, removing any characters that aren't English letters (a-z), Hebrew, or whitespace.
Loop over the runs of English (a-z) characters in str and wrap each one in a span, using finalStr.replace.
Do the same again for the Hebrew characters.

It's not quite 100%, but seems to work well enough IMO.

var str = 'so היי all whats up אתכם?';
// Strip everything that is not an English letter, a Hebrew character or whitespace.
var finalStr = str.replace(/[^a-z\u0591-\u05F4\s]/gi, '');

// Wrap the first occurrence of each trimmed match in a span.
function wrapMatches(rgx) {
    var mat = str.match(rgx) || [];
    for (var i = 0; i < mat.length; ++i) {
        var match = mat[i].trim();
        if (match === '') continue; // whitespace-only match: nothing to wrap
        finalStr = finalStr.replace(match, '<span>' + match + '</span>');
    }
}

wrapMatches(/[a-z ]+/gi);          // runs of English words
wrapMatches(/[\u0591-\u05F4 ]+/g); // runs of Hebrew words

document.getElementById('res').innerHTML = finalStr;

http://jsfiddle.net/daveSalomon/0ns6nuxy/1/

Dave Salomon answered Nov 08 '22 12:11
