
str_replace (or preg_replace?) accepting and keeping accented characters and caps

Tags: php, mysql

I'm building a French MySQL database for a website; it will contain accented characters and capital letters in places. All of this works perfectly.

I then built a table that displays the content of the database (this works perfectly) and put a search bar above it. The SQL query for the search works as intended: with LIKE it is case-insensitive and treats accented characters like their base letters, which, surprisingly, is exactly what I want.

Here's my problem: I'd like to highlight all instances of the search directly in the table. I got it partly working with this:

str_ireplace($_POST["search"], 
             '<span class="highlight">' . $_POST["search"] . "</span>",
             $row['First_Name']);

but these problems occur:

  • It changes the capitalisation in my table to match the search input
  • If the user leaves out accents (searching for "ecole" while looking for "école"), nothing is highlighted

I've been looking for a solution for the past 3 hours without any luck. I started reading about preg_replace() but can't find a way to do it without hand-writing code for every possible accented character. It would be great if I could mimic in PHP what the SQL query does with LIKE.

Fierceblood asked Oct 20 '22 22:10


2 Answers

With your approach, the output always shows what the user typed rather than what is stored in the table, and PHP's string functions do not make the same lenient comparison as MySQL's LIKE.

Here is a function I wrote that deals with this problem, including most of the French accented characters.

function highlight_substring( $string, $substring )
{
  // Nothing to highlight: return the text unchanged
  if( (string) $string === '' || (string) $substring === '' ) return $string;

  // Accented characters and their unaccented equivalents (same order, one for one)
  $normal = array( 'à', 'â', 'é', 'è', 'ê', 'ë', 'î', 'ï', 'ô', 'ò', 'ö', 'û', 'ü', 'ù', 'ç' );
  $flat = array( 'a', 'a', 'e', 'e', 'e', 'e', 'i', 'i', 'o', 'o', 'o', 'u', 'u', 'u', 'c' );

  // Lower-cased, accent-free copies used only for the comparison
  $str = mb_strtolower( $string );
  $str = str_replace( $normal, $flat, $str );

  $sub = mb_strtolower( $substring );
  $sub = str_replace( $normal, $flat, $sub );

  // Character position of the first match (false if there is none)
  $pos = mb_strpos( $str, $sub );

  if( $pos !== false )
  {
    // Rebuild from the original string so its capitals and accents are kept
    $len = mb_strlen( $substring );
    $string = mb_substr( $string, 0, $pos )
      .'<span class="highlight">'.mb_substr( $string, $pos, $len ).'</span>'
      .mb_substr( $string, $pos + $len );
  }

  return $string;
}

Feel free to adapt and improve ;)

Usage

echo highlight_substring( 'Allons à l’école !', 'ecole' ); // user input 'ecole'
echo highlight_substring( 'Allons à l’École !', 'ecole' ); // user input 'ecole'
echo highlight_substring( 'Allons à l’école !', 'Ecole' ); // user input 'Ecole'

Will output:

Allons à l’<span class="highlight">école</span> !
Allons à l’<span class="highlight">École</span> !
Allons à l’<span class="highlight">école</span> !
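
The function above stops at the first match. As a minimal sketch, assuming PHP 7+ with the mbstring extension and the same one-character-for-one-character folding, every occurrence could be highlighted like this. The names fold_french() and highlight_all() are hypothetical, and the htmlspecialchars() escaping is an addition that is not part of the function above.

```php
<?php
// Hypothetical helper: lower-cases the text and strips common French accents.
// Each replacement swaps one character for one character, so character
// positions in the folded copy line up with the original string.
function fold_french(string $s): string
{
    $accented = ['à', 'â', 'ä', 'é', 'è', 'ê', 'ë', 'î', 'ï', 'ô', 'ö', 'ù', 'û', 'ü', 'ç'];
    $plain    = ['a', 'a', 'a', 'e', 'e', 'e', 'e', 'i', 'i', 'o', 'o', 'u', 'u', 'u', 'c'];
    return str_replace($accented, $plain, mb_strtolower($s, 'UTF-8'));
}

// Hypothetical helper: wraps every accent- and case-insensitive match in a span.
function highlight_all(string $text, string $search): string
{
    $needle = fold_french($search);
    $len = mb_strlen($needle, 'UTF-8');
    if ($len === 0) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    $haystack = fold_french($text);
    $out = '';
    $offset = 0;
    while (($pos = mb_strpos($haystack, $needle, $offset, 'UTF-8')) !== false) {
        // Text before the match, escaped for HTML output
        $out .= htmlspecialchars(mb_substr($text, $offset, $pos - $offset, 'UTF-8'), ENT_QUOTES, 'UTF-8');
        // The match itself, taken from the original so accents and capitals survive
        $out .= '<span class="highlight">'
              . htmlspecialchars(mb_substr($text, $pos, $len, 'UTF-8'), ENT_QUOTES, 'UTF-8')
              . '</span>';
        $offset = $pos + $len;
    }
    // Whatever remains after the last match
    return $out . htmlspecialchars(mb_substr($text, $offset, null, 'UTF-8'), ENT_QUOTES, 'UTF-8');
}

echo highlight_all('École, école, ecole', 'ecole'), "\n";
```

Searching on the folded copy and cutting from the original is the same idea as above; the loop only moves the offset past each match. Characters whose lower-case form has a different length would break the position alignment, so this is a sketch for French text rather than a general solution.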
Sébastien answered Oct 23 '22 19:10


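Here is another variant, for PHP 5.4.1+ (it needs mb_ereg_replace_callback() and the intl extension's Normalizer class). It has one problem: it strips the accents from the displayed text, so it may only work as a partial solution.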

mb_regex_encoding('utf-8');
mb_internal_encoding('utf-8');

$row = array('First_Name' => 'some École text with école ecole end of some text ');

function highlightString($string, $word)
{
    $string = iconv('utf-8', 'ISO-8859-1//IGNORE', Normalizer::normalize($string, Normalizer::FORM_D));
    $word = iconv('utf-8', 'ISO-8859-1//IGNORE', Normalizer::normalize($word, Normalizer::FORM_D));
    return mb_ereg_replace_callback('('.$word.')', function ($m) { return '<span class="highlight">'.$m[0].'</span>'; }, $string, 'msri'); // $word goes into the pattern unescaped, so passing raw POST data here is not safe
}

echo highlightString($row['First_Name'], 'école') . " <br>\n";
echo highlightString($row['First_Name'], 'ecole'). " <br>\n";
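
Because the variant above drops the accents from what is displayed, one possible alternative is to leave the text untouched and make the pattern tolerant instead. The sketch below is only an illustration, assuming PHP 7+, UTF-8 input in precomposed (NFC) form, and a hand-picked list of French accents. The function names and the accent map are hypothetical. Each base letter of the search term becomes a character class, and `$0` in the replacement reuses the matched text, so accents and capitals in the output are kept.

```php
<?php
// Hypothetical helper: turns a search term into an accent-tolerant PCRE pattern.
function accent_insensitive_pattern(string $search): string
{
    // Assumed map: base letter => all forms that should match it
    $classes = [
        'a' => 'aàâä', 'c' => 'cç', 'e' => 'eéèêë',
        'i' => 'iîï', 'o' => 'oôö', 'u' => 'uùûü',
    ];

    $pattern = '';
    // Split the term into UTF-8 characters
    foreach (preg_split('//u', $search, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        // Fold the typed character to its base letter if the map knows it
        $base = mb_strtolower($char, 'UTF-8');
        foreach ($classes as $letter => $variants) {
            if (mb_strpos($variants, $base, 0, 'UTF-8') !== false) {
                $base = $letter;
                break;
            }
        }
        // Known letters become a class; anything else is escaped literally
        $pattern .= isset($classes[$base])
            ? '[' . $classes[$base] . ']'
            : preg_quote($char, '/');
    }
    // "i" ignores case, "u" treats pattern and subject as UTF-8
    return '/' . $pattern . '/iu';
}

// Hypothetical helper: wraps every match in a span, reusing the matched text ($0).
function highlight_accent_insensitive(string $text, string $search): string
{
    if ($search === '') {
        return $text;
    }
    return preg_replace(
        accent_insensitive_pattern($search),
        '<span class="highlight">$0</span>',
        $text
    );
}

$row = ['First_Name' => 'some École text with école ecole end'];
echo highlight_accent_insensitive($row['First_Name'], 'ecole'), "\n";
```

The search term is escaped with preg_quote(), which addresses the injection concern noted in the code comment above, but the text itself is not HTML-escaped here; that would still have to happen before the data reaches the page.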
Iłya Bursov answered Oct 23 '22 18:10