Cut a UTF-8 text in PHP

I get UTF8 text from a database, and I want to show only the first $len characters (finishing in a word). I've tried several options but the function still doesn't work because of special characters (á, é, í, ó, etc).

Thanks for the help!

function text_limit($text, $len, $end='...')
{ 

  mb_internal_encoding('UTF-8');
  if( (mb_strlen($text, 'UTF-8') > $len) ) { 

    $text = mb_substr($text, 0, $len, 'UTF-8');
    $text = mb_substr($text, 0, mb_strrpos($text," ", 'UTF-8'), 'UTF-8');

    ...
  }
}

Edit to add an example

If I truncate a text with 65 characters, it returns:

Un jardín de estilo neoclásico acorde con el …

If I change the special characters (í, á), then it returns:

Un jardin de estilo neoclasico acorde con el Palacio de …

I'm sure there is something strange with the encoding or the server, or php; but I can't figure it out! Thanks!

Final Solution

I'm using this UTF8 PHP library and everything works now...

fesja asked Jul 20 '10 21:07


2 Answers

OK, it has been baffling me that you can't get this to work, because it should work just fine. I think I have finally found the reason it is not working for you.

What I think is going on is that you are outputting UTF-8 characters but your browser is displaying them in the wrong encoding.

You have a couple of options. First, if you are displaying any of this as part of an HTML page, check your meta tags to see whether they set the character encoding. If they do, change it to this:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Next, if you are outputting this directly to the browser, use the header function to set the character encoding, like so:

header("Content-type: text/html; charset=utf-8");

An easy test:

<?php
    header("Content-type: text/html; charset=utf-8");
    $text = "áéíó";
    echo mb_substr($text, 0, 3, 'utf-8');
?>

Without this, your browser may default to another encoding and display the text improperly. Hopefully this helps you fix the issue; if not, I'll keep trying :)
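To separate a display problem from a truncation problem, it can help to look at the raw bytes that mb_substr() returns rather than at how the browser renders them. The following is a minimal sketch, assuming the mbstring extension is loaded; the string is written with hex escapes (precomposed characters) so the result does not depend on how the file itself is saved.

```php
<?php
// "áéíó" as UTF-8 bytes: each accented letter takes two bytes.
$text = "\xC3\xA1\xC3\xA9\xC3\xAD\xC3\xB3";
$cut  = mb_substr($text, 0, 3, 'UTF-8');

echo bin2hex($cut), "\n";            // c3a1c3a9c3ad
echo mb_strlen($cut, 'UTF-8'), "\n"; // 3 (characters)
echo strlen($cut), "\n";             // 6 (bytes)
```

If the hex dump shows complete two-byte sequences like these, the cut itself is fine and any garbling is happening when the text is displayed.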

Kelly Copley answered Nov 11 '22 04:11


mb_strrpos($text," ", 'UTF-8')

You are not passing enough arguments to mb_strrpos(): you have omitted the offset (the 3rd parameter), and the encoding is the 4th parameter. Try:

mb_strrpos($text," ", 0, 'UTF-8')

Although with the 2nd line omitted it looks OK, as you say... "I want to show only the first $len characters (finishing in a word)": is the 2nd line there to make sure it finishes on a whole word?
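Putting the corrected mb_strrpos() call into context, a whole-word truncation could look roughly like the sketch below. The name text_limit_sketch is hypothetical, and this is not the elided body of the original function, only one way to finish it, assuming mbstring is loaded, the input is valid UTF-8 and the file is saved as UTF-8.

```php
<?php
// Hypothetical helper; a sketch, not the asker's original function.
function text_limit_sketch(string $text, int $len, string $end = '...'): string
{
    // Short enough already: return the text unchanged.
    if (mb_strlen($text, 'UTF-8') <= $len) {
        return $text;
    }

    // Cut at $len characters (not bytes).
    $cut = mb_substr($text, 0, $len, 'UTF-8');

    // Back up to the last space so the result ends on a whole word.
    // The offset (0) is the 3rd argument, the encoding the 4th.
    $lastSpace = mb_strrpos($cut, ' ', 0, 'UTF-8');
    if ($lastSpace !== false) {
        $cut = mb_substr($cut, 0, $lastSpace, 'UTF-8');
    }

    return $cut . $end;
}

echo text_limit_sketch("Un jardín de estilo neoclásico", 20), "\n";
// Un jardín de estilo...
```

Note that when the cut contains no space at all, this sketch keeps the full $len characters instead of returning an empty string.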

EDIT: mb_substr() should be cutting at $len characters, not bytes. Are you sure the original text is actually UTF-8 and not some other encoding?
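One way to answer that question is to validate the string before cutting it. This is a sketch assuming mbstring is loaded; ISO-8859-1 is only a hypothetical example of "some other encoding", since the thread does not say what the database actually returns. The byte strings use hex escapes so the result does not depend on how the file is saved.

```php
<?php
$utf8   = "jard\xC3\xADn"; // "jardín" encoded as UTF-8 (í = C3 AD)
$latin1 = "jard\xEDn";     // "jardín" encoded as ISO-8859-1 (í = ED)

var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)

// If the real source encoding is known, convert before cutting.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($converted === $utf8);                // bool(true)
```

If mb_check_encoding() reports false for text coming from the database, the mb_* calls with 'UTF-8' are working on bytes that are not UTF-8, which would explain odd cut points.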

MrWhite answered Nov 11 '22 06:11