PHP's wordwrap() function doesn't work correctly for multi-byte strings such as UTF-8.
There are a few examples of multibyte-safe functions in the comments, but with different test data they all seem to have problems.
The function should take the exact same parameters as wordwrap().
Specifically, be sure it handles:
$cut = true (don't cut mid-word otherwise)
$break = ' '
$break = "\n"
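To make the problem concrete, here is a minimal sketch of how wordwrap() measures width in bytes rather than characters. The sample string and the width of 5 are assumptions chosen for illustration, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// "ää ää ää": 8 characters, but 14 bytes in UTF-8 (each "ä" takes 2 bytes).
$text = "\u{00E4}\u{00E4} \u{00E4}\u{00E4} \u{00E4}\u{00E4}";

$byteLength = strlen($text);              // counts bytes
$charLength = mb_strlen($text, 'UTF-8');  // counts characters

// wordwrap() compares the width against byte offsets, so "ää ää"
// (5 characters, 9 bytes) is treated as too long for a width of 5.
$wrapped = wordwrap($text, 5, "\n");

echo $byteLength, ' bytes, ', $charLength, " characters\n";
echo $wrapped, "\n";
```

With this input, wordwrap() puts each "ää" on its own line, whereas a character-aware wrap would be expected to keep "ää ää" together on the first line.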
I haven't found any code that works for me, so here is what I've written. It works for me, though it is probably not the fastest.
function mb_wordwrap($str, $width = 75, $break = "\n", $cut = false) {
    $lines = explode($break, $str);
    foreach ($lines as &$line) {
        $line = rtrim($line);
        if (mb_strlen($line) <= $width)
            continue;
        $words = explode(' ', $line);
        $line = '';
        $actual = '';
        foreach ($words as $word) {
            if (mb_strlen($actual.$word) <= $width)
                $actual .= $word.' ';
            else {
                if ($actual != '')
                    $line .= rtrim($actual).$break;
                $actual = $word;
                if ($cut) {
                    while (mb_strlen($actual) > $width) {
                        $line .= mb_substr($actual, 0, $width).$break;
                        $actual = mb_substr($actual, $width);
                    }
                }
                $actual .= ' ';
            }
        }
        $line .= trim($actual);
    }
    return implode($break, $lines);
}
/**
 * wordwrap for utf8 encoded strings
 *
 * @param string $str
 * @param integer $width
 * @param string $break
 * @param boolean $cut
 * @return string
 * @author Milian Wolff
 */
function utf8_wordwrap($str, $width, $break, $cut = false) {
    if (!$cut) {
        $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.',}\b#U';
    } else {
        $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.'}#';
    }
    if (function_exists('mb_strlen')) {
        $str_len = mb_strlen($str,'UTF-8');
    } else {
        $str_len = preg_match_all('/[\x00-\x7F\xC0-\xFD]/', $str, $var_empty);
    }
    $while_what = ceil($str_len / $width);
    $i = 1;
    $return = '';
    while ($i < $while_what) {
        preg_match($regexp, $str,$matches);
        $string = $matches[0];
        $return .= $string.$break;
        $str = substr($str, strlen($string));
        $i++;
    }
    return $return.$str;
}
Total time: 0.0020880699 is good time :)
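The fallback branch in utf8_wordwrap counts characters without mbstring by matching only the bytes that can start a UTF-8 character (ASCII bytes and lead bytes) and skipping continuation bytes. A small sketch of that idea follows; the sample string is an assumption chosen to mix 1-, 2- and 3-byte sequences, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// "naïve 日本": "ï" is 2 bytes and each CJK character is 3 bytes in UTF-8.
$text = "na\u{00EF}ve \u{65E5}\u{672C}";

// Count only bytes that begin a character: ASCII (0x00-0x7F) or a
// UTF-8 lead byte (0xC0-0xFD). Continuation bytes (0x80-0xBF) are skipped.
// No /u modifier is used, so the pattern is applied byte by byte.
$byLeadBytes = preg_match_all('/[\x00-\x7F\xC0-\xFD]/', $text);

$byMbstring = mb_strlen($text, 'UTF-8'); // character count via mbstring
$byBytes    = strlen($text);             // raw byte count

echo "lead bytes: $byLeadBytes, mb_strlen: $byMbstring, strlen: $byBytes\n";
```

For well-formed UTF-8 the two character counts should agree, while strlen() reports the larger byte count. Malformed input is not handled by this sketch.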
Because no answer handled every use case, here is something that does. The code is based on Drupal’s AbstractStringWrapper::wordWrap.
<?php
/**
* Wraps any string to a given number of characters.
*
* This implementation is multi-byte aware and relies on {@link
* http://www.php.net/manual/en/book.mbstring.php PHP's multibyte
* string extension}.
*
* @see wordwrap()
* @link https://api.drupal.org/api/drupal/core%21vendor%21zendframework%21zend-stdlib%21Zend%21Stdlib%21StringWrapper%21AbstractStringWrapper.php/function/AbstractStringWrapper%3A%3AwordWrap/8
* @param string $string
* The input string.
* @param int $width [optional]
* The number of characters at which <var>$string</var> will be
* wrapped. Defaults to <code>75</code>.
* @param string $break [optional]
* The line is broken using the optional break parameter. Defaults
* to <code>"\n"</code>.
* @param boolean $cut [optional]
* If the <var>$cut</var> is set to <code>TRUE</code>, the string is
* always wrapped at or before the specified <var>$width</var>. So if
* you have a word that is larger than the given <var>$width</var>, it
* is broken apart. Defaults to <code>FALSE</code>.
* @return string
* Returns the given <var>$string</var> wrapped at the specified
* <var>$width</var>.
*/
function mb_wordwrap($string, $width = 75, $break = "\n", $cut = false) {
    $string = (string) $string;
    if ($string === '') {
        return '';
    }
    $break = (string) $break;
    if ($break === '') {
        trigger_error('Break string cannot be empty', E_USER_ERROR);
    }
    $width = (int) $width;
    if ($width === 0 && $cut) {
        trigger_error('Cannot force cut when width is zero', E_USER_ERROR);
    }
    if (strlen($string) === mb_strlen($string)) {
        return wordwrap($string, $width, $break, $cut);
    }
    $stringWidth = mb_strlen($string);
    $breakWidth = mb_strlen($break);
    $result = '';
    $lastStart = $lastSpace = 0;
    for ($current = 0; $current < $stringWidth; $current++) {
        $char = mb_substr($string, $current, 1);
        $possibleBreak = $char;
        if ($breakWidth !== 1) {
            $possibleBreak = mb_substr($string, $current, $breakWidth);
        }
        if ($possibleBreak === $break) {
            $result .= mb_substr($string, $lastStart, $current - $lastStart + $breakWidth);
            $current += $breakWidth - 1;
            $lastStart = $lastSpace = $current + 1;
            continue;
        }
        if ($char === ' ') {
            if ($current - $lastStart >= $width) {
                $result .= mb_substr($string, $lastStart, $current - $lastStart) . $break;
                $lastStart = $current + 1;
            }
            $lastSpace = $current;
            continue;
        }
        if ($current - $lastStart >= $width && $cut && $lastStart >= $lastSpace) {
            $result .= mb_substr($string, $lastStart, $current - $lastStart) . $break;
            $lastStart = $lastSpace = $current;
            continue;
        }
        if ($current - $lastStart >= $width && $lastStart < $lastSpace) {
            $result .= mb_substr($string, $lastStart, $lastSpace - $lastStart) . $break;
            $lastStart = $lastSpace = $lastSpace + 1;
            continue;
        }
    }
    if ($lastStart !== $current) {
        $result .= mb_substr($string, $lastStart, $current - $lastStart);
    }
    return $result;
}
?>