Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Truncate a string in PHP to the word closest to a certain number of characters?

People also ask

How can I truncate a string to the first 20 words in PHP?

php $str = "this is a string that is just some text for you to test with"; print(truncateString($str, 20, false) . "\n"); print(truncateString($str, 22, false) .

How can I remove last 5 characters from a string in PHP?

You can use the PHP substr_replace() function to remove the last character from a string in PHP. $string = "Hello TecAdmin!"; echo "Original string: " . $string .

How do I remove a word from a string in PHP?

Answer: Use the PHP str_replace() function You can use the PHP str_replace() function to replace all the occurrences of a word within a string.


By using the wordwrap function. It splits the texts in multiple lines such that the maximum width is the one you specified, breaking at word boundaries. After splitting, you simply take the first line:

substr($string, 0, strpos(wordwrap($string, $your_desired_width), "\n"));

One thing this oneliner doesn't handle is the case when the text itself is shorter than the desired width. To handle this edge-case, one should do something like:

if (strlen($string) > $your_desired_width) 
{
    $string = wordwrap($string, $your_desired_width);
    $string = substr($string, 0, strpos($string, "\n"));
}

The above solution has the problem of prematurely cutting the text if it contains a newline before the actual cutpoint. Here a version which solves this problem:

function tokenTruncate($string, $your_desired_width) {
  $parts = preg_split('/([\s\n\r]+)/', $string, null, PREG_SPLIT_DELIM_CAPTURE);
  $parts_count = count($parts);

  $length = 0;
  $last_part = 0;
  for (; $last_part < $parts_count; ++$last_part) {
    $length += strlen($parts[$last_part]);
    if ($length > $your_desired_width) { break; }
  }

  return implode(array_slice($parts, 0, $last_part));
}

Also, here is the PHPUnit testclass used to test the implementation:

class TokenTruncateTest extends PHPUnit_Framework_TestCase {
  public function testBasic() {
    $this->assertEquals("1 3 5 7 9 ",
      tokenTruncate("1 3 5 7 9 11 14", 10));
  }

  public function testEmptyString() {
    $this->assertEquals("",
      tokenTruncate("", 10));
  }

  public function testShortString() {
    $this->assertEquals("1 3",
      tokenTruncate("1 3", 10));
  }

  public function testStringTooLong() {
    $this->assertEquals("",
      tokenTruncate("toooooooooooolooooong", 10));
  }

  public function testContainingNewline() {
    $this->assertEquals("1 3\n5 7 9 ",
      tokenTruncate("1 3\n5 7 9 11 14", 10));
  }
}

EDIT :

Special UTF8 characters like 'à' are not handled. Add 'u' at the end of the REGEX to handle it:

$parts = preg_split('/([\s\n\r]+)/u', $string, null, PREG_SPLIT_DELIM_CAPTURE);


This will return the first 200 characters of words:

preg_replace('/\s+?(\S+)?$/', '', substr($string, 0, 201));

$WidgetText = substr($string, 0, strrpos(substr($string, 0, 200), ' '));

And there you have it — a reliable method of truncating any string to the nearest whole word, while staying under the maximum string length.

I've tried the other examples above and they did not produce the desired results.


The following solution was born when I've noticed a $break parameter of wordwrap function:

string wordwrap ( string $str [, int $width = 75 [, string $break = "\n" [, bool $cut = false ]]] )

Here is the solution:

/**
 * Truncates the given string at the specified length.
 *
 * @param string $str The input string.
 * @param int $width The number of chars at which the string will be truncated.
 * @return string
 */
function truncate($str, $width) {
    return strtok(wordwrap($str, $width, "...\n"), "\n");
}

Example #1.

print truncate("This is very long string with many chars.", 25);

The above example will output:

This is very long string...

Example #2.

print truncate("This is short string.", 25);

The above example will output:

This is short string.

Keep in mind whenever you're splitting by "word" anywhere that some languages such as Chinese and Japanese do not use a space character to split words. Also, a malicious user could simply enter text without any spaces, or using some Unicode look-alike to the standard space character, in which case any solution you use may end up displaying the entire text anyway. A way around this may be to check the string length after splitting it on spaces as normal, then, if the string is still above an abnormal limit - maybe 225 characters in this case - going ahead and splitting it dumbly at that limit.

One more caveat with things like this when it comes to non-ASCII characters; strings containing them may be interpreted by PHP's standard strlen() as being longer than they really are, because a single character may take two or more bytes instead of just one. If you just use the strlen()/substr() functions to split strings, you may split a string in the middle of a character! When in doubt, mb_strlen()/mb_substr() are a little more foolproof.