
strlen() and UTF-8 encoding

Assuming UTF-8 encoding, and strlen() in PHP, is it possible that this string has a length of 4?

I'm only interested in strlen(), not other functions.

This is the string:

$1�2

I have tested it on my own computer, and I have verified UTF-8 encoding, and the answer I get is 6.

I don't see anything in the manual for strlen or anything I've read on UTF-8 that would explain why some of the characters above would count for less than one.

PS: This question and its answer (4) come from a ZCE mock test I bought on eBay.

Jon Lyles asked Jun 14 '12 13:06


People also ask

Does strlen work on UTF-8?

As the manual says: "strlen() returns the number of bytes rather than the number of characters in a string." So if you want the number of characters in a UTF-8 string, use mb_strlen() instead of strlen().

Does strlen count special characters?

strlen() is a built-in PHP function that takes a string as a parameter and returns its length in bytes. Whitespace and special characters are included in the count.

What does UTF-8 mean?

UTF-8 (UCS Transformation Format 8) is the World Wide Web's most common character encoding. Each character is represented by one to four bytes. UTF-8 is backward-compatible with ASCII and can represent any standard Unicode character.
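
The "one to four bytes" range can be checked with strlen() itself, since it counts bytes. Below is a minimal sketch; the sample characters are chosen only for illustration and are written as raw byte escapes so the result does not depend on how the script file is saved.

```php
<?php
// Illustrative sample characters (not from the question), given as raw
// UTF-8 byte sequences keyed by their Unicode code point.
$samples = [
    'U+0041'  => "A",                 // Latin capital A
    'U+00E9'  => "\xC3\xA9",          // e with acute accent
    'U+20AC'  => "\xE2\x82\xAC",      // euro sign
    'U+1F600' => "\xF0\x9F\x98\x80",  // grinning face emoji
];

$widths = [];
foreach ($samples as $codePoint => $char) {
    // strlen() returns the byte count, i.e. the encoded width in UTF-8.
    $widths[$codePoint] = strlen($char);
    printf("%s: %d byte(s)\n", $codePoint, $widths[$codePoint]);
}
// Prints:
// U+0041: 1 byte(s)
// U+00E9: 2 byte(s)
// U+20AC: 3 byte(s)
// U+1F600: 4 byte(s)
```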

What is UTF-8 and what problem does it solve?

UTF-8 is a way of encoding Unicode so that an ASCII text file encodes to itself. No wasted space, beyond the initial bit of every byte ASCII doesn't use. And if your file is mostly ASCII text with a few non-ASCII characters sprinkled in, the non-ASCII characters just make your file a little longer.


1 Answer

How about using mb_strlen()?

http://lt.php.net/manual/en/function.mb-strlen.php

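But if you need to use strlen(), it's possible to set the mbstring.func_overload directive to 2 in your PHP configuration, so that calls to strlen() in your scripts are automatically replaced with mb_strlen(). Note that this directive was deprecated in PHP 7.2 and removed in PHP 8.0, so it only applies to older versions.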
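
To see the difference between the two functions, here is a small sketch. The test string is hypothetical (it is not necessarily the string from the question): it contains four characters, one of which takes two bytes in UTF-8. mb_strlen() is only available when the mbstring extension is loaded, so the call is guarded.

```php
<?php
// Hypothetical test string: '$', '1', U+00E9 (two bytes in UTF-8), '2'.
// Single quotes keep '$1' literal; the accented letter is given as raw bytes.
$s = '$1' . "\xC3\xA9" . '2';

// strlen() counts bytes: 1 + 1 + 2 + 1.
$bytes = strlen($s);
echo "strlen: $bytes\n";        // strlen: 5

// mb_strlen() counts characters in the given encoding, but needs the
// mbstring extension.
if (function_exists('mb_strlen')) {
    $chars = mb_strlen($s, 'UTF-8');
    echo "mb_strlen: $chars\n"; // mb_strlen: 4
}
```

Passing 'UTF-8' explicitly avoids depending on the configured internal encoding.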

Anton answered Sep 16 '22 20:09