strlen, mb_strlen, which to use?

Tags: php

How can I find out the character set of the data in $_REQUEST? And how can I set the character set of $_REQUEST?

lovespring asked Sep 19 '09 04:09


3 Answers

To make it short: you cannot really know which encoding (character set) is used for the variables passed to your PHP script via GET or POST (GET is especially problematic). By convention, browsers POST forms to the server-side resource specified in the action attribute using the page encoding, which can be specified via an http-equiv meta tag (the charset meta tag in HTML5) or via an HTTP header. Alternatively, some browsers also respect the accept-charset attribute on the form when choosing the encoding.

The encoding of GET parameters and of the URL itself depends on the browser settings and can therefore be controlled by the user. You should not rely on a specific encoding.

Generally, you will avoid most encoding-related problems by consistently using UTF-8 for everything and by specifying the correct encoding in the HTTP header (Content-Type: text/html; charset=UTF-8). This will yield the correct encoding (UTF-8) in all the variables passed into your script (we're not talking about rogue scripts that deliberately try to mess with the encoding to open up attack vectors). You also should not rely on non-ASCII characters in your GET parameters or in the URL (that is also one reason why SEO-friendly links remove or substitute those characters).

Once you have made sure that UTF-8 is the only allowed character set, you can use mb_strlen($string, 'UTF-8') to check the length of a variable, for example.
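As a small illustration of the difference between the two functions, here is a minimal sketch. It assumes the mbstring extension is loaded and that the input is meant to be UTF-8; the variable names are made up for the example.

```php
<?php
// "héllo" written with explicit UTF-8 bytes: "é" is the two bytes 0xC3 0xA9.
$string = "h\xC3\xA9llo";

// Check that the input really is valid UTF-8 before measuring it.
$isUtf8 = mb_check_encoding($string, 'UTF-8');

$byteLength = strlen($string);              // counts bytes
$charLength = mb_strlen($string, 'UTF-8');  // counts characters (code points)

var_dump($isUtf8);      // bool(true)
var_dump($byteLength);  // int(6)
var_dump($charLength);  // int(5)
```

So strlen is the right tool when you care about storage size in bytes, and mb_strlen when you care about the number of characters the user typed.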

EDIT: (added some links)

Some things for you to read:

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
  • Handling UTF-8 with PHP
Stefan Gehrig answered Oct 04 '22 03:10


Use mb_internal_encoding() to find out which internal encoding is currently set. If your application uses a lot of different encodings, you had better use mb_strlen.
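A rough sketch of how that looks in practice, assuming the mbstring extension is loaded. Note that this only reports the encoding mbstring uses by default, not the encoding the browser actually sent.

```php
<?php
// Set the default encoding used by the mb_* functions for this request.
mb_internal_encoding('UTF-8');

// Called without arguments, it returns the current internal encoding.
$current = mb_internal_encoding();

// With no explicit encoding argument, mb_strlen uses the internal encoding.
$length = mb_strlen("h\xC3\xA9llo");

var_dump($current); // string(5) "UTF-8"
var_dump($length);  // int(5)
```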

Cheers

RageZ answered Oct 04 '22 03:10


Usually you have control of the character encoding, since you create the $_REQUEST from the HTML you send to the client.

I.e., it is generated by a page you sent from PHP.

Thus you shouldn't have to detect the encoding.

Using the mb_* functions requires the multibyte (mbstring) extension to be enabled, so if you're distributing code, be aware that not everyone will have it.
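If you do distribute code, one possible way to cope is to check for the extension at runtime. This is only a sketch: utf8_char_length is a hypothetical helper name, and the fallback assumes PCRE was built with UTF-8 support.

```php
<?php
// Hypothetical helper: character count of a UTF-8 string, with or without mbstring.
function utf8_char_length($string)
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($string, 'UTF-8');
    }

    // Fallback: count code points with PCRE in UTF-8 mode (the "u" modifier).
    $count = preg_match_all('/./su', $string, $matches);

    // preg_match_all returns false on invalid UTF-8; fall back to the byte length.
    return $count === false ? strlen($string) : $count;
}

var_dump(utf8_char_length("h\xC3\xA9llo")); // int(5)
```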

header('Content-Type: text/html; charset=UTF-8');

OR in HTML:

<meta charset="utf-8">

http://www.w3.org/International/O-charset

Edit: native Unicode support was planned for PHP 6 (which was never released); PHP 5 does not have it.

bucabay answered Oct 04 '22 05:10