Is str_replace() on multibyte strings dangerous?

Tags: php, multibyte

Given certain multibyte character sets, am I correct in assuming that the following doesn't do what it was intended to do?

$string = str_replace('"', '\\"', $string); 

In particular, suppose the input is in a character set in which 0xbf5c is a valid character. An attacker can then inject 0xbf22: str_replace() turns the 0x22 into 0x5c22, giving 0xbf5c22, which is a valid character (0xbf5c) followed by an unescaped double quote (").
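To make the byte arithmetic concrete, here is a minimal PHP sketch assuming the attacker submits the two raw bytes 0xbf 0x22 (the variable names are illustrative only). str_replace() compares bytes, not characters, so it puts 0x5c in front of the 0x22 no matter which byte precedes it:

```php
<?php
// Hypothetical attacker input: byte 0xbf followed by an ASCII double quote (0x22).
$input = "\xbf\x22";

// Byte-oriented escaping: every 0x22 byte gets a 0x5c (backslash) in front of it.
$escaped = str_replace('"', '\\"', $input);

// Dump the resulting bytes.
echo bin2hex($escaped), "\n"; // prints bf5c22

// In an encoding such as GBK, 0xbf 0x5c is decoded as a single character,
// so the inserted backslash is swallowed and the following 0x22 is a bare quote.
```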

Is there an easy way to mitigate this problem, or am I misunderstanding the issue in the first place?

(In my case, the string goes into the value attribute of an HTML input tag: echo '<input type="text" value="' . $string . '">';)
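Note that inside an HTML attribute a backslash does not escape a quote at all; HTML uses character references such as &quot; instead. A minimal sketch, assuming the page and the input are UTF-8, uses htmlspecialchars(), which takes the charset as its third argument and therefore decodes characters rather than scanning raw bytes (the sample string is made up):

```php
<?php
// Assumption: the page is served as UTF-8 and $string is UTF-8 as well.
$string = 'say "hi" & <bye>';

// ENT_QUOTES escapes both " and ', so the value cannot close the attribute.
// The third argument names the charset the input should be decoded as.
$safe = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');

echo '<input type="text" value="' . $safe . '">', "\n";
// prints <input type="text" value="say &quot;hi&quot; &amp; &lt;bye&gt;">
```

With these flags and input that is not valid UTF-8 (such as the 0xbf 0x22 bytes from the question), htmlspecialchars() returns an empty string rather than passing a stray quote through. The charset argument only accepts the encodings listed in the PHP manual, so check that yours is among them.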

EDIT: For that matter, what about a function like preg_quote()? It has no charset argument, so it seems useless in this scenario. When you DON'T have the option of limiting the charset to UTF-8 (yes, that'd be nice), you seem to be really handicapped. What replace and quoting functions are available in that case?
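The preg_quote() concern can be reproduced with a Shift_JIS character whose second byte is 0x5c. This sketch assumes the input is the Shift_JIS encoding of 表 (bytes 0x95 0x5c); preg_quote() escapes every 0x5c byte as if it were a backslash:

```php
<?php
// Shift_JIS encoding of 表: lead byte 0x95, trail byte 0x5c.
$sjis = "\x95\x5c";

// preg_quote() works byte by byte and escapes the 0x5c trail byte.
$quoted = preg_quote($sjis);

echo bin2hex($quoted), "\n"; // prints 955c5c
```

Read as Shift_JIS, the result is 表 followed by a stray backslash, so the character has been altered rather than quoted.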

asked Sep 24 '10 10:09 by user456885

People also ask

Is str_replace() multibyte-safe?

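The code is perfectly safe with sane multibyte encodings like UTF-8 and EUC-TW, but dangerous with broken ones like Shift_JIS, GBK, GB18030, Big5, etc. In UTF-8 and EUC-TW every byte of a multibyte character is 0x80 or higher, so a byte such as 0x22 (") or 0x5c (\) can only stand for itself; in the others, a trailing byte can fall in the ASCII range. Rather than going through all the headache and overhead to be safe with these legacy encodings, I would recommend supporting only UTF-8.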
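If you do restrict input to UTF-8, one way to enforce that is to validate before escaping. This is a sketch with a hypothetical helper name, escape_quotes_utf8(); it relies on preg_match() with the /u modifier returning false for malformed UTF-8, after which byte-wise str_replace() is safe because no UTF-8 multibyte character contains a 0x22 byte:

```php
<?php
// Hypothetical helper: escape double quotes only after confirming the input
// is well-formed UTF-8. In UTF-8 every byte of a multibyte character is
// >= 0x80, so a 0x22 byte can only ever be a real double quote.
function escape_quotes_utf8(string $s): string
{
    // preg_match() with the /u modifier returns false for malformed UTF-8.
    if (preg_match('//u', $s) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return str_replace('"', '\\"', $s);
}

echo escape_quotes_utf8('a "quoted" word'), "\n"; // prints a \"quoted\" word
```

The question's 0xbf 0x22 input is rejected here, since 0xbf cannot start a UTF-8 character.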

What is a multibyte string?

A null-terminated multibyte string (NTMBS), or "multibyte string", is a sequence of nonzero bytes followed by a byte with value zero (the terminating null character). Each character stored in the string may occupy more than one byte.
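In PHP terms, the difference shows up as byte length versus character length. A small sketch, assuming a UTF-8 string and the mbstring extension:

```php
<?php
// "é" in UTF-8 is two bytes (0xc3 0xa9); "漢" is three (0xe6 0xbc 0xa2).
$s = "\xc3\xa9\xe6\xbc\xa2";

echo strlen($s), "\n";             // prints 5 (bytes)
echo mb_strlen($s, 'UTF-8'), "\n"; // prints 2 (characters)
```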


1 Answer

No, you’re right: using a single-byte string function on a multibyte string can cause unexpected results. Use the multibyte string functions instead, for example mb_ereg_replace() or mb_split(), which interpret the pattern and subject in the encoding set with mb_regex_encoding():

$string = mb_ereg_replace('"', '\\"', $string);
// or:
$string = implode('\\"', mb_split('"', $string));
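Because both functions depend on the regex encoding, it helps to set it explicitly. A minimal sketch of the split-join variant, assuming the subject is UTF-8 (the sample string is made up):

```php
<?php
// mb_split() interprets pattern and subject in the encoding set by
// mb_regex_encoding(), so set it to the subject's charset first.
mb_regex_encoding('UTF-8');

$string = "caf\xc3\xa9 \"menu\"";

// Split on the double quote and re-join with an escaped quote.
$escaped = implode('\\"', mb_split('"', $string));

echo $escaped, "\n"; // prints café \"menu\"
```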

Edit: Here’s an mb_replace() implementation using the split-join variant:

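function mb_replace($search, $replace, $subject, &$count = 0) {
    // like str_replace(), an array $replace requires an array $search
    if (!is_array($search) && is_array($replace)) {
        return false;
    }
    $count = 0;
    if (is_array($subject)) {
        // call mb_replace for each single string in $subject (keys are preserved)
        foreach ($subject as &$string) {
            $string = mb_replace($search, $replace, $string, $c);
            $count += $c;
        }
        unset($string); // break the reference left behind by foreach
    } elseif (is_array($search)) {
        // pair each search term with the replacement at the same position;
        // missing replacements default to '' as in str_replace()
        $replace = is_array($replace) ? array_values($replace) : $replace;
        foreach (array_values($search) as $i => $term) {
            $with = is_array($replace) ? ($replace[$i] ?? '') : $replace;
            $subject = mb_replace($term, $with, $subject, $c);
            $count += $c;
        }
    } elseif ($search !== '') {
        // split on the quoted search term and re-join with the replacement;
        // mb_split() uses the encoding set by mb_regex_encoding(), while
        // preg_quote() itself is byte-based
        $parts = mb_split(preg_quote($search), $subject);
        $count = count($parts) - 1;
        $subject = implode($replace, $parts);
    }
    return $subject;
}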

With respect to parameter combinations, this function should behave like the single-byte str_replace().

answered Sep 21 '22 16:09 by Gumbo