Can str_replace be safely used on a UTF-8 encoded string if it's only given valid UTF-8 encoded strings as arguments?


PHP's str_replace() was intended only for ANSI strings and, as such, can mangle UTF-8 strings. However, given that it's binary-safe, would it work properly if it were only given valid UTF-8 strings as arguments?

Edit: I'm not looking for a replacement function, I would just like to know if this hypothesis is correct.


asked Apr 16 '10 10:04

Manos Dilaverakis

2 Answers

Yes. UTF-8 is deliberately designed to allow this and other similar non-Unicode-aware processing.

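In UTF-8, any non-ASCII byte sequence representing a valid character always begins with a lead byte in the range \xC0-\xFF, and every byte after it is a continuation byte in the range \x80-\xBF. These ranges never overlap with each other or with ASCII (\x00-\x7F), so a lead byte cannot appear anywhere else in a sequence, and you can't make a valid UTF-8 sequence that matches only part of a character.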
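Here is a minimal sketch of this property in Python. It uses bytes.find() as a stand-in for the byte-oriented, binary-safe search that str_replace() performs, and the helper names are hypothetical. The code checks that every match of a valid UTF-8 needle starts on a character boundary. It also shows that an invalid needle, such as a lone continuation byte, can match in the middle of a character.

```python
# Illustration in Python (not PHP): bytes.find() is byte-oriented and
# binary-safe, like str_replace(). Helper names are hypothetical.

def is_continuation(b: int) -> bool:
    # UTF-8 continuation bytes always look like 10xxxxxx, i.e. 0x80-0xBF.
    return 0x80 <= b <= 0xBF

def match_offsets(haystack: bytes, needle: bytes) -> list:
    """Return every byte offset at which needle occurs in haystack."""
    offsets = []
    pos = haystack.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = haystack.find(needle, pos + 1)
    return offsets

haystack = "café, naïve, 日本語, 😀 café".encode("utf-8")
needle = "é".encode("utf-8")  # one lead byte followed by one continuation byte

offsets = match_offsets(haystack, needle)
for off in offsets:
    # A valid needle starts with a lead (or ASCII) byte, never a continuation
    # byte, so each match starts on a character boundary...
    assert not is_continuation(haystack[off])
    # ...and the prefix before it is itself valid UTF-8 (this raises
    # UnicodeDecodeError if the match had split a character).
    haystack[:off].decode("utf-8")
print(offsets)

# By contrast, an INVALID needle (a lone continuation byte) does match
# inside a character: here it finds the second byte of the first "é".
stray = haystack.find(b"\xa9")
print(stray)
```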

This is not the case for older multibyte encodings, where different parts of a byte sequence can be indistinguishable. This caused a lot of problems. A typical example is trying to replace an ASCII backslash in a Shift-JIS string, where byte \x5C might be the second byte of a two-byte sequence representing some other character.
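The Shift-JIS problem can be reproduced with a short Python sketch. It relies on Python's built-in shift_jis codec, which encodes the kanji 表 as the two bytes \x95\x5C. A naive byte-level replacement of the backslash byte corrupts that character. The same replacement applied to its UTF-8 encoding changes nothing.

```python
# Illustration in Python: the 0x5C ("backslash") problem in Shift-JIS vs UTF-8.
text = "表"  # a kanji whose Shift-JIS encoding has 0x5C as its second byte

sjis = text.encode("shift_jis")
utf8 = text.encode("utf-8")

# Naive byte-level "replace backslash with slash", as str_replace() would do.
sjis_replaced = sjis.replace(b"\\", b"/")
utf8_replaced = utf8.replace(b"\\", b"/")

# Shift-JIS: the trail byte of the kanji is rewritten, corrupting the character.
print(sjis, "->", sjis_replaced)
# UTF-8: no byte of a multibyte character can be 0x5C, so nothing changes.
print(utf8, "->", utf8_replaced)
```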

answered Sep 22 '22 02:09

bobince

It's correct, because UTF-8 multibyte characters consist exclusively of non-ASCII bytes (byte values 128 and up), and each one begins with a lead byte that defines how many bytes follow. So you can't accidentally end up matching part of one UTF-8 multibyte character with another.

To visualise (abstractly):

a for an ASCII character
2x for a 2-byte character
3xx for a 3-byte character
4xxx for a 4-byte character
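To make the notation concrete, here is a small Python sketch. The function shape() is a hypothetical helper that renders real UTF-8 bytes in the abstract form above. It classifies each byte by its high bits: ASCII (0xxxxxxx), continuation (10xxxxxx), and lead bytes for 2-, 3- and 4-byte sequences (110xxxxx, 1110xxxx, 11110xxx).

```python
def shape(data: bytes) -> str:
    """Hypothetical helper: render UTF-8 bytes in the notation above.
    'a' = ASCII byte, '2'/'3'/'4' = lead byte of a 2/3/4-byte sequence,
    'x' = continuation byte."""
    out = []
    for b in data:
        if b < 0x80:      # 0xxxxxxx: ASCII
            out.append("a")
        elif b < 0xC0:    # 10xxxxxx: continuation byte
            out.append("x")
        elif b < 0xE0:    # 110xxxxx: lead byte, 2-byte sequence
            out.append("2")
        elif b < 0xF0:    # 1110xxxx: lead byte, 3-byte sequence
            out.append("3")
        else:             # 11110xxx: lead byte, 4-byte sequence (in valid UTF-8)
            out.append("4")
    return "".join(out)

# "a" (1 byte), "é" (2 bytes), "€" (3 bytes), "😀" (4 bytes)
print(shape("aé€😀".encode("utf-8")))
```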

If you're matching, say, a2x3xx (where a is a byte in the ASCII range and x is a continuation byte), then because every a is less than every x, and 2x cannot be a substring of 3xx or 4xxx, et cetera, you can be sure that your UTF-8 will match correctly, given the prerequisite that all strings are definitely valid UTF-8.
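The claim can be spot-checked in Python. byte_replace() below is a hypothetical helper that mimics a binary-safe str_replace() by working purely on UTF-8 bytes. The sketch compares its output with character-level replacement for a few valid inputs. It also checks, over a small sample of code points, that no character's encoding occurs inside another's. This is a sanity check, not a proof.

```python
def byte_replace(subject: str, search: str, replace: str) -> str:
    """Hypothetical helper mimicking a binary-safe str_replace(): encode all
    arguments as UTF-8, replace at the byte level, then decode the result."""
    raw = subject.encode("utf-8").replace(
        search.encode("utf-8"), replace.encode("utf-8")
    )
    return raw.decode("utf-8")  # raises UnicodeDecodeError if a character was split

cases = [
    ("a€b€", "€", "EUR"),
    ("日本語テキスト", "本", "ほん"),
    ("😀 and 😃", "😃", ":)"),
    ("ÅåÄäÖö", "ä", "ae"),
]
for subject, search, replace in cases:
    # With valid UTF-8 inputs, byte-level replacement agrees with
    # replacing whole characters.
    assert byte_replace(subject, search, replace) == subject.replace(search, replace)

# Spot check of "2x cannot be a substring of 3xx or 4xxx": the encoding of
# one character never occurs inside the encoding of a different character.
sample = [0x41, 0x7F, 0x80, 0xE9, 0x7FF, 0x800, 0x20AC, 0x65E5, 0xFFFD,
          0x10000, 0x1F600, 0x10FFFF]
encoded = [chr(cp).encode("utf-8") for cp in sample]
no_partial_matches = all(
    small not in big
    for small in encoded
    for big in encoded
    if small != big
)
print(no_partial_matches)
```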

Edit: See bobince's answer for a less abstract explanation.

answered Sep 24 '22 02:09

pinkgothic
