
PHP: explode but ignore escaped delimiter

Tags:

php

I have a flat-file database in which the data is separated by delimiters.

I allow people to use the delimiter in their input but I make sure to escape it with a \ beforehand.

The problem is that explode() still splits on the escaped delimiters, so how do I tell it to ignore them?
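
For illustration, the behaviour described above can be reproduced with a short script. The sample record and field names are hypothetical and not taken from the question; the pipe is assumed to be the delimiter.

```php
<?php
// Hypothetical record: the second field contains an escaped pipe.
$line = 'alice|likes a\|b testing|admin';

// explode() knows nothing about the backslash, so the escaped
// delimiter is treated like any other delimiter.
$fields = explode('|', $line);

print_r($fields);
// Four elements instead of the intended three:
// 'alice', 'likes a\', 'b testing', 'admin'
```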

George Reith asked Dec 15 '11 12:12



2 Answers

Use preg_split instead. With a regex you can match a delimiter only if it is not preceded by a backslash.

Edit:

preg_split('~(?<!\\\\)' . preg_quote($delimiter, '~') . '~', $text);
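
As a rough usage sketch of the call above (the sample data and the unescaping step are assumptions, not part of the original answer), note that the lookbehind only decides where to split. The backslashes stay in the resulting fields, so a second pass is needed if they should be removed.

```php
<?php
// Hypothetical sample data; the pipe is assumed to be the delimiter.
$delimiter = '|';
$text = 'alice|likes a\|b testing|admin';

// Split only on delimiters that are not directly preceded by a backslash.
$pattern = '~(?<!\\\\)' . preg_quote($delimiter, '~') . '~';
$raw = preg_split($pattern, $text);
// $raw is ['alice', 'likes a\|b testing', 'admin']

// The escape character is still in the data, so strip it from
// escaped delimiters in a separate step.
$fields = array_map(function ($field) use ($delimiter) {
    return str_replace('\\' . $delimiter, $delimiter, $field);
}, $raw);
// $fields is ['alice', 'likes a|b testing', 'admin']
```

One limitation of this approach: a field that ends in an escaped backslash directly before a real delimiter (\\ followed by the delimiter) is not split there, because the lookbehind sees a backslash and cannot tell that it is itself escaped.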
Sjoerd answered Sep 22 '22 13:09



None of the solutions here correctly handle any number of escape characters, or they leave them in the output. Here's an alternative:

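// Splits $string on unescaped $separator characters and removes the
// escape characters from the resulting segments.
function separate($string, $separator = '|', $escape = '\\') {
    if (strlen($separator) != 1 || strlen($escape) != 1) {
        trigger_error(__FUNCTION__ . ' requires delimiters to be single characters.', E_USER_WARNING);
        return;
    }
    $segments = [];
    $string = (string) $string;
    do {
        $segment = '';
        do {
            // Length of the run before the next separator or escape character.
            $segment_length = strcspn($string, "$separator$escape");
            if ($segment_length) {
                $segment .= substr($string, 0, $segment_length);
            }
            // Nothing left after this run: the input is exhausted.
            if (strlen($string) <= $segment_length) {
                $string = null;
                break;
            }
            // An escape character: copy the character after it verbatim.
            if ($escaped = $string[$segment_length] == $escape) {
                $segment .= (string) substr($string, ++$segment_length, 1);
            }
            // Drop everything consumed so far, including the separator or escaped character.
            $string = (string) substr($string, ++$segment_length);
        } while ($escaped); // An unescaped separator ends the segment.
        $segments[] = $segment;
    } while ($string !== null);
    return $segments;
}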

Given a raw string like foo\|ba\r\\|baz|, this returns three segments: foo|bar\, baz, and an empty string.

If you want to retain the escape character in the output, you will have to modify the function.

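Note: this will behave unpredictably if mbstring function overloading (the mbstring.func_overload setting, deprecated in PHP 7.2 and removed in PHP 8.0) is enabled, because strlen, substr and similar functions would then count characters rather than bytes.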
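
One possible way to make that modification is sketched below. This is not the function from the answer but a simpler byte-by-byte scan; the name separate_keep and the $keepEscapes flag are hypothetical, and the separator and escape are assumed to be single bytes.

```php
<?php
// Hypothetical helper: splits on unescaped separators and optionally
// keeps the escape characters in the output.
function separate_keep(string $string, string $separator = '|', string $escape = '\\', bool $keepEscapes = true): array
{
    $segments = [];
    $segment = '';
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        $char = $string[$i];
        if ($char === $escape) {
            // Copy the escaped byte literally (nothing if the input ends
            // here), and the escape byte too when $keepEscapes is set.
            $next = $i + 1 < $length ? $string[++$i] : '';
            $segment .= ($keepEscapes ? $escape : '') . $next;
        } elseif ($char === $separator) {
            // Unescaped separator: close the current segment.
            $segments[] = $segment;
            $segment = '';
        } else {
            $segment .= $char;
        }
    }
    // The text after the last separator is the final segment.
    $segments[] = $segment;
    return $segments;
}

// The single-quoted literal below is the raw string foo\|ba\r\\|baz|
$raw = 'foo\|ba\r\\\\|baz|';
print_r(separate_keep($raw));                    // escapes kept
print_r(separate_keep($raw, '|', '\\', false));  // escapes removed
```

With escapes kept, the raw string foo\|ba\r\\|baz| yields foo\|ba\r\\, baz, and an empty string. With the flag set to false, the result matches the output described for the function above: foo|bar\, baz, and an empty string.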

Walf answered Sep 20 '22 13:09
