What's the simplest algorithm to escape a single character?

I'm trying to write two functions escape(text, delimiter) and unescape(text, delimiter) with the following properties:

  1. The result of escape does not contain delimiter.

  2. unescape is the inverse of escape, i.e.

    unescape(escape(text, delimiter), delimiter) == text

    for all values of text and delimiter.

It is OK to restrict the allowed values of delimiter.


Background: I want to create a delimiter-separated string of values. To be able to extract the same list from the string again, I must ensure that the individual values do not contain the separator.


What I've tried: I came up with a simple solution (pseudo-code):

    escape(text, delimiter):   return text.Replace("\", "\\").Replace(delimiter, "\d")
    unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\\", "\")

but discovered that property 2 fails for the test string "\d<delimiter>". My current working solution is

    escape(text, delimiter):   return text.Replace("\", "\b").Replace(delimiter, "\d")
    unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\b", "\")

which seems to work as long as delimiter is not \, b or d (which is fine; I don't want to use those as delimiters anyway). However, since I have not formally proven its correctness, I'm afraid that I have missed some case where one of the properties is violated. Since this is such a common problem, I assume that there is already a "well-known, proven-correct" algorithm for this, hence my question (see title).
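The worry about a missed case can at least be probed mechanically. The sketch below is a hypothetical C# harness (the class and method names are illustrative, not from the question) that enumerates every string up to a given length over a small alphabet containing \, b, d, the delimiter ; and one ordinary character, and checks both properties for the chained-Replace \b/\d scheme. Exhaustive testing over a small alphabet is evidence, not a proof.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical brute-force checker for the question's second scheme:
// "\" -> "\b", delimiter -> "\d" (delimiter must not be \, b or d).
public static class SchemeCheck
{
    // Chained Replace, exactly as in the question.
    public static string Escape(string text, string delimiter) =>
        text.Replace(@"\", @"\b").Replace(delimiter, @"\d");

    public static string Unescape(string text, string delimiter) =>
        text.Replace(@"\d", delimiter).Replace(@"\b", @"\");

    // Yields every string of length 0..maxLength over the alphabet, shortest first.
    public static IEnumerable<string> AllStrings(char[] alphabet, int maxLength)
    {
        var current = new List<string> { "" };
        for (int len = 0; ; len++)
        {
            foreach (var s in current) yield return s;
            if (len == maxLength) yield break;
            var next = new List<string>();
            foreach (var s in current)
                foreach (var c in alphabet)
                    next.Add(s + c);
            current = next;
        }
    }

    // Returns every enumerated input that violates property 1 or property 2.
    public static List<string> FindCounterexamples(string delimiter, char[] alphabet, int maxLength)
    {
        var failures = new List<string>();
        foreach (var text in AllStrings(alphabet, maxLength))
        {
            string escaped = Escape(text, delimiter);
            bool noDelimiter = !escaped.Contains(delimiter);          // property 1
            bool roundTrips = Unescape(escaped, delimiter) == text;   // property 2
            if (!noDelimiter || !roundTrips) failures.Add(text);
        }
        return failures;
    }
}
```

Calling `SchemeCheck.FindCounterexamples(";", new[] { '\\', 'b', 'd', ';', 'x' }, 6)` returns an empty list for this scheme. Swapping in the first (\\ / \d) pair with chained Replace calls makes "\d;" appear among the failures, matching the counterexample described above.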

asked Jun 14 '12 13:06 by Heinzi



1 Answer

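Your first algorithm is correct; only its unescape() is implemented wrongly.

unescape() must replace \d with delimiter and \\ with \ in the same left-to-right pass. Separate Replace() calls don't work, in either order, because each call scans the whole string on its own and can match a \d or \\ whose backslash actually belongs to the preceding escape sequence.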
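A minimal C# sketch of the single-pass idea for the question's first scheme (\ becomes \\, delimiter becomes \d). Every unit of escaped output is either one ordinary character or a backslash followed by exactly one more character, so a reader that scans left to right and lets each backslash consume the next character always knows where a unit ends. The class name `BackslashEscaping`, the `NaiveUnescape` helper kept for comparison, and the choice to throw on malformed input are all illustrative assumptions. The delimiter must not be '\' or 'd'.

```csharp
using System;
using System.Text;

// Hypothetical helper for the scheme "\" -> "\\", delimiter -> "\d".
// Assumes delimiter is neither '\' nor 'd'.
public static class BackslashEscaping
{
    // Chained Replace is fine for escaping: the second call cannot match
    // anything the first call produced, since delimiter != '\'.
    public static string Escape(string text, char delimiter) =>
        text.Replace(@"\", @"\\").Replace(delimiter.ToString(), @"\d");

    // Chained Replace as in the question; wrong for inputs such as "\d;".
    public static string NaiveUnescape(string text, char delimiter) =>
        text.Replace(@"\d", delimiter.ToString()).Replace(@"\\", @"\");

    // Single left-to-right pass: each backslash consumes exactly the next character.
    public static string Unescape(string text, char delimiter)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c != '\\')
            {
                sb.Append(c);
                continue;
            }
            if (i + 1 >= text.Length)
                throw new ArgumentException("dangling escape character at end of input");
            char next = text[++i];
            if (next == '\\') sb.Append('\\');
            else if (next == 'd') sb.Append(delimiter);
            else throw new ArgumentException($"unknown escape sequence \\{next}");
        }
        return sb.ToString();
    }
}
```

With delimiter ';', the failing input "\d;" escapes to "\\d\d". `NaiveUnescape` turns that into "\;;", while `Unescape` restores "\d;".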

Here's some sample C# code for safe quoting of delimiter-separated strings:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    // Extension methods must live in a non-nested static class; the class name is illustrative.
    public static class SeparatorQuoting
    {
        static string QuoteSeparator(string str,
            char separator, char quoteChar, char otherChar) // "~" -> "~~"     ";" -> "~s"
        {
            var sb = new StringBuilder(str.Length);
            foreach (char c in str)
            {
                if (c == quoteChar)
                {
                    sb.Append(quoteChar);
                    sb.Append(quoteChar);
                }
                else if (c == separator)
                {
                    sb.Append(quoteChar);
                    sb.Append(otherChar);
                }
                else
                {
                    sb.Append(c);
                }
            }
            return sb.ToString(); // no separator in the result -> Join/Split is safe
        }

        static string UnquoteSeparator(string str,
            char separator, char quoteChar, char otherChar) // "~~" -> "~"     "~s" -> ";"
        {
            var sb = new StringBuilder(str.Length);
            bool isQuoted = false;
            foreach (char c in str)
            {
                if (isQuoted)
                {
                    // "~s" -> separator; "~~" -> "~"; any other "~x" is passed through as "x"
                    if (c == otherChar)
                        sb.Append(separator);
                    else
                        sb.Append(c);
                    isQuoted = false;
                }
                else
                {
                    if (c == quoteChar)
                        isQuoted = true;
                    else
                        sb.Append(c);
                }
            }
            if (isQuoted)
                throw new ArgumentException("input string is not correctly quoted");
            return sb.ToString(); // separators are restored
        }

        /// <summary>
        /// Encodes the given strings as a single string.
        /// </summary>
        /// <param name="input">The strings to encode.</param>
        /// <param name="separator">The character placed between the encoded strings.</param>
        /// <param name="quoteChar">The escape character.</param>
        /// <param name="otherChar">The character that, after quoteChar, stands for separator.</param>
        /// <returns>The encoded string. An empty sequence and a sequence holding a single
        /// empty string both encode to "".</returns>
        public static string QuoteAndJoin(this IEnumerable<string> input,
            char separator = ';', char quoteChar = '~', char otherChar = 's')
        {
            if (input == null)
                throw new ArgumentNullException(nameof(input));
            if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
                throw new ArgumentException("cannot quote: ambiguous format");
            return string.Join(new string(separator, 1),
                (from str in input select QuoteSeparator(str, separator, quoteChar, otherChar)).ToArray());
        }

        /// <summary>
        /// Decodes the strings encoded in a single string.
        /// </summary>
        /// <param name="encoded">The encoded string.</param>
        /// <param name="separator">The character placed between the encoded strings.</param>
        /// <param name="quoteChar">The escape character.</param>
        /// <param name="otherChar">The character that, after quoteChar, stands for separator.</param>
        /// <returns>The decoded strings (evaluated lazily).</returns>
        public static IEnumerable<string> SplitAndUnquote(this string encoded,
            char separator = ';', char quoteChar = '~', char otherChar = 's')
        {
            if (encoded == null)
                throw new ArgumentNullException(nameof(encoded));
            if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
                throw new ArgumentException("cannot unquote: ambiguous format");
            return from s in encoded.Split(separator) select UnquoteSeparator(s, separator, quoteChar, otherChar);
        }
    }
answered Oct 16 '22 02:10 by Eldritch Conundrum