Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split string containing command-line parameters into string[] in C#

It annoys me that there's no function to split a string based on a function that examines each character. If there was, you could write it like this:

    public static IEnumerable<string> SplitCommandLine(string commandLine)
    {
        bool inQuotes = false;

        return commandLine.Split(c =>
                                 {
                                     if (c == '\"')
                                         inQuotes = !inQuotes;

                                     return !inQuotes && c == ' ';
                                 })
                          .Select(arg => arg.Trim().TrimMatchingQuotes('\"'))
                          .Where(arg => !string.IsNullOrEmpty(arg));
    }

Although having written that, why not write the necessary extension methods. Okay, you talked me into it...

Firstly, my own version of Split that takes a function that has to decide whether the specified character should split the string:

    public static IEnumerable<string> Split(this string str, 
                                            Func<char, bool> controller)
    {
        int nextPiece = 0;

        for (int c = 0; c < str.Length; c++)
        {
            if (controller(str[c]))
            {
                yield return str.Substring(nextPiece, c - nextPiece);
                nextPiece = c + 1;
            }
        }

        yield return str.Substring(nextPiece);
    }

It may yield some empty strings depending on the situation, but maybe that information will be useful in other cases, so I don't remove the empty entries in this function.

Secondly (and more mundanely) a little helper that will trim a matching pair of quotes from the start and end of a string. It's more fussy than the standard Trim method - it will only trim one character from each end, and it will not trim from just one end:

    public static string TrimMatchingQuotes(this string input, char quote)
    {
        if ((input.Length >= 2) && 
            (input[0] == quote) && (input[input.Length - 1] == quote))
            return input.Substring(1, input.Length - 2);

        return input;
    }

And I suppose you'll want some tests as well. Well, alright then. But this must be absolutely the last thing! First a helper function that compares the result of the split with the expected array contents:

    public static void Test(string cmdLine, params string[] args)
    {
        string[] split = SplitCommandLine(cmdLine).ToArray();

        Debug.Assert(split.Length == args.Length);

        for (int n = 0; n < split.Length; n++)
            Debug.Assert(split[n] == args[n]);
    }

Then I can write tests like this:

        Test("");
        Test("a", "a");
        Test(" abc ", "abc");
        Test("a b ", "a", "b");
        Test("a b \"c d\"", "a", "b", "c d");

Here's the test for your requirements:

        Test(@"/src:""C:\tmp\Some Folder\Sub Folder"" /users:""[email protected]"" tasks:""SomeTask,Some Other Task"" -someParam",
             @"/src:""C:\tmp\Some Folder\Sub Folder""", @"/users:""[email protected]""", @"tasks:""SomeTask,Some Other Task""", @"-someParam");

Note that the implementation has the extra feature that it will remove quotes around an argument if that makes sense (thanks to the TrimMatchingQuotes function). I believe that's part of the normal command-line interpretation.


In addition to the good and pure managed solution by Earwicker, it may be worth mentioning, for sake of completeness, that Windows also provides the CommandLineToArgvW function for breaking up a string into an array of strings:

LPWSTR *CommandLineToArgvW(
    LPCWSTR lpCmdLine, int *pNumArgs);

Parses a Unicode command line string and returns an array of pointers to the command line arguments, along with a count of such arguments, in a way that is similar to the standard C run-time argv and argc values.

An example of calling this API from C# and unpacking the resulting string array in managed code can be found at, “Converting Command Line String to Args[] using CommandLineToArgvW() API.” Below is a slightly simpler version of the same code:

[DllImport("shell32.dll", SetLastError = true)]
static extern IntPtr CommandLineToArgvW(
    [MarshalAs(UnmanagedType.LPWStr)] string lpCmdLine, out int pNumArgs);

public static string[] CommandLineToArgs(string commandLine)
{
    int argc;
    var argv = CommandLineToArgvW(commandLine, out argc);        
    if (argv == IntPtr.Zero)
        throw new System.ComponentModel.Win32Exception();
    try
    {
        var args = new string[argc];
        for (var i = 0; i < args.Length; i++)
        {
            var p = Marshal.ReadIntPtr(argv, i * IntPtr.Size);
            args[i] = Marshal.PtrToStringUni(p);
        }

        return args;
    }
    finally
    {
        Marshal.FreeHGlobal(argv);
    }
}

The Windows command-line parser behaves just as you say, split on space unless there's a unclosed quote before it. I would recommend writing the parser yourself. Something like this maybe:

    static string[] ParseArguments(string commandLine)
    {
        char[] parmChars = commandLine.ToCharArray();
        bool inQuote = false;
        for (int index = 0; index < parmChars.Length; index++)
        {
            if (parmChars[index] == '"')
                inQuote = !inQuote;
            if (!inQuote && parmChars[index] == ' ')
                parmChars[index] = '\n';
        }
        return (new string(parmChars)).Split('\n');
    }

I took the answer from Jeffrey L Whitledge and enhanced it a little.

It now supports both single and double quotes. You can use quotes in the parameters itself by using other typed quotes.

It also strips the quotes from the arguments since these do not contribute to the argument information.

    public static string[] SplitArguments(string commandLine)
    {
        var parmChars = commandLine.ToCharArray();
        var inSingleQuote = false;
        var inDoubleQuote = false;
        for (var index = 0; index < parmChars.Length; index++)
        {
            if (parmChars[index] == '"' && !inSingleQuote)
            {
                inDoubleQuote = !inDoubleQuote;
                parmChars[index] = '\n';
            }
            if (parmChars[index] == '\'' && !inDoubleQuote)
            {
                inSingleQuote = !inSingleQuote;
                parmChars[index] = '\n';
            }
            if (!inSingleQuote && !inDoubleQuote && parmChars[index] == ' ')
                parmChars[index] = '\n';
        }
        return (new string(parmChars)).Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    }

Because I wanted the same behavior as OP (split a string exactly the same as windows cmd would do it) I wrote a bunch of test cases and tested the here posted answers:

    Test( 0, m, "One",                    new[] { "One" });
    Test( 1, m, "One ",                   new[] { "One" });
    Test( 2, m, " One",                   new[] { "One" });
    Test( 3, m, " One ",                  new[] { "One" });
    Test( 4, m, "One Two",                new[] { "One", "Two" });
    Test( 5, m, "One  Two",               new[] { "One", "Two" });
    Test( 6, m, "One   Two",              new[] { "One", "Two" });
    Test( 7, m, "\"One Two\"",            new[] { "One Two" });
    Test( 8, m, "One \"Two Three\"",      new[] { "One", "Two Three" });
    Test( 9, m, "One \"Two Three\" Four", new[] { "One", "Two Three", "Four" });
    Test(10, m, "One=\"Two Three\" Four", new[] { "One=Two Three", "Four" });
    Test(11, m, "One\"Two Three\" Four",  new[] { "OneTwo Three", "Four" });
    Test(12, m, "One\"Two Three   Four",  new[] { "OneTwo Three   Four" });
    Test(13, m, "\"One Two\"",            new[] { "One Two" });
    Test(14, m, "One\" \"Two",            new[] { "One Two" });
    Test(15, m, "\"One\"  \"Two\"",       new[] { "One", "Two" });
    Test(16, m, "One\\\"  Two",           new[] { "One\"", "Two" });
    Test(17, m, "\\\"One\\\"  Two",       new[] { "\"One\"", "Two" });
    Test(18, m, "One\"",                  new[] { "One" });
    Test(19, m, "\"One",                  new[] { "One" });
    Test(20, m, "One \"\"",               new[] { "One", "" });
    Test(21, m, "One \"",                 new[] { "One", "" });
    Test(22, m, "1 A=\"B C\"=D 2",        new[] { "1", "A=B C=D", "2" });
    Test(23, m, "1 A=\"B \\\" C\"=D 2",   new[] { "1", "A=B \" C=D", "2" });
    Test(24, m, "1 \\A 2",                new[] { "1", "\\A", "2" });
    Test(25, m, "1 \\\" 2",               new[] { "1", "\"", "2" });
    Test(26, m, "1 \\\\\" 2",             new[] { "1", "\\\"", "2" });
    Test(27, m, "\"",                     new[] { "" });
    Test(28, m, "\\\"",                   new[] { "\"" });
    Test(29, m, "'A B'",                  new[] { "'A", "B'" });
    Test(30, m, "^",                      new[] { "^" });
    Test(31, m, "^A",                     new[] { "A" });
    Test(32, m, "^^",                     new[] { "^" });
    Test(33, m, "\\^^",                   new[] { "\\^" });
    Test(34, m, "^\\\\",                  new[] { "\\\\" });
    Test(35, m, "^\"A B\"",               new[] { "A B" });

    // Test cases Anton

    Test(36, m, @"/src:""C:\tmp\Some Folder\Sub Folder"" /users:""[email protected]"" tasks:""SomeTask,Some Other Task"" -someParam foo", new[] { @"/src:C:\tmp\Some Folder\Sub Folder", @"/users:[email protected]", @"tasks:SomeTask,Some Other Task", @"-someParam", @"foo" });

    // Test cases Daniel Earwicker 

    Test(37, m, "",            new string[] { });
    Test(38, m, "a",           new[] { "a" });
    Test(39, m, " abc ",       new[] { "abc" });
    Test(40, m, "a b ",        new[] { "a", "b" });
    Test(41, m, "a b \"c d\"", new[] { "a", "b", "c d" });

    // Test cases Fabio Iotti 

    Test(42, m, "this is a test ",    new[] { "this", "is", "a", "test" });
    Test(43, m, "this \"is a\" test", new[] { "this", "is a", "test" });

    // Test cases Kevin Thach

    Test(44, m, "\"C:\\Program Files\"",                       new[] { "C:\\Program Files" });
    Test(45, m, "\"He whispered to her \\\"I love you\\\".\"", new[] { "He whispered to her \"I love you\"." });

the "expected" value comes from directly testing it with cmd.exe on my machine (Win10 x64) and a simple print program:

static void Main(string[] args) => Console.Out.WriteLine($"Count := {args.Length}\n{string.Join("\n", args.Select((v,i) => $"[{i}] => '{v}'"))}");

These are the results:


Solution                      | Failed Tests
------------------------------|------------------------------------- 
Atif Aziz (749653)            | 2, 3, 10, 11, 12, 14, 16, 17, 18, 26, 28, 31, 32, 33, 34, 35, 36, 37, 39, 45
Jeffrey L Whitledge (298968)  | 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 31, 32, 33, 34, 35, 36, 37, 39, 40, 41, 42, 43, 44, 45
Daniel Earwicker (298990)     | 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 31, 32, 33, 34, 35, 36, 45
Anton (299795)                | 12, 16, 17, 18, 19, 21, 23, 25, 26, 27, 28, 31, 32, 33, 34, 35, 45
CS. (467313)                  | 12, 18, 19, 21, 27, 31, 32, 33, 34, 35
Vapour in the Alley (2132004) | 10, 11, 12, 14, 16, 17, 20, 21, 22, 23, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 45
Monoman (7774211)             | 14, 16, 17, 20, 21, 22, 23, 25, 26, 27, 28, 31, 32, 33, 34, 35, 45
Thomas Petersson (19091999)   | 2, 3, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 31, 32, 33, 34, 35, 36, 39, 45
Fabio Iotti (19725880)        | 1, 2, 3, 7, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 22, 23, 25, 26, 28, 29, 30, 35, 36, 37, 39, 40, 42, 44, 45
ygoe (23961658)               | 26, 31, 32, 33, 34, 35
Kevin Thach (24829691)        | 10, 11, 12, 14, 18, 19, 20, 21, 22, 23, 26, 27, 31, 32, 33, 34, 35, 36
Lucas De Jesus (31621370)     | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45
HarryP (48008872)             | 24, 26, 31, 32, 33, 34, 35
TylerY86 (53290784)           | 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 31, 32, 33, 34, 35, 36, 41, 43, 44, 45
Louis Somers (55903304)       | 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 39, 41, 43, 44, 45
user2126375 (58233585)        | 5, 6, 15, 16, 17, 31, 32, 33, 34, 35
DilipNannaware (59131568)     | 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 31, 32, 33, 34, 35, 36, 37, 39, 40, 41, 42, 43, 44, 45
Mikescher (this)              | -

Because no answer seemed correct (at least based on my use case) here is my solution, it currently passes all test cases (but if anyone has additional (failing) corner cases please comment):

public static IEnumerable<string> SplitArgs(string commandLine)
{
    var result = new StringBuilder();

    var quoted = false;
    var escaped = false;
    var started = false;
    var allowcaret = false;
    for (int i = 0; i < commandLine.Length; i++)
    {
        var chr = commandLine[i];

        if (chr == '^' && !quoted)
        {
            if (allowcaret)
            {
                result.Append(chr);
                started = true;
                escaped = false;
                allowcaret = false;
            }
            else if (i + 1 < commandLine.Length && commandLine[i + 1] == '^')
            {
                allowcaret = true;
            }
            else if (i + 1 == commandLine.Length)
            {
                result.Append(chr);
                started = true;
                escaped = false;
            }
        }
        else if (escaped)
        {
            result.Append(chr);
            started = true;
            escaped = false;
        }
        else if (chr == '"')
        {
            quoted = !quoted;
            started = true;
        }
        else if (chr == '\\' && i + 1 < commandLine.Length && commandLine[i + 1] == '"')
        {
            escaped = true;
        }
        else if (chr == ' ' && !quoted)
        {
            if (started) yield return result.ToString();
            result.Clear();
            started = false;
        }
        else
        {
            result.Append(chr);
            started = true;
        }
    }

    if (started) yield return result.ToString();
}

The code I used to generate the test results can be found here


The good and pure managed solution by Earwicker failed to handle arguments like this:

Test("\"He whispered to her \\\"I love you\\\".\"", "He whispered to her \"I love you\".");

It returned 3 elements:

"He whispered to her \"I
love
you\"."

So here is a fix to support the "quoted \"escape\" quote":

public static IEnumerable<string> SplitCommandLine(string commandLine)
{
    bool inQuotes = false;
    bool isEscaping = false;

    return commandLine.Split(c => {
        if (c == '\\' && !isEscaping) { isEscaping = true; return false; }

        if (c == '\"' && !isEscaping)
            inQuotes = !inQuotes;

        isEscaping = false;

        return !inQuotes && Char.IsWhiteSpace(c)/*c == ' '*/;
        })
        .Select(arg => arg.Trim().TrimMatchingQuotes('\"').Replace("\\\"", "\""))
        .Where(arg => !string.IsNullOrEmpty(arg));
}

Tested with 2 additional cases:

Test("\"C:\\Program Files\"", "C:\\Program Files");
Test("\"He whispered to her \\\"I love you\\\".\"", "He whispered to her \"I love you\".");

Also noted that the accepted answer by Atif Aziz which uses CommandLineToArgvW also failed. It returned 4 elements:

He whispered to her \ 
I 
love 
you". 

Hope this helps someone looking for such a solution in the future.


I like iterators, and nowadays LINQ makes IEnumerable<String> as easily usable as arrays of string, so my take following the spirit of Jeffrey L Whitledge's answer is (as a extension method to string):

public static IEnumerable<string> ParseArguments(this string commandLine)
{
    if (string.IsNullOrWhiteSpace(commandLine))
        yield break;

    var sb = new StringBuilder();
    bool inQuote = false;
    foreach (char c in commandLine) {
        if (c == '"' && !inQuote) {
            inQuote = true;
            continue;
        }

        if (c != '"' && !(char.IsWhiteSpace(c) && !inQuote)) {
            sb.Append(c);
            continue;
        }

        if (sb.Length > 0) {
            var result = sb.ToString();
            sb.Clear();
            inQuote = false;
            yield return result;
        }
    }

    if (sb.Length > 0)
        yield return sb.ToString();
}