
RegEx function to parse a command line without using a library

Tags: c#, regex

I would like to split up a string using a space as my delimiter, but if there are multiple words enclosed in double or single quotes, then I would like them to be returned as one item.

For example, if the input string is:

CALL "C:\My File Name With Space" /P1 P1Value /P1 P2Value

The output array would be:

Array[0]=CALL
Array[1]=C:\My File Name With Space
Array[2]=/P1
Array[3]=P1Value
Array[4]=/P1
Array[5]=P2Value

How do you use regular expressions to do this? I realize that there are command line parsers. I took a cursory look at a popular one, but it did not handle the situation where you can have multiple parameters with the same name. In any event, I'll leave learning a command line parsing library for another day; right now I'm more interested in getting exposure to RegEx functions.

How would you use a RegEx function to parse this?

Chad asked Jun 11 '13 18:06



3 Answers

The link in Jim Mischel's comment points out that the Win32 API provides a function for this, CommandLineToArgvW. I'd recommend using it so that your splitting is consistent with how Windows itself does it. Here's a sample (from PInvoke).

// Requires: using System; using System.ComponentModel; using System.Runtime.InteropServices;
static string[] SplitArgs(string unsplitArgumentLine)
{
    int numberOfArgs;
    IntPtr ptrToSplitArgs;
    string[] splitArgs;

    // Let Windows split the line using its own command-line rules.
    ptrToSplitArgs = CommandLineToArgvW(unsplitArgumentLine, out numberOfArgs);
    if (ptrToSplitArgs == IntPtr.Zero)
        // Win32Exception() picks up the last Win32 error set by the call.
        throw new ArgumentException("Unable to split argument.",
          new Win32Exception());
    try
    {
        // The result is a native array of pointers to Unicode strings;
        // copy each one into a managed string.
        splitArgs = new string[numberOfArgs];
        for (int i = 0; i < numberOfArgs; i++)
            splitArgs[i] = Marshal.PtrToStringUni(
                Marshal.ReadIntPtr(ptrToSplitArgs, i * IntPtr.Size));
        return splitArgs;
    }
    finally
    {
        // The native block has to be released with LocalFree.
        LocalFree(ptrToSplitArgs);
    }
}

[DllImport("shell32.dll", SetLastError = true)]
static extern IntPtr CommandLineToArgvW(
    [MarshalAs(UnmanagedType.LPWStr)] string lpCmdLine,
    out int pNumArgs);

[DllImport("kernel32.dll")]
static extern IntPtr LocalFree(IntPtr hMem);

If you want a quick-and-dirty, inflexible, fragile regex solution you can do something like this:

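// Requires: using System.Linq; using System.Text.RegularExpressions;
var rex = new Regex(@"("".*?""|[^ ""]+)+");
string test = "CALL \"C:\\My File Name With Space\" /P1 P1Value /P1 P2Value";
// m.Value yields strings; quoted matches keep their quotes, so call Trim('"') to drop them.
var array = rex.Matches(test).OfType<Match>().Select(m => m.Value).ToArray();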
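If you would rather get the quoted item back without its quotes, as in the output the question asks for, one option is to capture the contents in a group. This is only a sketch under the same assumptions as above (double quotes only, no escaped quotes); the class name QuotedSplitter and the method name Split are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class QuotedSplitter
{
    // Either a double-quoted run (contents captured in group 1)
    // or a run of non-whitespace characters (group 2).
    private static readonly Regex Token = new Regex(@"""([^""]*)""|(\S+)");

    public static string[] Split(string commandLine)
    {
        return Token.Matches(commandLine)
            .Cast<Match>()
            // Prefer the quoted contents when that alternative matched;
            // checking Success (not emptiness) keeps "" as an empty item.
            .Select(m => m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value)
            .ToArray();
    }
}
```

Calling QuotedSplitter.Split on the string from the question should give six items, with the second being the path without its surrounding quotes.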
Chad answered Oct 27 '22 09:10



I wouldn't do it with Regex, for various reasons shown above.

If I did need to, this would match your simple requirements:

(".*?")|([^ ]+)

However, this doesn't handle:

  • Escaped quotes
  • Single quotes
  • Non-ASCII quotes (you don't think people will paste smart quotes from Word into your file?)
  • Combinations of the above

And that's just off the top of my head.
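For comparison, here is roughly what a regex-free version could look like: a single pass over the characters that tracks whether it is inside a quote. Treat it as a sketch, not a complete parser. The names ManualSplitter and Split are hypothetical, and the rules are assumptions of mine: either quote character opens a quoted run, and a backslash counts as an escape only when it sits directly before a quote. It still ignores smart quotes, an apostrophe inside a word will open a quote, and a quoted path that ends in a backslash will be misread.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class ManualSplitter
{
    public static List<string> Split(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        char quote = '\0';     // the open quote character, or '\0' outside quotes
        bool inToken = false;  // true once a token has started (so "" yields an empty token)

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '\\' && i + 1 < input.Length
                && (input[i + 1] == '"' || input[i + 1] == '\''))
            {
                // Backslash directly before a quote: keep the quote literally.
                current.Append(input[++i]);
                inToken = true;
            }
            else if (quote != '\0')
            {
                if (c == quote) quote = '\0';  // matching closing quote
                else current.Append(c);        // anything else is literal, spaces included
            }
            else if (c == '"' || c == '\'')
            {
                quote = c;                     // opening quote
                inToken = true;
            }
            else if (char.IsWhiteSpace(c))
            {
                // Whitespace outside quotes ends the current token, if any.
                if (inToken)
                {
                    tokens.Add(current.ToString());
                    current.Clear();
                    inToken = false;
                }
            }
            else
            {
                current.Append(c);
                inToken = true;
            }
        }

        if (inToken) tokens.Add(current.ToString());
        return tokens;
    }
}
```

Backslashes that are not followed by a quote are kept as they are, so the path in the question's example comes through unchanged.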

jedigo answered Oct 27 '22 11:10



@Chad Henderson, you forgot to include the single quotes, and your pattern also has the problem of capturing anything that comes directly before a set of quotes.

Here is a correction that includes the single quotes, but it still shows the problem with the extra capture before a quote: http://regexhero.net/tester/?id=81cebbb2-5548-4973-be19-b508f14c3348
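In case the link stops working, below is one way both quote styles might be handled in a single pattern. It is not the pattern from the link, just a sketch of mine: a named group remembers which quote opened the item and a backreference requires the same one to close it. The names EitherQuoteSplitter and Split are made up, and excluding quote characters from the unquoted alternative is my own choice, so that text directly before a quote is returned as a separate item instead of being folded into the quoted one.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class EitherQuoteSplitter
{
    // (?<q>["'])   remembers which quote character opened the item
    // (?<body>.*?) lazily captures the text up to the closing quote
    // \k<q>        requires the same quote character to close it
    // (?<bare>...) matches an unquoted run with no whitespace or quotes
    private static readonly Regex Token =
        new Regex(@"(?<q>[""'])(?<body>.*?)\k<q>|(?<bare>[^\s""']+)");

    public static string[] Split(string commandLine)
    {
        return Token.Matches(commandLine)
            .Cast<Match>()
            .Select(m => m.Groups["body"].Success
                ? m.Groups["body"].Value
                : m.Groups["bare"].Value)
            .ToArray();
    }
}
```

Because the closing quote has to match the opening one, a single quote inside a double-quoted item (or the other way round) stays part of that item. Escaped quotes and smart quotes are still not handled.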

Bruce Burge answered Oct 27 '22 11:10
