How can I split a string by whitespace unless inside of a single quoted string?

Tags: regex, split, perl

I'm looking for a way to split a string that contains text in the following format:

"abcd efgh 'ijklm no pqrs' tuv"

which will produce the following results:

['abcd', 'efgh', 'ijklm no pqrs', 'tuv']

In other words, it splits by whitespace unless inside of a single quoted string. I think it could be done with .NET regexps using "Lookaround" operators, particularly balancing operators. I'm not so sure about Perl.

Jesse Hallam asked Mar 17 '10 03:03



2 Answers

Use Text::ParseWords:

#!/usr/bin/perl

use strict; use warnings;
use Text::ParseWords;

my @words = parse_line('\s+', 0, "abcd efgh 'ijklm no pqrs' tuv");

use Data::Dumper;
print Dumper \@words;

Output:

C:\Temp> ff
$VAR1 = [
          'abcd',
          'efgh',
          'ijklm no pqrs',
          'tuv'
        ];

You can look at the source code for Text::ParseWords::parse_line to see the pattern used.
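If you would rather not depend on the module, the same idea can be sketched with a single global match instead of a split. This is a minimal sketch, not the pattern Text::ParseWords actually uses; the helper name split_outside_quotes is hypothetical, and it assumes balanced single quotes that start at a word boundary, with no escaped quotes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::ParseWords: returns the tokens of
# $line, treating '...' as one token and dropping the surrounding quotes.
sub split_outside_quotes {
    my ($line) = @_;
    my @tokens;

    # First alternative: a single-quoted run (captured without the quotes).
    # Second alternative: a run of non-whitespace characters.
    while ( $line =~ /'([^']*)'|(\S+)/g ) {
        push @tokens, defined $1 ? $1 : $2;
    }
    return @tokens;
}

my @words = split_outside_quotes("abcd efgh 'ijklm no pqrs' tuv");
print join( '|', @words ), "\n";    # prints abcd|efgh|ijklm no pqrs|tuv
```

Matching tokens rather than splitting on separators avoids lookaround entirely. A quote embedded in the middle of a word, or an unterminated quote, is simply consumed as ordinary non-whitespace, so input like that needs Text::ParseWords or a stricter parser.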

Sinan Ünür answered Sep 18 '22 12:09


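use strict; use warnings;

my $text = "abcd efgh 'ijklm no pqrs' tuv 'xwyz 1234 9999' 'blah'";
my @out;

# Splitting on the quote character yields alternating parts: even indexes
# are unquoted text, odd indexes are quoted text (assumes balanced quotes).
my @parts = split /'/, $text;

for my $i ( 0 .. $#parts ) {
    if ( $i % 2 ) {
        push @out, $parts[$i];             # quoted part: keep it whole
    }
    else {
        # split ' ' skips leading whitespace, so no empty fields appear
        push @out, split ' ', $parts[$i];
    }
}

use Data::Dumper;
print Dumper \@out;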
ghostdog74 answered Sep 19 '22 12:09