Splitting a string by space except when contained within quotes

Tags: regex, r, strsplit

I've been trying for some time to split a space-delimited string containing double-quoted fields in R, but without success. An example of such a string is as follows:

rainfall snowfall "Channel storage" "Rivulet storage"

It's important for us because these are column headings that must match the subsequent data. There are other suggestions on this site as to how to go about this but they don't seem to work with R. One example:

Regex for splitting a string using space when not surrounded by single or double quotes

Here is some code I've been trying:

str <- 'rainfall snowfall "Channel storage" "Rivulet storage"'
regex <- "[^\\s\"']+|\"([^\"]*)\""
split <- strsplit(str, regex, perl=T)

What I would like is:

[1] "rainfall" "snowfall" "Channel storage" "Rivulet storage"

but what I get is:

[1] ""  " " " " " "

The vector is the right length (which is encouraging) but of course the strings are empty or contain a single space. Any suggestions?

Thanks in advance!

asked Nov 29 '12 14:11 by downtowater


1 Answer

scan will do this for you, since it reads a quoted character field as a single item:

scan(text=str, what='character', quiet=TRUE)
[1] "rainfall"        "snowfall"        "Channel storage" "Rivulet storage"
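
As for why the strsplit attempt returned only spaces: strsplit treats the pattern as the delimiter, so everything the regex matched (the tokens themselves) was thrown away and only the text between matches was kept. If a regex is preferred over scan, one option is to extract the matches instead of splitting on them. The following is only a sketch of that idea, not part of the original answer; the pattern and the `tokens` variable are my own choices, and it assumes fields are quoted with double quotes only and contain no escaped quotes.

```r
str <- 'rainfall snowfall "Channel storage" "Rivulet storage"'

# Match either a double-quoted phrase, or a run of characters that are
# neither whitespace nor a double quote (assumed pattern, for illustration).
pattern <- "\"[^\"]*\"|[^\\s\"]+"

# gregexpr() locates every match; regmatches() extracts the matched text.
tokens <- regmatches(str, gregexpr(pattern, str, perl = TRUE))[[1]]

# The first alternative keeps the surrounding double quotes, so strip them.
tokens <- gsub("^\"|\"$", "", tokens)

print(tokens)
# [1] "rainfall"        "snowfall"        "Channel storage" "Rivulet storage"
```

For reading column headings, scan is still the shorter route; the regex version is mainly useful if the tokenizing rules need to be customized.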
answered Oct 30 '22 02:10 by Matthew Plourde