How to split a string into a list of words in TCL, ignoring multiple spaces?

Tags: string, split, tcl

Basically, I have a string that consists of multiple, space-separated words. The thing is, however, that there can be multiple spaces instead of just one separating the words. This is why [split] does not do what I want:

split "a    b"

gives me this:

{a {} {} {} b}

instead of this:

{a b}

Searching Google, I found a page on the Tcler's wiki, where a user asked more or less the same question.

One proposed solution would look like this:

split [regsub -all {\s+} "a    b" " "]

which seems to work for simple strings. But a test string such as [string repeat " " 4] (I used string repeat because StackOverflow strips multiple spaces) makes regsub return " ", which split then splits into {{} {}} instead of an empty list.
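One way to patch that edge case, as a minimal sketch rather than anything taken from the wiki page, is to trim the collapsed string before splitting. The helper name splitWords is hypothetical and only used for illustration:

```tcl
# Sketch: collapse each whitespace run to one space, trim both ends,
# then split. splitWords is a hypothetical helper name.
proc splitWords {str} {
    set collapsed [regsub -all {\s+} $str " "]
    return [split [string trim $collapsed]]
}

puts [llength [splitWords "a    b"]]               ;# 2
puts [llength [splitWords [string repeat " " 4]]]  ;# 0
```

Because an all-whitespace input collapses to a single space and then trims to the empty string, split returns an empty list here.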

Another proposed solution was this one, to force a reinterpretation of the given string as a list:

lreplace "a   list   with many   spaces" 0 -1

But if there's one thing I've learned about TCL, it is that you should never use list functions (starting with l) on strings. And indeed, this one will choke on strings containing special characters (namely { and }):

lreplace "test    \{a b\}" 0 -1

returns test {a b} instead of test \{a b\} (which would be what I want, every space-separated word split up into a single element of the resulting list).

Yet another solution was to use a 'filter':

proc filter {cond list} {
    set res {}
    foreach element $list {if [$cond $element] {lappend res $element}}
    set res
}

You'd then use it like this:

filter llength [split "a   list   with many   spaces"]

Again, same problem. This would call llength on a string, which might contain special characters (again, { and }) - passing it "\{a b\}" would result in TCL complaining about an "unmatched open brace in list".

I managed to get it to work by modifying the given filter function, adding a {*} in front of $cond in the if, so I could use it with string length instead of llength, which seemed to work for every possible input I've tried to use it on so far.
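For reference, the modified filter described above might look like the following sketch. It assumes Tcl 8.5 or later, where the {*} expansion syntax is available, and it braces the if condition, which is a stylistic change from the wiki version:

```tcl
# Variant of the wiki's filter: {*} expands $cond, so it may be a
# multi-word command prefix such as {string length}.
proc filter {cond list} {
    set res {}
    foreach element $list {
        if {[{*}$cond $element]} {
            lappend res $element
        }
    }
    return $res
}

# split always returns a well-formed list, and string length treats each
# element as a plain string, so braces in the input are not parsed.
set words [filter {string length} [split "test    \{a b\}"]]
puts [llength $words]   ;# 3
```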

Is this solution safe to use as it is now? Would it choke on some special input I didn't test so far? Or, is it possible to do this right in a simpler way?

493

asked Nov 14 '12 14:11

Jerry

1 Answer

The easiest way is to use regexp -all -inline to select and return all words. For example:

# The RE matches any non-empty sequence of non-whitespace characters
set theWords [regexp -all -inline {\S+} $theString]

If you instead define words to be sequences of alphanumerics (\w also matches the underscore), use this for the regular expression term: {\w+}
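A short check of this approach against the inputs from the question, as a sketch; the sample strings are chosen here for illustration only:

```tcl
# Leading/trailing whitespace and braces in the input are handled,
# because regexp works on the string and never parses it as a list.
set theString "  test    \{a b\}  "
set theWords [regexp -all -inline {\S+} $theString]
puts [llength $theWords]                                           ;# 3

# An all-whitespace string has no matches, so the result is an empty list.
puts [llength [regexp -all -inline {\S+} [string repeat " " 4]]]   ;# 0

# With \w+ the hyphen acts as a separator, while the underscore does not.
puts [regexp -all -inline {\w+} "foo-bar  baz_1"]                  ;# foo bar baz_1
```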

198

answered Nov 15 '22 09:11

Donal Fellows
