Given the strings:
strs = [
"foo",
" ",
"Hello \n there",
" Ooh, leading and trailing space! ",
]
I want a simple method identifying all contiguous runs of whitespace and non-whitespace characters, in order, along with whether the run is whitespace or not:
strs.each{ |str| p find_whitespace_runs(str) }
#=> [ {k:1, s:"foo"} ],
#=> [ {k:0, s:" "} ],
#=> [ {k:1, s:"Hello"}, {k:0, s:" \n "}, {k:1, s:"there"} ],
#=> [
#=> {k:0, s:" "},
#=> {k:1, s:"Ooh,"},
#=> {k:0, s:" "},
#=> {k:1, s:"leading"},
#=> {k:0, s:" "},
#=> {k:1, s:"and"},
#=> {k:0, s:" "},
#=> {k:1, s:"trailing"},
#=> {k:0, s:" "},
#=> {k:1, s:"space!"},
#=> {k:0, s:" "},
#=> ]
This almost works, but it includes a single leading {k:0, s:""} group whenever the string does not start with whitespace:
def find_whitespace_runs(str)
  str.split(/(\S+)/).map.with_index do |s, i|
    { k: i % 2, s: s }
  end
end
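The stray group appears because String#split keeps the (empty) text that precedes a separator match at index 0. The sketch below shows that behaviour, followed by one possible workaround. The name `find_whitespace_runs_split` is hypothetical, and `String#match?` assumes Ruby 2.4 or later; each piece is classified by its content rather than by its position in the array.

```ruby
# split keeps an empty leading field when the separator matches at index 0.
p "foo bar".split(/(\S+)/)   # starts with "" because "foo" matches at the start
p " foo bar".split(/(\S+)/)  # no empty field: the string starts with whitespace

# Hypothetical variant (not the code from the question): drop empty pieces
# and decide k from the piece itself instead of from its index.
def find_whitespace_runs_split(str)
  str.split(/(\S+)/).reject(&:empty?).map do |s|
    { k: s.match?(/\S/) ? 1 : 0, s: s }
  end
end

p find_whitespace_runs_split("foo")
p find_whitespace_runs_split("Hello \n there")
```

Classifying by content means the result no longer depends on whether an empty field happened to shift the indices.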
Real-world motivation: writing a syntax highlighter that distinguishes whitespace from non-whitespace in otherwise-unlexed code.
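def find_whitespace_runs(str)
  # With capture groups, scan yields one [full, ws, nws] triple per run;
  # nws is nil when the run is whitespace.
  str.scan(/((\s+)|(\S+))/).map { |full, ws, nws|
    { :k => nws ? 1 : 0, :s => full }
  }
end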
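A slightly simpler take on the same scan idea, as a sketch rather than the answer's exact code: without capture groups, scan returns plain strings, so each run can be classified by looking at its first character. The name `whitespace_runs` is hypothetical, and `String#match?` assumes Ruby 2.4 or later.

```ruby
# Hypothetical variant of the scan approach: no capture groups, so scan
# returns an array of strings, one per contiguous run.
def whitespace_runs(str)
  str.scan(/\s+|\S+/).map do |run|
    # A run is homogeneous, so its first character decides the kind.
    { k: run.match?(/\A\s/) ? 0 : 1, s: run }
  end
end

p whitespace_runs("Hello \n there")
p whitespace_runs(" Ooh, leading ")
```

Because the two alternatives `\s+` and `\S+` together cover every character, the runs concatenate back to the original string.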