
How to split a string containing both delimiter and the escaped delimiter?

Tags: regex, ruby

My string delimiter is ;, and a literal delimiter is escaped in the string as \;. For example:

irb(main):018:0> s = "a;b;;d\\;e"
=> "a;b;;d\\;e"
irb(main):019:0> s.split(';')
=> ["a", "b", "", "d\\", "e"]

Could someone suggest a regex so that the output of split would be ["a", "b", "", "d\\;e"]? I'm using Ruby 1.8.7.

Say No To Censorship asked Feb 21 '23 02:02


2 Answers

1.8.7 doesn't have negative lookbehind without Oniguruma (which may be compiled in).

1.9.3; yay:

> s = "a;b;c\\;d"
=> "a;b;c\\;d"
> s.split /(?<!\\);/
=> ["a", "b", "c\\;d"]
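
For reference, here is the same lookbehind split applied to the string from the question. This is only a minimal sketch, assuming Ruby 1.9 or later; the helper name split_unescaped and the -1 limit are illustrative additions, not part of the answer above. It also assumes a backslash is never itself escaped: any ; directly preceded by a backslash is treated as part of the field.

```ruby
# Split on ';' unless it is immediately preceded by a backslash.
# Requires Ruby 1.9+ (negative lookbehind). Hypothetical helper name.
def split_unescaped(str)
  # Assumption: a limit of -1 is passed so trailing empty fields are kept;
  # without it, String#split silently drops them.
  str.split(/(?<!\\);/, -1)
end

s = "a;b;;d\\;e"                 # the actual characters are: a;b;;d\;e
p split_unescaped(s)             # => ["a", "b", "", "d\\;e"]
p split_unescaped("a;b;")        # => ["a", "b", ""]
```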

1.8.7 with Oniguruma doesn't offer a trivial split, but you can get the match offsets and pull the substrings apart that way. I assume there's a better way to do this that I'm not remembering:

> require 'oniguruma'
> re = Oniguruma::ORegexp.new "(?<!\\\\);"
> s = "hello;there\\;nope;yestho"
> re.match_all s
=> [#<MatchData ";">, #<MatchData ";">]
> mds = re.match_all s
=> [#<MatchData ";">, #<MatchData ";">]
> mds.collect {|md| md.offset}
=> [[5, 6], [17, 18]]
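
One way to turn those offsets into substrings might look like the sketch below. The helper name split_at_offsets is hypothetical, and the offsets are hard-coded from the output above so that the example runs without the oniguruma gem; in practice they would come from mds.collect.

```ruby
# Cut a string into fields, given the [start, end] offsets of each
# delimiter match (as returned by the match_all/offset calls above).
def split_at_offsets(str, offsets)
  fields = []
  pos = 0
  offsets.each do |from, to|
    fields << str[pos...from]   # text between the previous delimiter and this one
    pos = to                    # resume just past the delimiter
  end
  fields << str[pos..-1]        # remainder after the last delimiter
  fields
end

s = "hello;there\\;nope;yestho"
p split_at_offsets(s, [[5, 6], [17, 18]])
# => ["hello", "there\\;nope", "yestho"]
```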

Other options include:

  • Splitting on ; and post-processing the results, looking for fields with a trailing \\ to rejoin, or
  • Doing a char-by-char loop that maintains some simple state and splits manually.
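
The char-by-char option could be sketched roughly as follows. The method name split_escaped and its parameters are made up for illustration, and the sketch assumes a backslash escapes whatever single character follows it (including another backslash). It uses no lookbehind, so it should also be usable on 1.8.7.

```ruby
# Split on an unescaped delimiter by walking the string one character at a
# time. The escape character and the character after it are kept verbatim.
def split_escaped(str, delim = ';', esc = '\\')
  fields = []
  current = ''
  escaped = false               # true when the previous char was an active escape
  str.each_char do |ch|
    if escaped
      current += ch             # escaped char: never treated as a delimiter
      escaped = false
    elsif ch == esc
      current += ch
      escaped = true
    elsif ch == delim
      fields << current         # unescaped delimiter closes the current field
      current = ''
    else
      current += ch
    end
  end
  fields << current             # last field (possibly empty)
  fields
end

p split_escaped("a;b;;d\\;e")   # => ["a", "b", "", "d\\;e"]
```

Unlike the lookbehind pattern, this version treats an escaped backslash followed by ; as the end of a field, because it tracks the escape state rather than looking at only the previous character.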
Dave Newton answered Feb 22 '23 16:02


As @dave-newton answered, you could use negative lookbehind, but that isn't supported in 1.8. An alternative that works in both 1.8 and 1.9 is to use String#scan instead of split, with a pattern that accepts either any character other than a semicolon or backslash, or any character prefixed by a backslash:

$ irb
>> RUBY_VERSION
=> "1.8.7"
>> s = "a;b;c\\;d"
=> "a;b;c\\;d"
>> s.scan /(?:[^;\\]|\\.)+/
=> ["a", "b", "c\\;d"]
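
One caveat: because the pattern requires at least one character per match, scan skips empty fields, so the string from the question ("a;b;;d\\;e") comes back as ["a", "b", "d\\;e"]. If empty fields matter, a possible variation is to walk the string with StringScanner from the standard library, using the same field pattern with * instead of +. This is a sketch rather than a drop-in replacement; the method name scan_fields is hypothetical.

```ruby
require 'strscan'

# Tokenize with the field pattern from the scan example, but with * so that
# an empty field still matches, and consume one ';' between fields.
def scan_fields(str)
  scanner = StringScanner.new(str)
  fields = []
  loop do
    # /m lets '.' match a newline that follows a backslash.
    fields << scanner.scan(/(?:[^;\\]|\\.)*/m)
    break unless scanner.scan(/;/)   # no delimiter left: we are done
  end
  fields
end

p scan_fields("a;b;;d\\;e")   # => ["a", "b", "", "d\\;e"]
```

A lone backslash at the very end of the input is not matched by the field pattern, so this sketch drops it; handle that case separately if it can occur.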
dbenhur answered Feb 22 '23 15:02