
Extract date between single quotes in ruby

Tags:

regex

ruby

I have a string like this:

ticket:1 priority:5 delay:'2019-08-31 02:53:27.720422' delay:'2019-08-30 00:04:10.681242'

I successfully extracted ticket and priority but failed on delay.

What I want is to extract delays as array so output will be like this:

#delays =>
[
  "delay:'2019-08-31 02:53:27.720422'",
  "delay:'2019-08-30 00:04:10.681242'"
]

What I've tried so far?

str = "ticket:1 priority:5 delay:'2019-08-31 02:53:27.720422' delay:'2019-08-30 00:04:10.681242'"
delays = str.scan(/delay:\w+(?:'\w+)*/).flatten

How can I extract them in my case? Note that there is no guarantee the date format will match the examples; it can be anything. So we should focus on the strings between single quotes.


If possible, the result could look like this (so that I don't have to extract the date again):

#delays =>
[
  "2019-08-31 02:53:27.720422",
  "2019-08-30 00:04:10.681242"
]
Dennis asked Aug 31 '19 03:08


2 Answers

This expression might be close to what you have in mind:

\bdelay\s*:\s*['][^']*[']

If the delay values may also be wrapped in other characters, such as ", add those characters to the character classes:

\bdelay\s*:\s*['"][^'"]*['"]

or:

\bdelay\s*:\s*'(\d{4}-\d{1,2}-\d{1,2})\s*([^']*)'


or:

\bdelay\s*:\s*'(\d{4}-\d{1,2}-\d{1,2}\s*[^']*)'


or more simplified:

\bdelay\s*:\s*'([^']*)'

Test

re = /\bdelay\s*:\s*'([^']*)'/
str = 'ticket:1 priority:5 delay:\'2019-08-31 02:53:27.720422\' delay:\'2019-08-30 00:04:10.681242\''

str.scan(re) do |match|
    puts match.to_s
end

Output

["2019-08-31 02:53:27.720422"]
["2019-08-30 00:04:10.681242"]
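Each match above is printed as a one-element array because the pattern contains a capture group. As a minimal sketch (the variable names whole and dates are illustrative, not from the answer), dropping the group yields the first output the question asks for, and calling flatten on the grouped scan yields the second:

```ruby
str = "ticket:1 priority:5 delay:'2019-08-31 02:53:27.720422' delay:'2019-08-30 00:04:10.681242'"

# No capture group: scan returns each whole match as a string.
whole = str.scan(/\bdelay\s*:\s*'[^']*'/)

# One capture group: scan returns one-element arrays, so flatten them.
dates = str.scan(/\bdelay\s*:\s*'([^']*)'/).flatten

p whole
p dates
```

Here whole holds the "delay:'...'" strings and dates holds only the text between the single quotes.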

To explore, simplify, or modify the expression, paste it into regex101.com. The panel at the top right explains each part of the pattern, and the site shows how it matches against sample inputs.


Emma answered Nov 13 '22 00:11


This is a suggestion for how you might extract all values of interest, not just the values for "delay". It permits any number of instances of "delay:'..." in the string.

str = "ticket:1 priority:5 delay:'2019-08-31 02:53:27.720422' delay:'2019-08-30 00:04:10.681242'"

str.delete("'").
    split(/ +(?=ticket|priority|delay)/).
    each_with_object({}) do |s,h|
      key, value = s.split(':', 2)
      case key
      when 'delay'
        (h[key] ||= []) << value
      else
        h[key] = value
      end
    end
  #=> {"ticket"=>"1", "priority"=>"5",
  #    "delay"=>["2019-08-31 02:53:27.720422", "2019-08-30 00:04:10.681242"]}

The regular expression that is String#split's argument reads, "match one or more spaces followed immediately by the string 'ticket', 'priority' or 'delay'", the expression

(?=ticket|priority|delay)

being a positive lookahead.
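To see why the lookahead matters, here is a small sketch that compares it with a plain split on spaces. The shortened input string and the names naive and pieces are assumptions made for illustration:

```ruby
a = "ticket:1 priority:5 delay:2019-08-31 02:53:27.720422"

# Splitting on every run of spaces also cuts the timestamp in two.
naive = a.split(/ +/)

# The lookahead only checks what follows the spaces and consumes nothing,
# so each keyword stays at the start of its piece and the timestamp survives.
pieces = a.split(/ +(?=ticket|priority|delay)/)

p naive
p pieces
```

The space inside the timestamp is not followed by one of the three keywords, so it is not a split point in the second call.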

The steps are as follows.

a = str.delete("'")
  #=> "ticket:1 priority:5 delay:2019-08-31 02:53:27.720422 delay:2019-08-30 00:04:10.681242"

b = a.split(/ +(?=ticket|priority|delay)/)
  #=> ["ticket:1", "priority:5", "delay:2019-08-31 02:53:27.720422",
  #    "delay:2019-08-30 00:04:10.681242"]

c = b.each_with_object({}) do |s,h|
      key, value = s.split(':', 2)
      case key
      when 'delay'
        (h[key] ||= []) << value
      else
        h[key] = value
      end
    end
  #=> {"ticket"=>"1", "priority"=>"5",
  #    "delay"=>["2019-08-31 02:53:27.720422", "2019-08-30 00:04:10.681242"]}

Let's examine more closely the calculation of c.

enum = b.each_with_object({})
  #=> #<Enumerator: ["ticket:1", "priority:5", "delay:2019-08-31 02:53:27.720422",
  #      "delay:2019-08-30 00:04:10.681242"]:each_with_object({})>

The enumerator generates its first value and passes it to the block, and the two block variables are assigned using array decomposition.

 s, h = enum.next
   #=> ["ticket:1", {}] 
 s #=> "ticket:1" 
 h #=> {} 

The block calculation is then performed.

key, value = s.split(':', 2)
  #=> ["ticket", "1"] 
key
  #=> "ticket" 
value
  #=> "1" 

case else applies, so

h[key] = value
  #=> h["ticket"] = "1"
h #=> {"ticket"=>"1"}

The next element is generated by enum, the block variables are assigned values and block calculation is performed.

s, h = enum.next
  #=> ["priority:5", {"ticket"=>"1"}] 
key, value = s.split(':', 2)
  #=> ["priority", "5"] 

case else again applies, so we execute

h[key] = value
  #=> h["priority"] = "5" 
h #=> {"ticket"=>"1", "priority"=>"5"} 

Next,

s, h = enum.next
  #=> ["delay:2019-08-31 02:53:27.720422", {"ticket"=>"1", "priority"=>"5"}] 
key, value = s.split(':', 2)
  #=> ["delay", "2019-08-31 02:53:27.720422"] 

case "delay" now applies, so we compute

(h[key] ||= []) << value
  #=> h[key] = (h[key] || []) << value
  #=> h["delay"] = (h["delay"] || []) << "2019-08-31 02:53:27.720422"
  #=> h["delay"] = (nil || []) << "2019-08-31 02:53:27.720422"
  #=> h["delay"] = [] << "2019-08-31 02:53:27.720422"
  #=> h["delay"] = ["2019-08-31 02:53:27.720422"]
h #=> {"ticket"=>"1", "priority"=>"5", "delay"=>["2019-08-31 02:53:27.720422"]}

Lastly,

s, h = enum.next
  #=> ["delay:2019-08-30 00:04:10.681242",
  #    {"ticket"=>"1", "priority"=>"5", "delay"=>["2019-08-31 02:53:27.720422"]}] 
key, value = s.split(':', 2)
  #=> ["delay", "2019-08-30 00:04:10.681242"] 
(h[key] ||= []) << value
  #=> ["2019-08-31 02:53:27.720422", "2019-08-30 00:04:10.681242"] 
h #=> {"ticket"=>"1", "priority"=>"5",
  #    "delay"=>["2019-08-31 02:53:27.720422", "2019-08-30 00:04:10.681242"]} 

In this last step, unlike the previous one, h[key] already holds an array, so ||= leaves it unchanged:

h[key] ||= []
  #=> ["2019-08-31 02:53:27.720422"] ||= []
  #=> ["2019-08-31 02:53:27.720422"]
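
The two behaviours of ||= can be checked in isolation. This is a minimal sketch; the values "first" and "second" and the name first_array are placeholders, not part of the original data:

```ruby
h = {}

# First "delay": h["delay"] is nil, so ||= stores a new empty array.
(h["delay"] ||= []) << "first"
first_array = h["delay"]

# Second "delay": the existing array is truthy, so ||= leaves it in place
# and << appends to that same array.
(h["delay"] ||= []) << "second"

p h
p h["delay"].equal?(first_array)
```

The equal? call compares object identity, confirming that the second step reuses the array created in the first step rather than replacing it.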

Cary Swoveland answered Nov 13 '22 01:11