
Want Regex to stop at first occurrence of "." and ";"

I am trying to extract sentences from a paragraph that follows a pattern like

 Current. time is six thirty at Scotland. Past. time was five thirty at India; Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. Current. time is five ten at Scotland.

When I use a regex such as

/current\..*scotland\./i

This matches the whole string:

Current. time is six thirty at Scotland. Past. time was five thirty at India; Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. Current. time is five ten at Scotland.

Instead, I want each match to stop at the first "." that follows "Scotland", giving separate matches like

 Current. time is six thirty at Scotland.
 Current. time is five ten at Scotland. 

Similarly for text like

 Past. time was five thirty at India; Current. time is six thirty at Scotland. Past. time was five thirty at Scotland. Past. time was five ten at India;    

When I use a regex like

 /past\..*india\;/i

This matches the whole string:

 Past. time was five thirty at India; Current. time is six thirty at Scotland. Past. time was five thirty at Scotland. Past. time was five ten at India; 

Here I want to capture all of the groups, or just the first one, as shown below. How do I stop at the first occurrence of ";"?

Past. time was five thirty at India; 
Past. time was five ten at India; 

How can I make the regular expression stop at "." or ";" in the examples above?

Pramod Shinde Avatar asked Jun 13 '14 11:06



1 Answer

There are a few things you shouldn't really be doing with your regex. First off, as pointed out by Arnal Murali, you shouldn't be using a greedy quantifier; use the lazy version instead:

/current\..*?scotland\./i
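To make the difference concrete, here is a minimal sketch comparing the greedy and lazy patterns on the first sample from the question. It assumes Python's `re` module (the question does not say which language is in use), with `re.IGNORECASE` standing in for the `/i` flag; the variable names are only illustrative.

```python
import re

# First sample paragraph from the question.
text = (
    "Current. time is six thirty at Scotland. "
    "Past. time was five thirty at India; "
    "Current. time is five thirty at Scotland. "
    "Past. time was five thirty at Scotland. "
    "Current. time is five ten at Scotland."
)

# Greedy: .* runs to the end of the text, then backtracks to the LAST "Scotland."
greedy = re.findall(r"current\..*scotland\.", text, flags=re.IGNORECASE)

# Lazy: .*? stops at the FIRST "Scotland." after each "Current."
lazy = re.findall(r"current\..*?scotland\.", text, flags=re.IGNORECASE)

print(len(greedy))  # 1 (the whole paragraph)
for sentence in lazy:
    print(sentence)
```

On this sample the lazy version returns each "Current. ... Scotland." sentence separately, while the greedy version returns a single match spanning the whole paragraph.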

I think it is a general rule of regex to go for the lazy option first, as it is more often what you want. Secondly, you don't really want to use . to match everything. Since this part of your regex should not match either . or ; you can put those in a negated character class, which matches anything but them:

/current\.[^.]*?scotland\./i

and

/past\.[^;]*?india;/i

or cover both with:

/(current|past)\.[^.;]*?(india|scotland)[.;]/i

(obviously this might not be what you want to do, just including to demonstrate how to extend this)
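As a rough sketch of what the combined pattern does, the snippet below runs it over the first sample from the question. It assumes Python's `re` module (the question names no language), and the `wanted` filter is a hypothetical extra step, not part of the original pattern.

```python
import re

# First sample paragraph from the question.
text = (
    "Current. time is six thirty at Scotland. "
    "Past. time was five thirty at India; "
    "Current. time is five thirty at Scotland. "
    "Past. time was five thirty at Scotland. "
    "Current. time is five ten at Scotland."
)

pattern = re.compile(
    r"(current|past)\.[^.;]*?(india|scotland)[.;]", re.IGNORECASE
)

# Each entry: (whole sentence, first group, second group).
matches = [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer(text)]

# Hypothetical filter: keep only the pairings the question asks about.
wanted = {("current", "scotland"), ("past", "india")}
filtered = [
    sentence
    for sentence, tense, place in matches
    if (tense.lower(), place.lower()) in wanted
]

for sentence in filtered:
    print(sentence)
```

The combined pattern matches every sentence in the sample, including "Past. ... Scotland.", which is why the capture groups are used here to drop the pairings that are not wanted.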

This is also a good rule of thumb: if you're having trouble with a regex, make any wildcards more specific (in this case, changing from . which matches any character except a line break, to [^.;] which matches anything except . and ;).
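A small sketch of why the more specific wildcard matters, using the second sample from the question. It assumes Python's `re` module, and the `past`/`india` patterns are adapted from the ones in the question for illustration. Even a lazy `.*?` can run across a sentence boundary when the first sentence it enters does not end in "India;".

```python
import re

# Second sample paragraph from the question.
sample = (
    "Past. time was five thirty at India; "
    "Current. time is six thirty at Scotland. "
    "Past. time was five thirty at Scotland. "
    "Past. time was five ten at India;"
)

# Lazy dot: the second match starts at "Past. ... Scotland." and keeps
# going across the "." until it finally reaches "India;".
lazy_dot = re.findall(r"past\..*?india;", sample, flags=re.IGNORECASE)

# Negated class: [^.;] cannot cross "." or ";", so each match stays
# inside a single sentence.
negated = re.findall(r"past\.[^.;]*?india;", sample, flags=re.IGNORECASE)

print(lazy_dot[1])  # spans two sentences
print(negated[1])   # only the last sentence
```

With the negated class, the "Past. ... Scotland." sentence simply fails to match and the engine moves on to the next "Past.".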

Mike H-R Avatar answered Sep 27 '22 19:09
