
python regex match optional square brackets

I have the following strings:

"R J BRUCE & OTHERS V B J & W L A EDWARDS And Ors CA CA19/02 27 February 2003",
"H v DIRECTOR OF PROCEEDINGS [2014] NZHC 1031 [16 May 2014]",
'''GREGORY LANCASTER AND JOHN HENRY HUNTER V CULLEN INVESTMENTS LIMITED AND
ERIC JOHN WATSON CA CA51/03 26 May 2003'''

I am trying to find a regular expression which matches all of them. I don't know how to match optional square brackets around the date at the end of the string, e.g. [16 May 2014].

casename = re.compile(r'(^[A-Z][A-Za-z\'\(\) ]+\b[v|V]\b[A-Za-z\'\(\) ]+(.*?)[ \[ ]\d+    \w+ \d\d\d\d[\] ])', re.S) 

The date regex at the end only matches strings whose dates are in square brackets, not the ones without.

Thanks to everybody who answered. @Matt Clarkson, what I am trying to match is a judicial decision 'handle' in a much larger text. There is a large variation within those handles, but they all start at the beginning of a line, have 'v' for versus between the party names, and end with a date. The names of the parties are mostly in capitals, but not exclusively. I am trying to get only one match per document and no false positives.

user740875 asked Aug 26 '14 16:08



2 Answers

I got all of them to match using this (you'll need to add the case-insensitive flag):

(^[a-z][a-z\'&\(\) ]+\bv\b[a-z&\'\(\) ]+(?:.*?) \[?\d+ \w+ \d{4}\]?)


Explanation:

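  • ( Begin capture group
    • ^ Match the start of the string (or of a line, with the multiline flag)
    • [a-z] Match a single letter
    • [a-z\'&\(\) ]+ Match one or more characters from this character class
    • \b Match a word boundary
    • v Match the character 'v' literally ('V' as well, given the case-insensitive flag)
    • \b Match a word boundary
    • [a-z&\'\(\) ]+ Match one or more characters from this character class
    • (?: Begin non-capturing group
      • .*? Match any characters, as few as possible
    • ) End non-capturing group
    • \[?\d+ \w+ \d{4}\]? Match a date such as 16 May 2014, optionally surrounded by square brackets
  • ) End capture group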
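
As a rough check, the pattern above can be run against the three strings from the question. This is only a sketch: the list name `samples` is made up for illustration, and `re.DOTALL` is an addition not mentioned in the answer. It seems to be needed here because the third string contains a line break, which `.` does not match by default.

```python
import re

# The answer's pattern, unchanged. IGNORECASE is the flag the answer asks for;
# DOTALL is an assumption added so that `.*?` can cross the line break
# inside the third, multi-line string.
casename = re.compile(
    r"(^[a-z][a-z\'&\(\) ]+\bv\b[a-z&\'\(\) ]+(?:.*?) \[?\d+ \w+ \d{4}\]?)",
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical container for the three strings from the question.
samples = [
    "R J BRUCE & OTHERS V B J & W L A EDWARDS And Ors CA CA19/02 27 February 2003",
    "H v DIRECTOR OF PROCEEDINGS [2014] NZHC 1031 [16 May 2014]",
    '''GREGORY LANCASTER AND JOHN HENRY HUNTER V CULLEN INVESTMENTS LIMITED AND
ERIC JOHN WATSON CA CA51/03 26 May 2003''',
]

for text in samples:
    match = casename.search(text)
    # group(1) is the outer capture group, i.e. the whole handle.
    print(repr(match.group(1)) if match else "no match")
```

Each string should come back as a single match ending at the date, with or without the surrounding brackets.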
RevanProdigalKnight answered Nov 01 '22 00:11


Square brackets can be made optional like this:

[\[]* Here the * (zero or more) makes the opening [ optional. To allow at most one bracket, \[? works as well.

A few recommendations if I may:

  • \d\d\d\d can also be written as \d{4}.

  • In [v|V], the | is unnecessary: inside [] every character is already an alternative, so the | is matched as a literal pipe. Use [vV] instead.
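
The points in this answer can be checked with a few small experiments. This is a minimal sketch: the variable names and the simplified date patterns are made up for illustration and are not the asker's full regex.

```python
import re

# Inside a character class, '|' is a literal pipe, not alternation.
pipe_in_class = re.fullmatch(r"[v|V]", "|") is not None  # class includes '|'
pipe_in_fixed = re.fullmatch(r"[vV]", "|") is not None   # class does not

# \d{4} and \d\d\d\d accept the same four-digit strings.
year_counted = re.fullmatch(r"\d{4}", "2014") is not None
year_spelled = re.fullmatch(r"\d\d\d\d", "2014") is not None

# Two ways to make the brackets around a date optional.
# [\[]* allows any number of brackets; \[? allows zero or one.
date_star = re.compile(r"[\[]*\d+ \w+ \d{4}[\]]*")
date_opt = re.compile(r"\[?\d+ \w+ \d{4}\]?")

for text in ("16 May 2014", "[16 May 2014]", "[[16 May 2014]]"):
    print(text, bool(date_star.fullmatch(text)), bool(date_opt.fullmatch(text)))
```

Both date patterns accept the bracketed and unbracketed forms; they differ only on doubled brackets, which is why ? may be the tighter choice when false positives matter.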


Dalorzo answered Oct 31 '22 23:10