Regular expression for legal citation

Tags: regex

How would you design a regular expression to capture a legal citation? Here is a paragraph that shows two typical legal citations:

We have insisted on strict scrutiny in every context, even for so-called “benign” racial classifications, such as race-conscious university admissions policies, see Grutter v. Bollinger, 539 U.S. 306, 326 (2003), race-based preferences in government contracts, see Adarand, supra, at 226, and race-based districting intended to improve minority representation, see Shaw v. Reno, 509 U.S. 630, 650 (1993).

A citation will either be preceded by a comma and whitespace, a period and whitespace, or a "signal" such as "see" or "see, e.g.," and whitespace. I'm having trouble figuring out how to accurately specify the start of the citation.

I am most familiar with Perl regular expressions but can understand examples from other languages as well.

Eric Truett asked Aug 14 '11 03:08


2 Answers

In your example, you've preceded the citations with what the BlueBook deems a 'signal' (Rule 1.2 on page 54 of the nineteenth edition). Other signals include, but are not limited to: e.g., accord, also, cf., compare, and, with, contra, and but. These can be combined in surprising and unexpected ways, such as: See also, e.g. Watts v. United States, 394 U.S. 705 (1969) (per curiam). Of course, there are also citations that are not preceded by signals.
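One way to cope with combined signals is to treat them as a run of known words rather than as a single token. What follows is only a minimal sketch in Python (the patterns carry over to Perl with little change). The word list is just the handful named above, not the full BlueBook list, and `strip_signals` is a hypothetical helper name.

```python
import re

# Signal words named in this answer; the real BlueBook list is longer.
SIGNAL_WORDS = ["see", "e.g.", "accord", "also", "cf.", "compare",
                "and", "with", "contra", "but"]

# One or more signal words, each optionally followed by a comma,
# separated by whitespace. Matching ignores case so "See" counts too.
SIGNAL_RUN = re.compile(
    r"(?:(?:" + "|".join(re.escape(w) for w in SIGNAL_WORDS) + r"),?\s+)+",
    re.IGNORECASE,
)


def strip_signals(text):
    """Return (signal, rest): the leading signal run (may be empty) and the remainder."""
    m = SIGNAL_RUN.match(text)
    if m is None:
        return "", text
    return m.group(0).strip(), text[m.end():]


signal, rest = strip_signals(
    "See also, e.g. Watts v. United States, 394 U.S. 705 (1969)"
)
print(signal)  # See also, e.g.
print(rest)    # Watts v. United States, 394 U.S. 705 (1969)
```

A case name that itself begins with a signal word loses its first word under this approach, so signal stripping alone cannot decide where a citation starts.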

Then you'll also want to handle case citations with unexpected case names:

See v. Seattle, 387 U.S. 541 (1967)

Others have attacked this particular problem by first identifying the reporter reference (e.g., 387 U.S. 541) with a regular expression like (\d+)\s(.+?)\s(\d+) and then trying to expand the range from there. Case citations can be arbitrarily complex, so this path is not without its own pitfalls. Reporter references can also take on some interesting forms as per BlueBook rules:
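As a rough illustration of the reporter-first idea, here is a Python sketch that finds the reporter reference, expands right to pick up pin cites and the year, and then expands left to look for a "X v. Y" case name. It uses a tighter pattern than the loose one quoted above. Everything here is an assumption made for illustration: the shape of a reporter abbreviation, the capitalized-words heuristic for case names, the 150-character look-back window, and the names `REPORTER`, `CASE_NAME`, and `find_citations`.

```python
import re

# Reporter reference plus what follows it: volume, reporter, first page,
# optional pin cites, and the year in parentheses, e.g. "539 U.S. 306, 326 (2003)".
# Assumption: a reporter is a few short tokens starting with a capital or a digit.
REPORTER = re.compile(
    r"(?P<volume>\d+)\s+"
    r"(?P<reporter>[A-Z][A-Za-z0-9.]*(?:\s[A-Z0-9][A-Za-z0-9.]*)*?)\s+"
    r"(?P<page>\d+)"
    r"(?P<pins>(?:,\s*\d+)*)\s*\((?P<year>\d{4})\)"
)

# Case name ending just before the reporter reference: capitalized words
# on both sides of "v.", followed by a comma.
NAME = r"[A-Z][\w.'-]*(?:\s[A-Z][\w.'-]*)*"
CASE_NAME = re.compile(rf"({NAME}\sv\.\s{NAME}),\s*$")


def find_citations(text, window=150):
    """Return one dict per reporter reference; "case" is None if no name is found."""
    results = []
    for m in REPORTER.finditer(text):
        # Expand to the left, looking only at the text just before the match.
        left = text[max(0, m.start() - window):m.start()]
        name = CASE_NAME.search(left)
        results.append({
            "case": name.group(1) if name else None,
            "volume": m.group("volume"),
            "reporter": m.group("reporter"),
            "page": m.group("page"),
            "year": m.group("year"),
        })
    return results


for cite in find_citations("See v. Seattle, 387 U.S. 541 (1967)"):
    print(cite["case"], cite["volume"], cite["reporter"], cite["page"], cite["year"])
```

Because the left expansion relies on capitalization, a capitalized signal directly before a case name (for example "See Grutter v. Bollinger") is absorbed into the name, and short-form citations such as "Adarand, supra, at 226" are not found at all.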

Jones v. Smith, _ F.3d _ (2011)

This form is used, for instance, for decisions that have not yet been published. Authors also use variations of the above, including (but not limited to) --- F.3d ---
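A reporter pattern can allow for these blanks by accepting a placeholder wherever a volume or page number is expected. The Python sketch below assumes only the two placeholder styles mentioned here (underscores, or a run of two or more hyphens) and a single-token reporter abbreviation. `parse_reporter_ref` and the `published` flag are hypothetical names.

```python
import re

# A volume or page is either a number or a blank placeholder.
# Assumption: placeholders are underscores or two or more hyphens.
NUM_OR_BLANK = r"(?:\d+|_+|-{2,})"

REPORTER_REF = re.compile(
    rf"(?P<volume>{NUM_OR_BLANK})\s+"
    r"(?P<reporter>[A-Z][A-Za-z0-9.]*)\s+"
    rf"(?P<page>{NUM_OR_BLANK})"
    r"\s*\((?P<year>\d{4})\)"
)


def parse_reporter_ref(text):
    """Return the first reporter reference in text as a dict, or None."""
    m = REPORTER_REF.search(text)
    if m is None:
        return None
    ref = m.groupdict()
    # A reference counts as published only if both numbers are filled in.
    ref["published"] = ref["volume"].isdigit() and ref["page"].isdigit()
    return ref


print(parse_reporter_ref("Jones v. Smith, _ F.3d _ (2011)"))
print(parse_reporter_ref("Jones v. Smith, --- F.3d --- (2011)"))
```

Other typographic variants, such as en or em dashes, would need to be added to the placeholder alternation.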

Paul H. answered Dec 08 '22 03:12


This certainly isn't perfect, but without more examples to test against it's the best I can think of. Thanks to @Paul H. for extra signal words to add.

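#!/usr/bin/perl
use strict;
use warnings;

# Single-quoted heredoc, so the backslashes in the sample text stay literal.
my $search_text = <<'EOD';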
"We have insisted on strict scrutiny in every context, even for so-called “benign” racial classifications, such as race-conscious university admissions policies, see Grutter v. Bollinger, 539 U.S. 306, 326 (2003), race-based preferences in government contracts, see Adarand, supra, at 226, and race-based districting intended to improve minority representation, see Shaw v. Reno, 509 U.S. 630, 650 (1993)."

In your example, you've preceded the citations with what the BlueBook deems a 'signal' (Rule 1.2 on page 54 of the nineteenth edition). Other signals include but are not limited to : e.g., accord, also, cf., compare, and, with, contra, and but. These can be combined in surprising and unexpected ways . . . See also, e.g. Watts v. United States, 394 U.S. 705 (1969) (per curiam). Of course, there are also citations that are not preceded by signals

Then you'll also want to handle case citations with unexpected case names :

See v. Seattle, 387 U.S. 541 (1967)

Others have attacked this particular problem by first identifying the reporter reference (i.e. 387 U.S. 541) with a regular expression like (\d+)\s(.+?)\s(\d+) and then trying to expand the range from there. Case citations can be arbitrarily complex so this path is not without its own pitfalls. Reporter references can also take on some interesting forms as per BlueBook rules:
EOD


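# Optional punctuation, then one or more signal words (each with optional
# trailing punctuation), then up to 100 characters ending in "<page> (<year>)".
# Non-capturing groups keep the citation in $1. Matching is case-sensitive,
# so a capitalized "See" is not treated as a signal here.
my $signal = qr/(?:see|e\.g\.|accord|also|cf\.|compare|with|contra|but)[,.;]? /;
while ($search_text =~ m/(?:[,.;] )?(?:$signal)+(.{0,100}\d+ \(\d{4}\))/g) {
    print "$1\n";
}

# Citations with no signal that start a line (after a newline or tab).
while ($search_text =~ m/[\n\t]+(.{0,100}\d+ \(\d{4}\))/g) {
    print "$1\n";
}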

Output is:

Grutter v. Bollinger, 539 U.S. 306, 326 (2003)
Shaw v. Reno, 509 U.S. 630, 650 (1993)
Watts v. United States, 394 U.S. 705 (1969)
See v. Seattle, 387 U.S. 541 (1967)

mikemxm answered Dec 08 '22 03:12