How would you design a regular expression to capture a legal citation? Here is a paragraph that shows two typical legal citations:
We have insisted on strict scrutiny in every context, even for so-called “benign” racial classifications, such as race-conscious university admissions policies, see Grutter v. Bollinger, 539 U.S. 306, 326 (2003), race-based preferences in government contracts, see Adarand, supra, at 226, and race-based districting intended to improve minority representation, see Shaw v. Reno, 509 U.S. 630, 650 (1993).
A citation will either be preceded by a comma and whitespace, a period and whitespace, or a "signal" such as "see" or "see, e.g.," and whitespace. I'm having trouble figuring out how to accurately specify the start of the citation.
I am most familiar with Perl regular expressions but can understand examples from other languages as well.
In your example, you've preceded the citations with what the BlueBook deems a 'signal' (Rule 1.2 on page 54 of the nineteenth edition). Other signals include, but are not limited to: e.g., accord, also, cf., compare, and, with, contra, and but. These can be combined in surprising and unexpected ways . . . See also, e.g. Watts v. United States, 394 U.S. 705 (1969) (per curiam). Of course, there are also citations that are not preceded by signals.
Then you'll also want to handle case citations with unexpected case names:
See v. Seattle, 387 U.S. 541 (1967)
Others have attacked this particular problem by first identifying the reporter reference (e.g., 387 U.S. 541) with a regular expression like (\d+)\s(.+?)\s(\d+) and then trying to expand the range from there. Case citations can be arbitrarily complex, so this path is not without its own pitfalls. Reporter references can also take on some interesting forms as per BlueBook rules:
Jones v. Smith, _ F.3d _ (2011)
This form is used, for instance, for decisions that are not yet published. Of course, authors will use variations of the above, including (but not limited to) --- F.3d ---
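A minimal sketch of the reporter-first approach described above, in Perl. It assumes that a volume or page is either a run of digits or a placeholder run of underscores or hyphens, and that a reporter abbreviation starts with a capital letter. The helper name find_reporter_refs and both patterns are illustrative assumptions, not BlueBook rules, and ordinary prose such as "In 1990 Congress passed 12 bills" would still produce a false positive.

```perl
use strict;
use warnings;

# Hypothetical helper: find "volume reporter page" triples in $text.
sub find_reporter_refs {
    my ($text) = @_;
    # Volume and page: digits, or a placeholder run of underscores or hyphens.
    my $num = qr/\d+|_+|-+/;
    # Reporter: starts with a capital letter and may span several tokens
    # (e.g. "U.S.", "F.3d", "F. Supp. 2d"). This is an assumed heuristic.
    my $reporter = qr/[A-Z][A-Za-z0-9.]*(?:\s[A-Za-z0-9.]+)*?/;
    my @refs;
    # The lookarounds keep a match from starting or ending mid-word.
    while ($text =~ /(?<![\w-])($num)\s($reporter)\s($num)(?![\w-])/g) {
        push @refs, [$1, $2, $3];
    }
    return @refs;
}

my $sample = 'see Shaw v. Reno, 509 U.S. 630, 650 (1993); '
           . 'Jones v. Smith, --- F.3d --- (2011)';
print join(' | ', @$_), "\n" for find_reporter_refs($sample);
```

On that sample string this prints "509 | U.S. | 630" and "--- | F.3d | ---". Expanding outward from each triple to the case name and the year is the harder part and is left out here.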
This certainly isn't perfect, but without more examples to test against it's the best I can think of. Thanks to @Paul H. for extra signal words to add.
#!/usr/bin/perl
use strict;
use warnings;
my $search_text = <<'EOD';
"We have insisted on strict scrutiny in every context, even for so-called “benign” racial classifications, such as race-conscious university admissions policies, see Grutter v. Bollinger, 539 U.S. 306, 326 (2003), race-based preferences in government contracts, see Adarand, supra, at 226, and race-based districting intended to improve minority representation, see Shaw v. Reno, 509 U.S. 630, 650 (1993)."
In your example, you've preceded the citations with what the BlueBook deems a 'signal' (Rule 1.2 on page 54 of the nineteenth edition). Other signals include but are not limited to : e.g., accord, also, cf., compare, and, with, contra, and but. These can be combined in surprising and unexpected ways . . . See also, e.g. Watts v. United States, 394 U.S. 705 (1969) (per curiam). Of course, there are also citations that are not preceded by signals
Then you'll also want to handle case citations with unexpected case names :
See v. Seattle, 387 U.S. 541 (1967)
Others have attacked this particular problem by first identifying the reporter reference (i.e. 387 U.S. 541) with a regular expression like (\d+)\s(.+?)\s(\d+) and then trying to expand the range from there. Case citations can be arbitrarily complex so this path is not without its own pitfalls. Reporter references can also take on some interesting forms as per BlueBook rules:
EOD
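# First pass: one or more signal words (each optionally followed by , . or ;),
# then up to 100 characters ending in a page number and a "(year)".
# Capture group 12 holds the citation itself.
while ($search_text =~ m/(\, |\. |\; )?(see(\,|\.|\;)? |e\.g\.(\,|\.|\;)? |accord(\,|\.|\;)? |also(\,|\.|\;)? |cf\.(\,|\.|\;)? |compare(\,|\.|\;)? |with(\,|\.|\;)? |contra(\,|\.|\;)? |but(\,|\.|\;)? )+(.{0,100}\d+ \(\d{4}\))/g) {
    print "$12\n";
}
# Second pass: citations that start a line, with no signal in front of them.
while ($search_text =~ m/[\n\t]+(.{0,100}\d+ \(\d{4}\))/ig) {
    print "$1\n";
}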
Output is:
Grutter v. Bollinger, 539 U.S. 306, 326 (2003)
Shaw v. Reno, 509 U.S. 630, 650 (1993)
Watts v. United States, 394 U.S. 705 (1969)
See v. Seattle, 387 U.S. 541 (1967)
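For readability, the signal alternation can also be built from a list and the pattern written with the /x flag. This is only a sketch of that variant, not the code that produced the output above: the helper name citations_after_signals is made up for illustration, and it deliberately uses a lazy quantifier (.{0,100}?) so that two citations close together are less likely to be merged into one match. Like the original, it is case-sensitive, so a capitalized "See" is not treated as a signal.

```perl
use strict;
use warnings;

# Signal words from the answer above; the list is not exhaustive.
my @signals = ('see', 'e.g.', 'accord', 'also', 'cf.', 'compare',
               'with', 'contra', 'but');
# quotemeta escapes the dots in "e.g." and "cf." so they match literally.
my $signal_alt = join '|', map { quotemeta($_) } @signals;

# Hypothetical helper: return the citation text following a run of signals.
sub citations_after_signals {
    my ($text) = @_;
    my @found;
    while ($text =~ /
            (?: (?:$signal_alt) [,.;]? [ ] )+    # one or more signals
            ( .{0,100}? \d+ [ ] \( \d{4} \) )    # shortest run ending in a year
        /xg) {
        push @found, $1;
    }
    return @found;
}

print "$_\n" for citations_after_signals(
    'See also, e.g. Watts v. United States, 394 U.S. 705 (1969) (per curiam).');
```

Keeping the signals in an array makes it easier to add new ones without touching the pattern itself.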