 

JavaScript - Extract Text Corresponding to a Certain Label

I have the text below:

This is a code update

* Official Name:  Noner


* Pub: https://content.upcodes.co/viewer/washington/wa-mechanical-code-2021

* Agency:  Agency Ni

* Reference: 

https://web.archive.org/web/20230226234118/https://lawfilesext.leg.wa.gov/law/wsr/agency/BuildingCodeCouncil.htm

https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#1)

* Citation: WAC 51-52 / WSR 23-02-055

* Draft Doc Title: 

 WSR 23-02-055 (#1)

* Draft Source Doc: https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#1)

* Draft Drive: https://drive.google.com/file/d/1pYmwQS3t-ZX-Vyg9yBabtIpXZ7By2G6f/view?usp=share_link ( #1)

* Final Doc Title: 

   IECC Com Update(#1)

   IECC Res Update (#2)

   IECC Res Update (#3)

* Final Source Doc: 
  https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)
 https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#2)

* Final Drive: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)

https://web.archive.org/web/2023030302fdfdfg2130/https://apps.legfdg.gov/wac/default.aspx?cite=51-52&fdsfullfdsf=true&pfdsfdf=true  (#2)

* Effective Date: January 4, 2023

I want to extract the information under the 'Reference:' label, but the code below only returns one line. I want it to capture all the text until it reaches the next asterisk.

//Extract Reference    
var reference = description.search("Reference:");
if(reference != -1){
  reference = description.match(/(?<=^\* Reference\s*:)[\s]*[\n]*[^\n\r]*/m);  
  reference  = reference?.[0].trim();   
}else{
  reference = '';
}
console.log('Reference: ' + reference);
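The pattern from the question can be checked in isolation on a shortened version of the sample. The example.com URLs below are placeholders, not the ones from the text above. The last piece of the pattern, `[^\n\r]*`, cannot cross a line break, so after `[\s]*` skips the blank lines the match ends with the first URL.

```javascript
// Shortened sample; the example.com URLs are placeholders.
const sample = [
  "* Agency:  Agency Ni",
  "",
  "* Reference: ",
  "",
  "https://example.com/first.htm",
  "",
  "https://example.com/second.htm (#1)",
  "",
  "* Citation: WAC 51-52 / WSR 23-02-055",
].join("\n");

// Same pattern as in the question. [\s]* swallows the blank lines after the label,
// then [^\n\r]* stops at the next line break, so only one URL is captured.
const singleLine = sample
  .match(/(?<=^\* Reference\s*:)[\s]*[\n]*[^\n\r]*/m)
  ?.[0]
  .trim();

console.log(singleLine); // https://example.com/first.htm
```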

Expected Output:

https://web.archive.org/web/20230226234118/https://lawfilesext.leg.wa.gov/law/wsr/agency/BuildingCodeCouncil.htm

https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#1)
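The expected output is every line between `* Reference:` and the next line that starts with `*`. That rule can also be written without a regex, as a plain line-by-line scan. This is a swapped-in alternative, not the approach used in the answer below; `scanReference` is a hypothetical name, and the sketch assumes each item begins with `*` at the very start of a line, as in the sample.

```javascript
// Hypothetical helper: collect everything after "* Reference:" up to the next
// line that starts with "*", or up to the end of the text.
function scanReference(text) {
  const label = "* Reference:";
  const lines = text.split(/\r?\n/);
  const start = lines.findIndex((line) => line.startsWith(label));
  if (start === -1) return ""; // no Reference item at all

  // Keep whatever follows the label on its own line, then the following lines.
  const collected = [lines[start].slice(label.length)];
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].startsWith("*")) break; // next item reached
    collected.push(lines[i]);
  }
  // Preserve the line breaks between URLs; drop leading/trailing blank space.
  return collected.join("\n").trim();
}

// Shortened sample; the example.com URLs are placeholders.
const sample = [
  "* Reference: ",
  "",
  "https://example.com/first.htm",
  "",
  "https://example.com/second.htm (#1)",
  "",
  "* Citation: WAC 51-52 / WSR 23-02-055",
].join("\n");

console.log(scanReference(sample));
```

Because the text is split on `\r?\n`, Windows line endings come back as plain `\n` in the result.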
asked Jun 27 '26 23:06 by alyssaeliyah


1 Answer

I decided to follow @Nick's idea of not making any assumptions about the 'subject' string (the text being searched).

I came up with two approaches that are lenient in the sense that they work:

  • when there's no Reference item (returning an empty string),
  • when the Reference item has empty content,
  • and when the Reference item's content is at the end of the string (and thus not followed by another item).

The first works in all cases, whatever the content:

let ref_pat = /^\* Reference:\s*(.*\S(?:\s+.*\S)*?)??\s*(?:^\*|(?![\s\S]))/m;
let reference = description.match(ref_pat)?.[1] ?? '';
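To see the three cases listed above in action, the first pattern can be wrapped in a small function and run on minimal inputs. `extractReference` and the sample strings are hypothetical; the pattern itself is copied unchanged from the answer.

```javascript
// First pattern from the answer, unchanged.
const refPat = /^\* Reference:\s*(.*\S(?:\s+.*\S)*?)??\s*(?:^\*|(?![\s\S]))/m;

// Hypothetical wrapper: group 1 is undefined for an empty item and the whole
// match is null when there is no item, so both fall back to "".
function extractReference(description) {
  return description.match(refPat)?.[1] ?? "";
}

const cases = {
  // Content followed by another item -> both URLs, blank line kept
  normal: "* Reference: \n\nhttps://example.com/a\n\nhttps://example.com/b (#1)\n\n* Citation: X",
  // No Reference item -> ""
  missing: "* Agency: Agency Ni\n* Citation: X",
  // Empty Reference item -> ""
  empty: "* Reference: \n\n* Citation: X",
  // Content at the end of the string -> both lines
  atEnd: "* Agency: Agency Ni\n* Reference: https://example.com/a\nhttps://example.com/b",
};

for (const [name, text] of Object.entries(cases)) {
  console.log(name, JSON.stringify(extractReference(text)));
}
```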

A second, more efficient pattern is possible if you can assume the content doesn't contain any asterisk characters:

let ref_pat = /^\* Reference:\s*([^*]*[^*\s])/m;
let reference = description.match(ref_pat)?.[1] ?? '';
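The simpler pattern can be exercised on the same kinds of input: content followed by another item, no item, an empty item, and content at the end of the string. `extractReferenceNoStar` is a hypothetical name. The pattern has no lazy group or alternation: `[^*]*` runs straight to the next asterisk (or the end of the string), then backs off to the last character that is neither whitespace nor `*`.

```javascript
// Second pattern from the answer: assumes the content contains no "*".
const refPatNoStar = /^\* Reference:\s*([^*]*[^*\s])/m;

// Hypothetical wrapper; a null match (no item, or an empty item) becomes "".
function extractReferenceNoStar(description) {
  return description.match(refPatNoStar)?.[1] ?? "";
}

const inputs = [
  "* Reference: \n\nhttps://example.com/a\n\nhttps://example.com/b (#1)\n\n* Citation: X",
  "* Agency: Agency Ni\n* Citation: X",
  "* Reference: \n\n* Citation: X",
  "* Agency: Agency Ni\n* Reference: https://example.com/a\nhttps://example.com/b",
];

for (const text of inputs) {
  console.log(JSON.stringify(extractReferenceNoStar(text)));
}
```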

That assumption is its only limitation (an asterisk inside the content cuts the result short), but this pattern is by far the simpler one.
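The limitation is easy to reproduce by putting an asterisk inside the content, here in a made-up URL, and comparing the two patterns from the answer. The lenient pattern only stops at an asterisk at the start of a line, while the simpler one stops at the first asterisk anywhere.

```javascript
// Both patterns from the answer, unchanged.
const lenient = /^\* Reference:\s*(.*\S(?:\s+.*\S)*?)??\s*(?:^\*|(?![\s\S]))/m;
const noStar = /^\* Reference:\s*([^*]*[^*\s])/m;

// Made-up content containing an asterisk in the middle of a line.
const text = "* Reference: https://example.com/a*b\n* Citation: X";

const viaLenient = text.match(lenient)?.[1] ?? "";
const viaNoStar = text.match(noStar)?.[1] ?? "";

console.log(viaLenient); // https://example.com/a*b
console.log(viaNoStar); // https://example.com/a  (cut off at the asterisk)
```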

Whichever one you choose, the result is already trimmed.
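The answer hard-codes the `Reference` label. If the same lenient pattern is wanted for other items in the text (say `Final Doc Title` or `Effective Date`), one option is to build it from the label, escaping regex metacharacters first. This is a sketch: `extractItem` and `escapeRegExp` are hypothetical helper names, and it assumes every item starts with `* ` at the beginning of a line, as in the question. Only the outer whitespace is trimmed; indentation between lines of a multi-line item is kept.

```javascript
// Escape regex metacharacters so a label is matched literally
// (a common escaping idiom; the helper name is hypothetical).
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical generalisation of the answer's first pattern to any "* Label:" item.
function extractItem(text, label) {
  const pattern = new RegExp(
    String.raw`^\* ${escapeRegExp(label)}:\s*(.*\S(?:\s+.*\S)*?)??\s*(?:^\*|(?![\s\S]))`,
    "m",
  );
  return text.match(pattern)?.[1] ?? "";
}

const text = [
  "* Final Doc Title: ",
  "",
  "   IECC Com Update(#1)",
  "",
  "   IECC Res Update (#2)",
  "",
  "* Effective Date: January 4, 2023",
].join("\n");

console.log(extractItem(text, "Final Doc Title"));
console.log(extractItem(text, "Effective Date")); // January 4, 2023
console.log(JSON.stringify(extractItem(text, "Reference"))); // ""
```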

answered Jun 29 '26 11:06 by Casimir et Hippolyte


