Consider the following strings:

1) Scheme ID: abc-456-hu5t10 (High priority) *
2) Scheme ID: frt-78f-hj542w (Balanced)
3) Scheme ID: 23f-f974-nm54w (super formula run) *

and so on in the same format. The ID, the text in parentheses, and the optional trailing * are the parts that change from string to string.

I have many strings in the format shown above and want to pick three substrings from each of them:

1st substring: the alphanumeric value (in the example above, "abc-456-hu5t10")
2nd substring: the words in parentheses (in the example above, "High priority")
3rd substring: the * if one is present at the end of the string; otherwise nothing

How do I pick these three substrings from each string shown above? I know it can be done using regular expressions in Perl. Can you help with this?


How can I extract substrings from a string in Perl?

4 Answers

You could do something like this:

my $data = <<END;
1) Scheme ID: abc-456-hu5t10 (High priority) *
2) Scheme ID: frt-78f-hj542w (Balanced)
3) Scheme ID: 23f-f974-nm54w (super formula run) *
END

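foreach (split /\n/, $data) {
  # Skip lines that don't match the expected format
  next unless /Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?/;
  # $3 is undef when there is no trailing *, so default it to an empty string
  my ($id, $word, $star) = ($1, $2, $3 // '');
  print "$id $word $star\n";
}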

The key thing is the regular expression:

Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?

Which breaks up as follows.

The fixed string "Scheme ID: ":

Scheme ID:

Followed by one or more of the characters a-z, 0-9 or -. We use the brackets to capture it as $1:

([a-z0-9-]+)

Followed by one or more whitespace characters:

\s+

Followed by an opening bracket (which we escape), then one or more characters which aren't a closing bracket, and then a closing bracket (also escaped). We use unescaped brackets to capture the words as $2:

\(([^)]+)\)

Followed by optional whitespace and maybe a *, captured as $3:

\s*(\*)?
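As a minimal sketch of how the pieces above fit together, the same pattern can be wrapped in a small helper. The name parse_scheme is hypothetical and not part of the original answer; the sketch also assumes IDs are lowercase, since the character class [a-z0-9-] would not match uppercase letters without the /i modifier.

```perl
use strict;
use warnings;

# parse_scheme is a hypothetical helper name used only for illustration.
# Returns (id, description, star) on a match, or an empty list otherwise.
sub parse_scheme {
    my ($line) = @_;
    return unless $line =~ /Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?/;
    # $3 is undef when the optional group did not match, so default it to ''.
    return ($1, $2, defined $3 ? $3 : '');
}

for my $line (
    '1) Scheme ID: abc-456-hu5t10 (High priority) *',
    '2) Scheme ID: frt-78f-hj542w (Balanced)',
    '3) Scheme ID: 23f-f974-nm54w (super formula run) *',
) {
    # The list assignment is false when parse_scheme returns an empty list.
    my ($id, $word, $star) = parse_scheme($line) or next;
    print "id=$id word=$word star=[$star]\n";
}
```

Returning an empty list on failure lets the caller test the assignment itself, so the capture variables are never read after a failed match.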


answered Sep 20 '22 22:09

Dave Webb

You could use a regular expression such as the following:

/([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/

So for example:

$s = "abc-456-hu5t10 (High priority) *";
$s =~ /([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/;
print "$1\n$2\n$3\n";

prints

abc-456-hu5t10
High priority
*
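The example above reads $1, $2 and $3 without checking whether the match succeeded, and $3 is undef when the line has no trailing *. A hedged sketch of one way to guard against both cases follows; the sample strings, including the non-matching one, are made up for illustration.

```perl
use strict;
use warnings;

my @samples = (
    'abc-456-hu5t10 (High priority) *',
    'frt-78f-hj542w (Balanced)',
    'no parentheses here',            # deliberately does not match
);

my @out;
for my $s (@samples) {
    # In list context a successful match returns the captured groups;
    # a failed match returns an empty list, so the else branch runs.
    if (my ($id, $desc, $star) = $s =~ /([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/) {
        $star = '' unless defined $star;   # no trailing * on this line
        push @out, "$id|$desc|$star";
    }
    else {
        push @out, "no match: $s";
    }
}
print "$_\n" for @out;
```

Capturing into lexical variables inside the condition avoids picking up stale values of $1 to $3 left over from an earlier successful match.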

answered Sep 22 '22 22:09

Greg Hewgill

(\S*)\s*\((.*?)\)\s*(\*?)


(\S*)    picks up anything which is NOT whitespace
\s*      0 or more whitespace characters
\(       a literal open parenthesis
(.*?)    anything, non-greedy so stops on first occurrence of...
\)       a literal close parenthesis
\s*      0 or more whitespace characters
(\*?)    0 or 1 occurrences of literal *
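One detail worth illustrating is where the ? sits. This pattern puts it inside the group, (\*?), while the other answers put it outside, (\*)?. The sketch below, using one of the sample IDs without its prefix, shows the difference when there is no trailing *.

```perl
use strict;
use warnings;

my $s = 'frt-78f-hj542w (Balanced)';   # no trailing *

# Optional group: when it takes no part in the match, the third capture is undef.
my @optional_group = $s =~ /(\S*)\s*\((.*?)\)\s*(\*)?/;

# Optional star inside the group: the group always matches, possibly as ''.
my @optional_star = $s =~ /(\S*)\s*\((.*?)\)\s*(\*?)/;

print defined $optional_group[2] ? "defined\n" : "undef\n";   # prints "undef"
print defined $optional_star[2]  ? "defined\n" : "undef\n";   # prints "defined"
```

With (\*?) the third capture can be interpolated into a string without an "uninitialized value" warning, at the cost of not being able to tell "no star" apart by a defined check.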

answered Sep 20 '22 22:09

Xetius

Well, here's a one-liner:

perl -lne 'm|Scheme ID:\s+(.*?)\s+\((.*?)\)\s?(\*)?|g&&print "$1:$2:$3"' file.txt

Expanded to a simple script to explain things a bit better:

#!/usr/bin/perl -ln              

#-w : warnings (not enabled in the line above; add it there if wanted)
#-l : print newline after every print
#-n : apply script body to each line of stdin or of the files listed on the command line; don't print $_ automatically

use strict; #always do this.     

my $regex = qr{  # precompile regex                                 
  Scheme\ ID:      # literal text "Scheme ID:" (not anchored to the start of the line)
  \s+              # 1 or more whitespace                             
  (.*?)            # Non greedy match of all characters up to         
  \s+              # 1 or more whitespace                             
  \(               # parenthesis literal                              
    (.*?)            # non-greedy match to the next                     
  \)               # closing literal parenthesis                      
  \s*              # 0 or more whitespace (trailing * is optional)    
  (\*)?            # 0 or 1 literal *s                                
}x;  #x switch allows whitespace in regex to allow documentation.   

#values trapped in $1 $2 $3, so do whatever you need to:            
#Perl lets you use almost any character as a delimiter; I like pipes because
#they reduce the amount of escaping when using file paths           
m|$regex| && print "$1 : $2 : $3";

#alternatively if(m|$regex|) {doOne($1); doTwo($2) ... }

Though if it were anything other than formatting, I would implement a main loop to handle files and flesh out the body of the script rather than relying on the command-line switches for the looping.
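A minimal sketch of what such an explicit loop might look like, reusing the same regex. The helper name process_handle and the choice to treat every command-line argument as a file name are assumptions made for illustration, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $regex = qr{
  Scheme\ ID: \s+
  (.*?) \s+        # ID, non-greedy up to the whitespace before the parenthesis
  \( (.*?) \)      # text in parentheses
  \s* (\*)?        # optional trailing *
}x;

# process_handle is a hypothetical helper: it reads lines from an open
# filehandle and returns one [id, description, star] triple per matching line.
sub process_handle {
    my ($fh) = @_;
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ $regex;
        push @rows, [ $1, $2, defined $3 ? $3 : '' ];
    }
    return @rows;
}

# Assumed driver: each command-line argument is opened as a file.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    print join(' : ', @$_), "\n" for process_handle($fh);
    close $fh;
}
```

Keeping the matching in a function that takes a filehandle makes it possible to feed it an in-memory string for testing as well as real files.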

answered Sep 19 '22 22:09

liam


Tags:

string

regex

perl

stack_pointer is EXTINCT
