Extract the pattern between a substring and the first digit in a string

The following is the content of a file:

xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r

I want to extract the component names: component1, component2, etc.

This is what I tried:

for line in `sed -n -e '/^xxx-/p' $file`
do
    comp=`echo $line | sed  -e '/xxx-/,/[0-9]/p'`
    echo "comp - $comp"
done

I also tried sed -e 's/.*xxx-\(.*\)[^0-9].*/\1/'

This is based on some information I found online. Please give me a sed command and, if possible, explain it step by step.

Part 2: I also need to extract the version number from the string. The version number starts with a digit and ends just before the . that is followed by xc-linux. As you can see, to keep it unique, the version number includes a random 7-character alphanumeric string.

For example, in the string xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r the version number is 1.0-2-2acd314.

asked May 21 '14 21:05 by user3662599


1 Answer

There are quite a few ways to extract the data. The simplest is probably grep.

GNU grep:

You can grab the required data using GNU grep with its PCRE option, -P:

$ cat file
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r

$ grep -oP '(?<=_)[^-]*' file
component1
component2
component3
component4

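Here we use a positive lookbehind assertion, (?<=_), to match everything after the _ up to, but not including, the next -. The -o option makes grep print only the matched part instead of the whole line.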
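The same PCRE approach can be stretched to part 2 of the question (the version number). The following is a sketch, not part of the original answer, assuming every line has the form xxx_<name>-<version>.xc-linux-... and a grep with PCRE support (-P), such as GNU grep. The function name extract_versions is made up for illustration. It uses \K to drop the prefix from the printed match and a lookahead so ".xc-linux" itself is not printed.

```shell
# Hypothetical helper: print the version part of each matching line.
# Needs a grep with PCRE support (-P), e.g. GNU grep.
#   ^[^_]*_         the prefix up to the first _ ("xxx_")
#   [^-]*-          the component name plus the - after it ("component1-")
#   \K              forget everything matched so far (not printed by -o)
#   .*              the version itself
#   (?=\.xc-linux)  lookahead: must be followed by ".xc-linux" (not printed)
extract_versions() {
    grep -oP '^[^_]*_[^-]*-\K.*(?=\.xc-linux)' "$@"
}

# Usage on inline sample data; for the real file run: extract_versions file
extract_versions <<'EOF'
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
EOF
```

With the two sample lines this prints 1.0-2-2acd314 and 3.0-1-fg3sdhd. Lines that do not match the pattern produce no output.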


awk:

$ awk -F"[_-]" '{print $2}' file
component1
component2
component3
component4

Here we tell awk to use both - and _ as field separators and print the second field.
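The field-separator trick does not work for the version, because the version itself contains - characters. A possible awk sketch for part 2, assuming the same xxx_<name>-<version>.xc-linux-... layout and using only POSIX awk's sub(), strips the known prefix and suffix instead. The function name name_and_version is hypothetical.

```shell
# Hypothetical helper: print "name version" for each line, using POSIX awk.
# Assumes lines look like xxx_<name>-<version>.xc-linux-...
name_and_version() {
    awk '
    /\.xc-linux/ {
        rest = $0
        sub(/^[^_]*_/, "", rest)        # drop the "xxx_" prefix
        name = rest
        sub(/-.*/, "", name)            # name: everything before the first -
        ver = rest
        sub(/^[^-]*-/, "", ver)         # drop "<name>-"
        sub(/\.xc-linux.*/, "", ver)    # drop ".xc-linux..." and the rest
        print name, ver
    }' "$@"
}

# Usage on inline sample data; for the real file run: name_and_version file
name_and_version <<'EOF'
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
EOF
```

This should print "component1 1.0-2-2acd314" and "component2 3.0-1-fg3sdhd". The /\.xc-linux/ pattern in front of the action means lines without that suffix are skipped.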


sed:

Having said that, you can also use sed to extract the required data with a capture group:

$ sed 's/.*_\([^-]*\)-.*/\1/' file
component1
component2
component3
component4

The regex matches any character zero or more times up to an _ (since .* is greedy, this is the last _ on the line). From that point on, \([^-]*\) captures everything up to the next - in a group, and the trailing -.* consumes the rest of the line. In the replacement part we keep only the data captured in the group by referring to it with the back reference \1.
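Since the question asked for a step-by-step sed explanation and also for the version number, here is a sketch that extends the same capture-group idea to two groups. It assumes the xxx_<name>-<version>.xc-linux-... layout and uses only POSIX basic regular expressions; the function name sed_name_version is made up for illustration.

```shell
# Hypothetical helper: print "name version" with a single POSIX sed command.
# Step by step, the regex is:
#   ^[^_]*_         the prefix up to the first _ ("xxx_")
#   \([^-]*\)       group 1: the component name (no - allowed)
#   -               the - that ends the name
#   \(.*\)          group 2: the version (greedy, backtracks to ".xc-linux")
#   \.xc-linux.*    a literal ".xc-linux" and the rest of the line
# -n plus the p flag prints only lines where the substitution succeeded.
sed_name_version() {
    sed -n 's/^[^_]*_\([^-]*\)-\(.*\)\.xc-linux.*/\1 \2/p' "$@"
}

# Usage on inline sample data; for the real file run: sed_name_version file
sed_name_version <<'EOF'
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
EOF
```

This should print "component1 1.0-2-2acd314" and "component3 1.0-2-3gsjcgd". Use \2 alone in the replacement if only the version is needed. The -n/p pair also replaces the separate sed -n '/.../p' filter from the question's loop.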

answered Nov 18 '22 02:11 by jaypal singh