extract matches of a regex capturing group from a file

Tags: regex, linux, command-line

I want to perform the action named in the title from the Linux command line (a bash script would also do). The command I tried is:

sed 's/href="([^"])"/$1/g' page.html > list.lst

but obviously it failed.

To be precise, here is my input:

<link rel="stylesheet" type="text/css" href="style/css/colors.css" />
<link rel="stylesheet" type="text/css" href="style/css/global.css" />
<link rel="stylesheet" type="text/css" href="style/css/icons.css" />

The output I want is a comma-separated or space-separated list of all matches in the input file:

style/css/colors.css,style/css/global.css,style/css/icons.css

I think I got the right expression: href="([^"]*)"

but I have no clue how to apply it. sed does a search/replace, which is not exactly what I want (on the contrary, I only need to keep the matches and throw the rest away, not replace them).

786

asked Jul 26 '11 14:07

BiAiB

1 Answer

grep href page.html | sed 's/^.*href="\([^"]*\)".*$/\1/' | xargs | sed 's/ /,/g'

This keeps only the lines that contain href and extracts one href value from each of them. Because the leading .* is greedy, that is the last href on a line that has several, not the first. Also, refer to this post about parsing HTML with regular expressions.
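If a line can carry more than one href, one possible variation is to let grep -o print every match on its own line and strip the href="..." wrapper afterwards. This is only a sketch: it assumes a grep that supports -o (GNU and BSD grep do, but POSIX does not require it), and the function name extract_hrefs is made up for illustration.

```shell
# Hypothetical helper: print every href value in the file given as $1,
# joined into one comma-separated line.
extract_hrefs() {
  # grep -o emits each match (href="...") on its own line,
  # so several hrefs on the same input line are all kept.
  grep -o 'href="[^"]*"' "$1" |
    # Strip the leading href=" and the trailing quote.
    sed 's/^href="//; s/"$//' |
    # Join all lines with commas.
    paste -sd, -
}
```

Calling extract_hrefs page.html on the input from the question should print style/css/colors.css,style/css/global.css,style/css/icons.css. Like the pipeline above, this is plain text matching: it also picks up attributes whose name merely ends in href (for example data-href) and ignores single-quoted values.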

151

answered Oct 15 '22 08:10

rid
