Run curl command on each line of a file and fetch data from result

Question

Suppose I have a file containing a list of links of webpages.

www.xyz.com/asdd
www.wer.com/asdas
www.asdas.com/asd
www.asd.com/asdas

I know that running curl www.xyz.com/asdd will fetch the HTML of that webpage. I want to extract some data from that HTML.

So the scenario is: use curl to hit all the links in the file one by one, extract some data from each webpage, and store it somewhere else. Any ideas or suggestions?

fedorqui 'SO stop harming' · Accepted Answer

As indicated in the comments, this will loop through your_file and curl each line:

while IFS= read -r line
do
   curl "$line"
done < your_file

To get the <title> of a page, you can grep with a Perl-style lookaround (-P, which needs GNU grep), something like this:

grep -iPo '(?<=<title>).*(?=</title>)' file
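To see what the lookaround pattern matches without touching the network, it can be tried on an inline sample. This is a minimal sketch, assuming GNU grep with -P support; the helper name extract_title and the sample HTML are hypothetical and only for illustration. It uses a non-greedy .*? so that a second </title> on the same line is not swallowed. Like the pattern above, it only matches when the opening and closing tags sit on the same line.

```shell
# extract_title: read HTML on stdin, print the text of the first <title> found.
# Assumes GNU grep built with PCRE support (-P).
extract_title() {
    # -o: print only the match, -i: ignore case, -P: Perl-compatible regex
    grep -oiP '(?<=<title>).*?(?=</title>)' | head -n 1
}

# Hypothetical sample page with an upper-case tag and a second, nested title
sample='<html><head><TITLE>Example Page</TITLE></head><body><svg><title>icon</title></svg></body></html>'

printf '%s\n' "$sample" | extract_title
# prints: Example Page
```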

So, all together, you could do:

while IFS= read -r line
do
   curl -s "$line" | grep -Po '(?<=<title>).*(?=</title>)'
done < your_file

Note that curl -s runs in silent mode, which suppresses the progress meter and error messages. Here is an example with the Google home page:

$ curl -s http://www.google.com | grep -Po '(?<=<title>).*(?=</title>)'
302 Moved
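The "302 Moved" above is the title of a redirect page rather than of the final document. A hedged sketch of the same loop that asks curl to follow redirects with -L and prints each URL beside its title follows. The function name fetch_titles and the tab-separated output are illustrative choices, not part of the original answer, and GNU grep is assumed for -P.

```shell
# fetch_titles FILE: print "URL<TAB>title" for every non-empty line of FILE.
fetch_titles() {
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline
    while IFS= read -r url || [ -n "$url" ]; do
        [ -z "$url" ] && continue    # skip blank lines
        # -s: silent mode, -L: follow redirects such as the 302 shown above
        title=$(curl -sL "$url" |
                grep -oiP '(?<=<title>).*?(?=</title>)' | head -n 1)
        printf '%s\t%s\n' "$url" "$title"
    done < "$1"
}

# Usage (performs real network requests):
#   fetch_titles your_file
```

Pages without a title on a single line produce an empty second column, so every input URL still gets one output line.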

Orun · Answer

You can accomplish this in just one line with xargs. Let's say you have a file called sitemap in the working directory, with all your URLs in it, one per line:

xargs -I{} curl -s {} <sitemap | grep title

This would extract any line containing the word "title". To extract only the title tags, you'll want to change the grep a little. The -o flag prints only the matched part of each line rather than the whole line:

xargs -I{} curl -s {} <sitemap | grep -o "<title>.*</title>"
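The one-liner above prints titles without saying which URL each one came from, and it fetches one page at a time. One possible way to keep the URL next to its title and run several fetches concurrently is sketched below. It assumes an xargs that supports -P (GNU and BSD xargs do; the option is not in POSIX) and GNU grep for -P. The name titles_parallel and the limit of 4 jobs are illustrative choices.

```shell
# titles_parallel FILE: print "URL<TAB>title" for each URL in FILE,
# running up to 4 curl processes at the same time.
titles_parallel() {
    # -n 1: one URL per invocation, -P 4: at most 4 invocations in parallel.
    # The URL is passed as $1 to the inner shell instead of being pasted
    # into the script text, so it is never re-parsed as shell code.
    xargs -n 1 -P 4 sh -c '
        title=$(curl -sL "$1" | grep -oiP "(?<=<title>).*?(?=</title>)" | head -n 1)
        printf "%s\t%s\n" "$1" "$title"
    ' _ < "$1"
}

# Usage (performs real network requests):
#   titles_parallel sitemap
```

Because the fetches run concurrently, output lines may appear in a different order than in sitemap. xargs also interprets quotes and backslashes in its input, so URLs containing those characters would need different handling.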

A couple of things to note:

- If you want to extract certain data, you will need to escape some characters with a backslash (\).
  - For HTML attributes, for example, you should match both single and double quotes, and escape them like [\"\']
- Sometimes, depending on the character set, you may get unusual curl output with special characters. If you see this, you'll need to convert the encoding with a utility like iconv.
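The two notes above can be sketched as follows. This is an illustration under assumptions, not the answerer's own code: extract_hrefs and to_utf8 are hypothetical helper names, GNU grep is assumed for -P, and ISO-8859-1 is only an example source encoding (the real one should be taken from the page's headers or meta tags).

```shell
# extract_hrefs: print the value of every href attribute found on stdin,
# whether it is wrapped in single or double quotes.
# Inside a double-quoted shell string only the " needs a backslash.
extract_hrefs() {
    # \K drops the text matched so far (href= and the quote) from the output
    grep -oP "href=[\"']\K[^\"']+"
}

# to_utf8 [ENCODING]: convert stdin from ENCODING (default ISO-8859-1) to UTF-8.
to_utf8() {
    iconv -f "${1:-ISO-8859-1}" -t UTF-8
}

# Local sample mixing both quote styles
printf '%s\n' "<a href='/one'>1</a> <a href=\"/two\">2</a>" | extract_hrefs
# prints:
# /one
# /two
```

With a real page, the helpers would be chained after curl, for example curl -s "$url" | to_utf8 ISO-8859-1 | extract_hrefs.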

Tags: regex, bash, curl, awk

aelor
