
How to grep for a URL in a file?

Tags:

regex

grep

For example, I have a huge HTML file that contains an img URL: http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg

I want to get this URL, assuming it's the only URL in the entire file.

cat file.html | grep -o 'http://ex[a-zA-Z.-]*/[a-zA-Z.-]*/[a-zA-Z.,-]*'

This works only if the URL doesn't have the plus signs.

How do I make it work for + signs as well?

Leonardo DaVintik asked Nov 28 '12 18:11




2 Answers

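Your character classes are missing the digits 0-9 and the plus sign. The cat is also unnecessary, since grep can read the file directly. Keep the hyphen last inside each bracket expression so it is taken literally:

grep -o 'http://ex[a-zA-Z.-]*/[a-zA-Z0-9+-]*/[a-zA-Z0-9.,+-]*' file.html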
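To see the effect of the added digits and plus sign, here is a minimal sketch run against the sample line from the question. The variable names are illustrative only, and the `|| true` is there just so the non-matching case does not abort a script running under `set -e`.

```shell
# Sample line taken from the question; variable names are hypothetical.
line='img URL: http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg'

# Pattern from the question: the middle class has no digits or '+',
# so it can never reach the second '/' and nothing is printed.
before=$(printf '%s\n' "$line" | grep -o 'http://ex[a-zA-Z.-]*/[a-zA-Z.-]*/[a-zA-Z.,-]*' || true)

# Digits and '+' added; '-' stays last in each class so it is a literal hyphen.
after=$(printf '%s\n' "$line" | grep -o 'http://ex[a-zA-Z.-]*/[a-zA-Z0-9+-]*/[a-zA-Z0-9.,+-]*')

printf 'before: [%s]\nafter:  [%s]\n' "$before" "$after"
```

A hyphen anywhere other than first or last in a bracket expression is read as a range operator, which is why its position matters once more characters are added to the class.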

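A slight improvement: use -i for case insensitivity and match only images ending in .jpg or .jpeg. The extension has to be written as a sequence, not as a bracket expression, and -E is needed so that ? means "optional":

grep -Eio 'http://ex[a-z.-]*/[a-z0-9+-]*/[a-z0-9.,+-]*\.jpe?g' file.html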
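Square brackets match one character from a set, so a file extension is better spelled out as a literal dot followed by the letters. A small sketch of the extension filter, assuming made-up input where only the first URL comes from the question:

```shell
# Hypothetical sample input; the second and third URLs are invented for illustration.
sample='http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg
http://ex.example.com/abc/photo.JPG
http://ex.example.com/abc/notes.txt'

# -E: extended syntax, so 'e?' is an optional 'e'; '\.' is a literal dot.
# -i: case-insensitive, so '.JPG' is accepted as well.
images=$(printf '%s\n' "$sample" | grep -Eio 'http://ex[a-z.-]*/[a-z0-9+-]*/[a-z0-9.,+-]*\.jpe?g')

printf '%s\n' "$images"
```

Only the two image URLs should be printed; the .txt line has no .jpg or .jpeg ending and is skipped.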

Or how about just:

grep -Eio 'http://ex\.example.*\.jpe?g' file.html
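One caveat with a `.*` pattern is that it is greedy: it runs to the last possible ending on the line. That is harmless if the URL really is the only one, but the following sketch (with an invented HTML line) shows what happens otherwise, and one possible way to bound the match with a negated class.

```shell
# Hypothetical HTML line: the URL from the question followed by a second image reference.
html='<img src="http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg"> <a href="other.jpg">'

# Greedy: '.*' extends to the last '.jpg' on the line, swallowing the markup in between.
greedy=$(printf '%s\n' "$html" | grep -Eio 'http://ex\.example.*\.jpe?g')

# Bounded: stop at the first double quote or whitespace character.
bounded=$(printf '%s\n' "$html" | grep -Eio 'http://ex\.example[^"[:space:]]*\.jpe?g')

printf 'greedy:  %s\nbounded: %s\n' "$greedy" "$bounded"
```

The bounded form assumes the URL sits inside double quotes or is followed by whitespace, which is typical for HTML attributes but not guaranteed.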
Chris Seymour answered Sep 29 '22 09:09



The following fixes your regular expression for this specific case (including numbers and plus-signs):

http://ex[a-zA-Z.-]*/[a-zA-Z0-9.+-]*/[a-zA-Z0-9.+-]*

Demonstration:

echo "For example, I have a huge HTML file that contains img URL: http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg" | grep -o 'http://ex[a-zA-Z.-]*/[a-zA-Z0-9.+-]*/[a-zA-Z0-9.+-]*'

Output:

http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg

This does not extract all valid URLs. There are plenty of other answers on this site about URL matching.
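For a looser match that is not tied to this one host, a rough sketch is to accept the scheme and then everything up to a delimiter. This is an assumption-laden approximation, not a full URL grammar: it simply stops at whitespace, quotes, or angle brackets, and the input line is invented for illustration.

```shell
# Hypothetical input with two differently shaped URLs on one line.
page='<a href="https://example.org/a?b=1&c=2">x</a> <img src="http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg">'

# Scheme, then one or more characters that are not whitespace, quotes, or angle brackets.
urls=$(printf '%s\n' "$page" | grep -Eo "https?://[^[:space:]\"'<>]+")

printf '%s\n' "$urls"
```

With -o, grep prints each match on its own line, so both URLs come out separately even though they share an input line.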

Johnsyweb answered Sep 29 '22 10:09
