For example, I have a huge HTML file that contains an img URL: http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg
I want to extract this URL, assuming it's the only URL in the entire file.
cat file.html | grep -o 'http://ex[a-zA-Z.-]*/[a-zA-Z.-]*/[a-zA-Z.,-]*'
This works only if the URL doesn't contain plus signs. How do I make it work for + signs as well?
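You missed the character class 0-9 and the + sign. Also put the hyphen last inside the brackets, otherwise ",-+" is read as a range from "," to "+". (And it's a useless use of cat.)
grep -o 'http://ex[a-zA-Z.-]*/[a-zA-Z0-9+-]*/[a-zA-Z0-9.,+-]*' file.html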
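Inside a bracket expression, a hyphen between two characters denotes a range. A class written as [a-zA-Z0-9.,-+] therefore contains the range ",-+", which runs backwards ("," sorts after "+"), and grep implementations typically reject it as an invalid range. Putting the hyphen last makes it a literal character. The POSIX sh sketch below shows this on hypothetical sample strings; it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
#!/bin/sh
# Inside [...], '-' between two characters forms a range.
# With '-' placed last, '[a-z+-]' means: a lowercase letter, '+', or '-'.
printf '%s\n' 'a+b-c' | grep -o '[a-z+-]*'
# prints: a+b-c

# Applied to the example URL: digits and '+' included, '-' kept last.
printf '%s\n' 'img URL: http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg' |
  grep -o 'http://ex[a-zA-Z.-]*/[a-zA-Z0-9+-]*/[a-zA-Z0-9.,+-]*'
# prints: http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg
```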
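A slight improvement: use -i for case insensitivity, and match only .jpg or .jpeg images. Note that [.jpe?g] matches a single character, so the suffix needs an escaped dot, with -E so that ? makes the "e" optional:
grep -Eio 'http://ex[a-z.-]*/[a-z0-9+-]*/[a-z0-9.,+-]*\.jpe?g' file.html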
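A bracket expression always matches exactly one character, so [.jpe?g] means "one of . j p e ? g" rather than the suffix .jpg or .jpeg. The sketch below, using hypothetical file names as input, contrasts it with an escaped dot plus ERE (-E), where ? makes the preceding character optional.

```shell
#!/bin/sh
# '[.jpe?g]' is ONE character from the set {. j p e ? g},
# so a .png name still produces a matching line.
printf '%s\n' 'photo.png' | grep -c '[.jpe?g]'
# prints: 1

# With -E, '\.jpe?g' is a literal dot, then "jp", an optional "e", then "g".
printf '%s\n' 'photo.png' | grep -Ec '\.jpe?g'
# prints: 0
printf '%s\n' 'a.jpg b.jpeg c.png' | grep -Eo '\.jpe?g'
# prints: .jpg
#         .jpeg
```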
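Or, more simply, match everything up to the next quote, space, or angle bracket (a bare .* would run on toward the end of the line):
grep -Eio 'http://ex\.example\.com/[^" <>]*\.jpe?g' file.html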
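The pattern http://ex.example.*[.jpe?g] is risky in a large HTML file because .* is greedy: on a long line, the match extends to the last character that belongs to the set [.jpe?g]. The sketch below runs it on a single hypothetical line holding two images, then shows a version that stops at the closing quote of the attribute value.

```shell
#!/bin/sh
# Hypothetical single line of HTML containing two images.
line='<img src="http://ex.example.com/a.jpeg"> <img src="x.jpg">'

# Greedy '.*': the match runs on to the LAST character in the line
# that is one of . j p e ? g (case-insensitively).
printf '%s\n' "$line" | grep -io 'http://ex.example.*[.jpe?g]'
# prints: http://ex.example.com/a.jpeg"> <img src="x.jpg

# Excluding quotes, spaces and angle brackets keeps the match
# inside the attribute value.
printf '%s\n' "$line" | grep -Eio 'http://ex\.example\.com/[^" <>]*\.jpe?g'
# prints: http://ex.example.com/a.jpeg
```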
The following fixes your regular expression for this specific case (including digits and plus signs):
http://ex[a-zA-Z.-]*/[a-zA-Z0-9.+-]*/[a-zA-Z0-9.+-]*
Tested against a file containing the example line:
echo "For example, I have a huge HTML file that contains img URL: http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg" > file.html
grep -o 'http://ex[a-zA-Z.-]*/[a-zA-Z0-9.+-]*/[a-zA-Z0-9.+-]*' file.html
Output:
http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg
This does not extract all valid URLs. There are plenty of other answers on this site about URL matching.
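If you need more than this one URL shape, a common heuristic is to take http:// or https:// followed by everything up to the next quote, whitespace, or angle bracket. This is only a sketch under that assumption, not a full URL grammar: it will, for example, keep trailing punctuation such as a closing parenthesis or period. The sample file contents are hypothetical.

```shell
#!/bin/sh
# Write a small hypothetical HTML sample to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<a href="https://example.org/a+b?x=1">x</a>
<img src='http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg'>
EOF

# http:// or https://, then one or more characters that are not
# a double quote, single quote, whitespace, '<' or '>'.
grep -Eo 'https?://[^"'\''[:space:]<>]+' "$tmp"
# prints: https://example.org/a+b?x=1
#         http://ex.example.com/hIh39j+ud9wr4/Uusfh.jpeg

rm -f "$tmp"
```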