Remove full URLs from a text file using Unix awk/sed/grep

Question

I have a text file in the form of tweets, and I am having trouble removing the full URLs. An example of the text file:

index.html:

this is a tweet that has info. http://google.com
this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq

I would like to create a new file that only has:

this is a tweet that has info.
this is a tweet that has an image.

Right now I am working with grep and I have

grep -oP "http://\K[^']+" final.txt

Thanks!

josifoski · Accepted Answer

sed 's/http[^ ]*//g' YourFile

[^ ]* matches any run of non-space characters, so each match runs from http up to the next space or the end of the line.
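Note that the command above leaves the space that preceded the URL, and it does not touch the pic.twitter.com link from the question, because that link does not start with http. A minimal sketch that covers both sample lines, assuming the only scheme-less links are pic.twitter.com ones (strip_urls is a hypothetical helper name, not part of the original answer):

```shell
# strip_urls: hypothetical helper that reads tweets on stdin and writes
# them to stdout with URLs removed.
strip_urls() {
  # 1) drop anything starting with "http" up to the next space
  # 2) drop scheme-less pic.twitter.com links (assumption: the only such links)
  # 3) trim the trailing spaces left behind by the removals
  sed -e 's/http[^ ]*//g' \
      -e 's/pic\.twitter\.com[^ ]*//g' \
      -e 's/ *$//'
}

# Demo on the two sample lines from the question
printf '%s\n' \
  'this is a tweet that has info. http://google.com' \
  'this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq' |
  strip_urls
```

To create a new file as the question asks, redirect input and output, for example strip_urls < final.txt > cleaned.txt (cleaned.txt is a made-up name).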

nullman · Answer

It depends on how restrictive you want it to be.

Full URLs that start with http, contain a dot, and are delimited by word boundaries:

sed -e 's|\bhttp[^ ]*\.[^ ]*\b||g' test.html

Anything containing a dot that is delimited by word boundaries (this also catches links without http, such as pic.twitter.com/...):

sed -e 's|\b[^ ]*\.[^ ]*\b||g' test.html
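The trade-off between the two patterns can be seen on a line that contains a dotted token that is not a URL. This is a sketch assuming GNU sed, since \b is an extension not specified by POSIX and may not work in every sed. The function names and the sample line are made up for illustration:

```shell
# Hypothetical wrappers around the two commands above; both read stdin.
strip_http_urls() {
  # only tokens that start with http and contain a dot
  sed -e 's|\bhttp[^ ]*\.[^ ]*\b||g'
}

strip_dotted_tokens() {
  # any token containing a dot between word characters
  sed -e 's|\b[^ ]*\.[^ ]*\b||g'
}

# Made-up sample line with a decimal number and a URL
line='version 2.5 is out http://example.com/a.b'

printf '%s\n' "$line" | strip_http_urls
printf '%s\n' "$line" | strip_dotted_tokens
```

With GNU sed, the first command should keep 2.5 and remove only the URL, while the second should remove 2.5 as well. The broader pattern is therefore best kept for text where dots inside words only occur in links.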
