
Remove full URLs from a text file using Unix awk/sed/grep

Tags: grep, bash, unix, sed, awk

I have a text file made up of tweets, and I am having trouble removing the full URLs. An example of the text file:

index.html:

this is a tweet that has info. http://google.com
this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq

I would like to create a new file that only has:

this is a tweet that has info.
this is a tweet that has an image.

Right now I am working with grep, and this is what I have so far:

grep -oP "http://\K[^']+" final.txt

Thanks!

Michael Vieth asked Sep 17 '25 08:09



2 Answers

sed 's/http[^ ]*//g' YourFile  

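[^ ]* matches any run of characters that are not spaces, so the pattern removes everything from http up to the next space (or the end of the line). It does not touch links without a scheme, such as pic.twitter.com/a2y4H1b2Jq, and it leaves the space that preceded the URL in place.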
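
The command above leaves the blank that preceded the URL at the end of each line. Here is a minimal sketch that also consumes that whitespace and only matches http:// or https://. It assumes a POSIX sed, and strip_http_urls is a hypothetical helper name, not something from the question:

```shell
# Hypothetical helper: remove http:// and https:// URLs together with
# the whitespace directly in front of them. Reads the named files,
# or standard input when no file is given.
strip_http_urls() {
    sed 's|[[:space:]]*https\{0,1\}://[^[:space:]]*||g' "$@"
}

# Example with the first line from the question:
printf '%s\n' 'this is a tweet that has info. http://google.com' | strip_http_urls
# prints: this is a tweet that has info.
```

To write the result to a new file, redirect the output, for example strip_http_urls YourFile > NewFile (NewFile is a placeholder name).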

josifoski answered Sep 20 '25 02:09



It depends on how restrictive you want to be.

Full URLs that start with http, contain a dot, and sit between word boundaries (\b is a GNU sed extension):

sed -e 's|\bhttp[^ ]*\.[^ ]*\b||g' test.html

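Anything containing a dot between word boundaries, which also catches scheme-less links such as pic.twitter.com/a2y4H1b2Jq (and any other token with an inner dot, such as 3.14):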

sed -e 's|\b[^ ]*\.[^ ]*\b||g' test.html
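
As a middle ground between the two commands above, here is a rough sketch that removes http(s) URLs plus scheme-less tokens that look like host.tld/path, while leaving ordinary words with dots alone. It assumes a sed that supports -E (GNU and BSD sed both do), and strip_urls is a hypothetical helper name. The "dot, letters, slash" heuristic is my own assumption about what counts as a link here:

```shell
# Hypothetical helper: remove, together with any leading whitespace,
#   1. tokens starting with http:// or https://
#   2. tokens containing ".<letters>/" such as pic.twitter.com/a2y4H1b2Jq
# Tokens like "info." or "3.14" are left untouched.
strip_urls() {
    sed -E 's#[[:space:]]*(https?://[^[:space:]]*|[^[:space:]]+\.[[:alpha:]]+/[^[:space:]]*)##g' "$@"
}

# Example with the two lines from the question:
printf '%s\n' \
    'this is a tweet that has info. http://google.com' \
    'this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq' |
    strip_urls
# prints:
# this is a tweet that has info.
# this is a tweet that has an image.
```

A scheme-less link without a path (for example a bare google.com) is not matched by this sketch. Loosening the second alternative would catch it, but at the cost of also removing things like file names.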
nullman answered Sep 20 '25 02:09
