I have a text file in the form of tweets, and I am having trouble removing the full URLs. An example of the text file:
index.html
:
this is a tweet that has info. http://google.com
this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq
I would like to create a new file that only has:
this is a tweet that has info.
this is a tweet that has an image.
Right now I am working with grep and I have
grep -oP "http://\K[^']+" final.txt
Thanks!
sed 's/http[^ ]*//g' YourFile
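[^ ]* matches any run of characters that are not spaces, so the substitution deletes everything from http up to the next space. Note that this only removes links that start with http; the pic.twitter.com link in the example has no such prefix and is left in place.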
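To cover both kinds of link from the question and write the result to a new file, something like the following might work. It is a minimal sketch, not a general URL matcher: the file names sample_tweets.txt and cleaned_tweets.txt are hypothetical, and it assumes links either start with http:// or https:// or are pic.twitter.com links.

```shell
# Hypothetical sample input mirroring the two tweets in the question.
printf '%s\n' \
  'this is a tweet that has info. http://google.com' \
  'this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq' > sample_tweets.txt

# First expression: delete http(s) URLs and pic.twitter.com links,
# each running up to the next whitespace character.
# Second expression: trim the trailing whitespace left behind.
sed -E 's#(https?://|pic\.twitter\.com/)[^[:space:]]*##g; s/[[:space:]]+$//' \
  sample_tweets.txt > cleaned_tweets.txt

cat cleaned_tweets.txt
```

Using # as the delimiter avoids having to escape the slashes in the pattern.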
It depends on how restrictive you want it to be.
Full URLs that start with http, contain a dot, and sit between word boundaries:
sed -e 's|\bhttp[^ ]*\.[^ ]*\b||g' test.html
Any token that contains a dot and sits between word boundaries:
sed -e 's|\b[^ ]*\.[^ ]*\b||g' test.html
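The trade-off between the two patterns shows up on text that contains a dot but is not a link. The sketch below runs both on a made-up line (the sample text is an assumption, not from the question) and assumes GNU sed, since \b is a GNU extension that other sed implementations may not support.

```shell
# Hypothetical input: one dotted token that is not a URL, plus a real URL.
line='version 2.5 released http://example.com'

# Strict pattern from the answer: only tokens starting with "http" are removed.
strict=$(printf '%s\n' "$line" | sed -e 's|\bhttp[^ ]*\.[^ ]*\b||g')

# Broad pattern from the answer: any token with an inner dot is removed.
broad=$(printf '%s\n' "$line" | sed -e 's|\b[^ ]*\.[^ ]*\b||g')

# Brackets make leftover spaces visible.
printf 'strict: [%s]\nbroad:  [%s]\n' "$strict" "$broad"
```

With GNU sed, the strict pattern should keep 2.5 while the broad one removes it along with the URL, so the broad form is only safe when the tweets contain no other dotted tokens.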