
Regex - find all links in a tweet

My regex skills are poor and letting me down, so some help would be great here.

All I want to do is return all the links that appear in a tweet (just a string). Some examples:

"Great summary http://mytest.com/blog/post.html (#test)"

"http://mytest.com/blog/post.html (#test)"

"post: http://mytest.com/blog/post.html"

It should also support multiple links like: "read http://mytest.com/blog/post.html and http://mytest.com/blog/post_two.html"

Any help would be great!

Thanks

Ben

Ben Hall asked Nov 26 '25 01:11



2 Answers

Try this one:

/\bhttps?:\/\/\S+\b/

Update:

To catch links beginning with "www." too (no "http://" prefix), you could try this:

/\b(?:https?:\/\/|www\.)\S+\b/
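
Since the question is about Ruby, here is a minimal sketch of how the second pattern could be applied with String#scan. The constant LINK_PATTERN, the method name extract_links, and the sample array are assumptions made for illustration; only the regex itself comes from the answer above.

```ruby
# Hypothetical helper: returns every link found in a tweet.
# LINK_PATTERN is the second regex from the answer above.
LINK_PATTERN = /\b(?:https?:\/\/|www\.)\S+\b/

def extract_links(tweet)
  # With no capture groups in the pattern, String#scan returns
  # an array of the matched substrings (empty if nothing matches).
  tweet.scan(LINK_PATTERN)
end

# Sample tweets taken from the question.
tweets = [
  "Great summary http://mytest.com/blog/post.html (#test)",
  "post: http://mytest.com/blog/post.html",
  "read http://mytest.com/blog/post.html and http://mytest.com/blog/post_two.html"
]

tweets.each { |t| p extract_links(t) }
```

Note that the trailing \b makes the match end on a word character, so punctuation after a link (a period or closing parenthesis) is left out, but so is a trailing slash: "http://example.com/" is returned as "http://example.com".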

Asaph answered Nov 27 '25 18:11



Here's a code snippet from a site I wrote that parses a Twitter feed. It turns links, hashtags, and Twitter usernames into HTML anchors. So far it's worked fine. I know it's not Ruby (it's C#), but the regexes should be helpful.

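// Fragment: tweetStream, i, tweets, string1 and string2 are defined elsewhere.
// Regex and MatchCollection come from System.Text.RegularExpressions.
if (tweetStream[i] != null)
{
    var str = tweetStream[i].Text;

    // Wrap each http/https link in an anchor tag.
    // Caveat: string.Replace substitutes every occurrence, so a link
    // that appears twice in the same tweet gets wrapped twice.
    var re = new Regex(@"http(s)?:\/\/\S+");
    MatchCollection mc = re.Matches(tweetStream[i].Text);
    foreach (Match m in mc)
    {
        str = str.Replace(m.Value, "<a href='" + m.Value + "' target='_blank'>" + m.Value + "</a>");
    }

    // Link each @username to its Twitter profile.
    re = new Regex(@"(@)(\w+)");
    mc = re.Matches(tweetStream[i].Text);
    foreach (Match m in mc)
    {
        str = str.Replace(m.Value, "<a href='http://twitter.com/" + m.Value.Replace("@", string.Empty) + "' target='_blank'>" + m.Value + "</a>");
    }

    // Link each #hashtag to a Twitter search ("#" is encoded as %23).
    re = new Regex(@"(#)(\w+)");
    mc = re.Matches(tweetStream[i].Text);
    foreach (Match m in mc)
    {
        str = str.Replace(m.Value, "<a href='http://twitter.com/#search?q=" + m.Value.Replace("#", "%23") + "' target='_blank'>" + m.Value + "</a>");
    }

    tweets += string1 + "<div>" + str + "</div>" + string2;
}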
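
For anyone who wants the same idea in Ruby, here is a rough sketch rather than a port of the C# above. It swaps the three match-then-replace loops for a single gsub pass over one combined pattern, so text inserted for one match is never rescanned by a later pattern and a "#" inside a URL is not treated as a hashtag. The names TOKEN and linkify are hypothetical, the link formats are copied from the snippet above, and the tweet text is not HTML-escaped.

```ruby
# Hypothetical combined pattern: a URL, an @username, or a #hashtag.
# The URL alternative comes first so "#" or "@" inside a link stays part of it.
TOKEN = /(?<url>https?:\/\/\S+)|@(?<user>\w+)|#(?<tag>\w+)/

# Hypothetical helper: returns the tweet with links, usernames and
# hashtags wrapped in anchor tags, using one left-to-right pass.
def linkify(tweet)
  tweet.gsub(TOKEN) do
    m = Regexp.last_match
    if m[:url]
      "<a href='#{m[:url]}' target='_blank'>#{m[:url]}</a>"
    elsif m[:user]
      "<a href='http://twitter.com/#{m[:user]}' target='_blank'>@#{m[:user]}</a>"
    else
      # "%23" is the URL-encoded "#", as in the C# snippet.
      "<a href='http://twitter.com/#search?q=%23#{m[:tag]}' target='_blank'>##{m[:tag]}</a>"
    end
  end
end

puts linkify("see http://a.com/x #tag @bob")
```

Because each match is replaced exactly once, a link that appears twice in a tweet yields two separate anchors rather than nested ones.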