Double quotes in Regular expression

Tags: java, regex

How can I get a string inside double quotes using a regular expression?

I have the following string:

<img src="http://yahoo.com/img1.jpg" alt="">

I want to extract http://yahoo.com/img1.jpg and alt="" from it. How can I do this using a regular expression?

Ammu asked Jun 15 '11 06:06


2 Answers

I don't know why you want the alt attribute as well, but this regexp does what you want: group 1 is the URL and group 2 is the alt attribute. I would modify the regexp a bit if there can be several spaces between img and src, or spaces around '='.

Pattern p = Pattern.compile("<img src=\"([^\"]*)\" (alt=\"[^\"]*\")>");
Matcher m = 
    p.matcher("<img src=\"http://yahoo.com/img1.jpg\" alt=\"\"> " + 
    "<img src=\"http://yahoo.com/img2.jpg\" alt=\"\">");

while (m.find()) {
    System.out.println(m.group(1) + "  " + m.group(2));
}

Output:

http://yahoo.com/img1.jpg  alt=""
http://yahoo.com/img2.jpg  alt=""
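
The note above about several spaces between img and src, and spaces around '=', could be handled roughly as follows. This is only a sketch, not part of the original answer: the class name FlexibleImgRegex and the method extract are made up for illustration, and the pattern still assumes that src comes first and alt second.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class FlexibleImgRegex {
    // \s+ requires at least one whitespace character; \s* allows none or several.
    private static final Pattern IMG = Pattern.compile(
        "<img\\s+src\\s*=\\s*\"([^\"]*)\"\\s+(alt\\s*=\\s*\"[^\"]*\")\\s*>");

    // Returns one "url  alt" string per matching tag, in the same format as the output above.
    static List<String> extract(String html) {
        List<String> results = new ArrayList<>();
        Matcher m = IMG.matcher(html);
        while (m.find()) {
            results.add(m.group(1) + "  " + m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<img   src = \"http://yahoo.com/img1.jpg\"   alt=\"\"> "
                    + "<img src=\"http://yahoo.com/img2.jpg\" alt = \"\" >";
        for (String line : extract(html)) {
            System.out.println(line);
        }
    }
}
```

Group 2 keeps whatever spacing the input had around '=', so the second tag yields alt = "" rather than alt="".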
Kaj answered Oct 04 '22 17:10


You can do it like this:

Pattern p = Pattern.compile("<img src=\"(.*?)\".*?>");
Matcher m = p.matcher("<img src=\"http://yahoo.com/img1.jpg\" alt=\"\">");
if (m.find())
  System.out.println(m.group(1));
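
The reluctant quantifier .*? in the pattern above matters: it stops at the first closing quote, whereas a greedy .* would run on to the last quote in the tag. A small sketch of the difference (the class name GreedyVsLazy and the method firstGroup are hypothetical, chosen only for this illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named here only for illustration.
class GreedyVsLazy {
    // Returns group 1 of the first match, or null if the regex does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tag = "<img src=\"http://yahoo.com/img1.jpg\" alt=\"\">";

        // Reluctant: captures up to the first closing quote.
        System.out.println(firstGroup("<img src=\"(.*?)\".*?>", tag));
        // prints: http://yahoo.com/img1.jpg

        // Greedy: captures up to the last quote before '>'.
        System.out.println(firstGroup("<img src=\"(.*)\".*>", tag));
        // prints: http://yahoo.com/img1.jpg" alt="
    }
}
```

A negated character class such as [^"]* in place of .*? gives the same result for this input.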

However, if you're parsing HTML, consider using a library: regular expressions are not a good fit for parsing HTML. I have had good experiences with jsoup. Here's an example:

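import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Parse the fragment as HTML body content, then read the src attribute of the first img element.
String fragment = "<img src=\"http://yahoo.com/img1.jpg\" alt=\"\">";
Document doc = Jsoup.parseBodyFragment(fragment);
Element img = doc.select("img").first();
String src = img.attr("src");
System.out.println(src);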
MarcoS answered Oct 04 '22 17:10