
Parsing quoted text in Java

Is there an easy way to parse quoted text into strings in Java? I have lines like these to parse:

author="Tolkien, J.R.R." title="The Lord of the Rings"
publisher="George Allen & Unwin" year=1954 

and all I want is Tolkien, J.R.R.; The Lord of the Rings; George Allen & Unwin; and 1954 as separate strings.

david asked Aug 27 '11 02:08


2 Answers

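You could use a regex like

"([^"]*)"

It matches any run of characters between a pair of double quotes. (A greedy "(.+)" also works on a single value, but on a line with several quoted values it runs from the first quote to the last one.) In Java that would be:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher("author=\"Tolkien, J.R.R.\"");
while (m.find()) {
  System.out.println(m.group(1)); // text between the quotes
}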

Note that group(1) is used: it is the first capturing group, i.e. the text inside the quotes. group(0) is the entire match, including the quotes.
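As a rough sketch of how this could be applied to a whole line from the question, the pattern below accepts either a quoted value or a bare token such as 1954 after each key=. The class name QuotedFields, the method values, and the exact pattern are assumptions made for illustration; they are not part of the original answer, and the sketch does not handle escaped quotes inside a value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFields {
    // key= followed by either a quoted value (group 1) or a bare token (group 2)
    private static final Pattern FIELD =
            Pattern.compile("\\w+=(?:\"([^\"]*)\"|(\\S+))");

    // Hypothetical helper: returns the values in the order they appear
    static List<String> values(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // Only one of the two groups takes part in any given match
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "author=\"Tolkien, J.R.R.\" title=\"The Lord of the Rings\" "
                + "publisher=\"George Allen & Unwin\" year=1954";
        for (String value : values(line)) {
            System.out.println(value);
        }
    }
}
```

Using [^"]* rather than .+ keeps each match inside its own pair of quotes, which matters once a line holds more than one quoted value.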

Of course, you could also use substring to drop the first and last characters, provided the string itself starts and ends with a quote:

String quoted = "\"Tolkien, J.R.R.\"";
String unquoted;
// Strip the quotes only if the string both starts and ends with one
if (quoted.length() >= 2 && quoted.startsWith("\"") && quoted.endsWith("\"")) {
    unquoted = quoted.substring(1, quoted.length() - 1);
} else {
    unquoted = quoted;
}
Benjamin Udink ten Cate answered Sep 28 '22 15:09


There are some fancy pattern regex nonsense things that fancy people and fancy programmers like to use.

I like to use String.split(). It's a simple function and does what you need it to do.

So if I have a String containing word: "hello" and I want to pull out hello, I can simply do this:

myStr = string.split("\"")[1];

This will cut the string into bits based on the quote marks.
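To make the "bits" concrete, here is a small sketch run against part of the line from the question. It assumes the quotes are balanced, in which case the quoted values land at the odd indices of the array. The class name SplitOnQuotes and the helper quotedParts are made up for this illustration. Note that an unquoted value like year=1954 is not picked up this way.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOnQuotes {
    // Hypothetical helper: collects the pieces that sat between quote marks
    static List<String> quotedParts(String line) {
        String[] parts = line.split("\"");
        List<String> out = new ArrayList<>();
        // With balanced quotes, indices 1, 3, 5, ... hold the quoted text
        for (int i = 1; i < parts.length; i += 2) {
            out.add(parts[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "author=\"Tolkien, J.R.R.\" title=\"The Lord of the Rings\"";
        for (String part : quotedParts(line)) {
            System.out.println(part);
        }
    }
}
```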

If I want to be more specific, I can do

myStr = string.split("word: \"")[1].split("\"")[0];

That way I cut it with word: " and "

Of course, you run into problems if word: " appears more than once, which is what patterns are for. I don't think you'll have to deal with that problem for your specific question.

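Also, be cautious around characters like . and other regex metacharacters (* or |, for example). split() takes a regex, so those characters will trigger funny behavior. Escaping them with a backslash (written "\\." in a Java string literal) or wrapping the delimiter in Pattern.quote() turns those funny rules off.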
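A quick sketch of the difference, splitting the initials J.R.R on the dot. The class name SplitEscaping and the helper splitLiteral are hypothetical names for this example; Pattern.quote and String.split are standard library calls.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitEscaping {
    // Hypothetical helper: treats the delimiter as plain text, not as a regex
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String initials = "J.R.R";
        // Unescaped: "." matches every character, so only empty pieces remain
        // and split() drops them all
        System.out.println(initials.split(".").length);
        // Escaped by hand
        System.out.println(Arrays.toString(initials.split("\\.")));
        // Escaped by the library
        System.out.println(Arrays.toString(splitLiteral(initials, ".")));
    }
}
```

Pattern.quote is the safer choice when the delimiter comes from a variable, since you do not have to know which characters are special.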

Best of luck!

Ryan Amos answered Sep 28 '22 13:09