
Regular expression avoid unnecessary backtracking in Java

Tags: java, regex

Hello, I am very new to regex. I would like to extract the timestamp, the location, and the "id_str" field from my test string in Java.

20110302140010915|{"user":{"is_translator":false,"show_all_inline_media":false,"following":null,"geo_enabled":true,"profile_background_image_url":"http:\/\/a3.twimg.com\/a\/1298918947\/images\/themes\/theme1\/bg.png","listed_count":0,"favourites_count":2,"verified":false,"time_zone":"Mountain Time (US & Canada)","profile_text_color":"333333","contributors_enabled":false,"statuses_count":152,"profile_sidebar_fill_color":"DDEEF6","id_str":"207356721","profile_background_tile":false,"friends_count":14,"followers_count":13,"created_at":"Mon Oct 25 04:05:43 +0000 2010","description":null,"profile_link_color":"0084B4","location":"WaKeeney, KS","profile_sidebar_border_color":"C0DEED",

I have tried this:

(\d*).*?"id_str":"(\d*)",.*"location":"([^"]*)"

It backtracks a lot when I use the lazy quantifier .*? (3000 steps in RegexBuddy), but the number of characters between the anchors "id_str" and "location" is not always the same, so I cannot match a fixed number of characters. It could also be catastrophic if no location is found in the string.

How can I:

1) avoid unnecessary backtracking, and

2) fail faster on strings that do not match?

Thanks.

Seen asked Feb 16 '23 03:02


1 Answer

This looks like JSON, and trust me, it's pretty easy to parse it as JSON instead of using a regex:

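import org.json.JSONObject; // from the org.json (JSON-java) library

// String.split() takes a regex, so the pipe must be escaped.
// The limit of 2 splits only at the first '|' and leaves the JSON part intact.
String[] input = inputStr.split("\\|", 2);
System.out.println("Timestamp: " + input[0]); // 20110302140010915

JSONObject user = new JSONObject(input[1]).getJSONObject("user");

// getString() throws a JSONException if the key is missing.
System.out.println("ID: " + user.getString("id_str")); // 207356721
System.out.println("Location: " + user.getString("location")); // WaKeeney, KS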
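
If you still want a regex (say, because some lines are cut off and are no longer valid JSON, as the sample in the question appears to be), here is a minimal sketch that uses only java.util.regex. The class and method names (TweetFields, extract) are made up for illustration. It assumes you want the first occurrence of each key after the timestamp and that the values contain no escaped quotes. The idea is to drop .* entirely: match the timestamp at the start with lookingAt(), then search for each key on its own, using possessive quantifiers (\d++ and [^"]*+) that never give characters back. A line with no quoted location is then rejected after a single left-to-right scan for that field, and the order of "id_str" and "location" no longer matters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TweetFields {
    // Leading digits followed by a literal '|'. Used with lookingAt(),
    // so it is only ever tried at index 0 of the line.
    private static final Pattern TIMESTAMP = Pattern.compile("(\\d++)\\|");

    // Each field pattern is a literal key plus a possessive value:
    // \d++ and [^"]*+ never give characters back, and there is no ".*".
    private static final Pattern ID_STR =
            Pattern.compile("\"id_str\":\"(\\d++)\"");
    private static final Pattern LOCATION =
            Pattern.compile("\"location\":\"([^\"]*+)\"");

    /** Returns {timestamp, id_str, location}, or null if any of them is missing. */
    static String[] extract(String line) {
        Matcher ts = TIMESTAMP.matcher(line);
        if (!ts.lookingAt()) {
            return null; // no timestamp prefix: give up immediately
        }
        Matcher id = ID_STR.matcher(line);
        if (!id.find(ts.end())) {
            return null; // no "id_str":"<digits>" after the timestamp
        }
        Matcher loc = LOCATION.matcher(line);
        if (!loc.find(ts.end())) {
            return null; // location is absent or not a quoted string (e.g. null)
        }
        return new String[] { ts.group(1), id.group(1), loc.group(1) };
    }

    public static void main(String[] args) {
        // Shortened version of the line from the question.
        String line = "20110302140010915|{\"user\":{\"id_str\":\"207356721\","
                + "\"friends_count\":14,\"location\":\"WaKeeney, KS\",";
        String[] fields = extract(line);
        System.out.println("Timestamp: " + fields[0]); // 20110302140010915
        System.out.println("ID: " + fields[1]);        // 207356721
        System.out.println("Location: " + fields[2]);  // WaKeeney, KS

        // A line without a location yields null instead of a long backtracking run.
        String noLocation = "20110302140010915|{\"user\":{\"id_str\":\"1\"}}";
        System.out.println(extract(noLocation) == null); // true
    }
}
```

Keep in mind this is still pattern matching on text, not parsing: a "location" key nested in some other object, or a value containing \" would be picked up or cut short, which is why the JSON parser is the safer choice whenever the input is valid JSON.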

Reference:
JSON Java API docs

Ravi K Thapliyal answered Mar 08 '23 10:03