
Split JSON objects in array string using regex

Tags: java, json, regex

I have a String in the following format:

[{"HostName":"taskmanager1","Rack":"/default-rack","State":"RUNNING","NodeId":"taskmanager1:45454","NodeHTTPAddress":"taskmanager1:8042","LastHealthUpdate":1519568501615,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024},{"HostName":"datanode2","Rack":"/default-rack","State":"RUNNING","NodeId":"datanode2:45454","NodeHTTPAddress":"datanode2:8042","LastHealthUpdate":1519260876106,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024},{"HostName":"taskmanager3","Rack":"/default-rack","State":"RUNNING","NodeId":"taskmanager3:45454","NodeHTTPAddress":"taskmanager3:8042","LastHealthUpdate":1519568502251,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024},{"HostName":"datanode3","Rack":"/default-rack","State":"RUNNING","NodeId":"datanode3:45454","NodeHTTPAddress":"datanode3:8042","LastHealthUpdate":1519260871527,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024},{"HostName":"taskmanager2","Rack":"/default-rack","State":"RUNNING","NodeId":"taskmanager2:45454","NodeHTTPAddress":"taskmanager2:8042","LastHealthUpdate":1519568502259,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024},{"HostName":"datanode1","Rack":"/default-rack","State":"RUNNING","NodeId":"datanode1:45454","NodeHTTPAddress":"datanode1:8042","LastHealthUpdate":1519260875647,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024}]

I want to split it into separate JSON objects (six here), but my pattern does not split it as desired.

I want something like this:

{"HostName":"taskmanager1","Rack":"/default-rack","State":"RUNNING","NodeId":"taskmanager1:45454","NodeHTTPAddress":"taskmanager1:8042","LastHealthUpdate":1519568501615,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024},
{"HostName":"datanode2","Rack":"/default-rack","State":"RUNNING","NodeId":"datanode2:45454","NodeHTTPAddress":"datanode2:8042","LastHealthUpdate":1519260876106,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024},
{"HostName":"taskmanager3","Rack":"/default-rack","State":"RUNNING","NodeId":"taskmanager3:45454","NodeHTTPAddress":"taskmanager3:8042","LastHealthUpdate":1519568502251,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024},
{"HostName":"datanode3","Rack":"/default-rack","State":"RUNNING","NodeId":"datanode3:45454","NodeHTTPAddress":"datanode3:8042","LastHealthUpdate":1519260871527,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024},
{"HostName":"taskmanager2","Rack":"/default-rack","State":"RUNNING","NodeId":"taskmanager2:45454","NodeHTTPAddress":"taskmanager2:8042","LastHealthUpdate":1519568502259,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024},
{"HostName":"datanode1","Rack":"/default-rack","State":"RUNNING","NodeId":"datanode1:45454","NodeHTTPAddress":"datanode1:8042","LastHealthUpdate":1519260875647,"HealthReport":"","NodeManagerVersion":"2.8.3","NumContainers":0,"UsedMemoryMB":0,"AvailableMemoryMB":1024}

Using the code:

List<String> res = Arrays.asList(temp.replace('[', ' ').replace(']', ' ').trim().split(","));

This splits on every , character, and using the pattern split("},\\{") removes the } and { characters as well.

How can I split it as desired to produce the individual JSON objects?

Using the Java pattern (\\{.+}) matches the whole string as a single group.

Soheil Pourbafrani asked Dec 06 '25 04:12


2 Answers

You can parse the JSON as an array and treat the contents as individual strings. Here is sample code:

import org.json.JSONArray;

public class orgJson1Main {
    private static final String sample = "[{\"HostName\":\"taskmanager1\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"taskmanager1:45454\",\"NodeHTTPAddress\":\"taskmanager1:8042\",\"LastHealthUpdate\":1519568501615,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024},{\"HostName\":\"datanode2\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"datanode2:45454\",\"NodeHTTPAddress\":\"datanode2:8042\",\"LastHealthUpdate\":1519260876106,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024},{\"HostName\":\"taskmanager3\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"taskmanager3:45454\",\"NodeHTTPAddress\":\"taskmanager3:8042\",\"LastHealthUpdate\":1519568502251,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024},{\"HostName\":\"datanode3\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"datanode3:45454\",\"NodeHTTPAddress\":\"datanode3:8042\",\"LastHealthUpdate\":1519260871527,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024},{\"HostName\":\"taskmanager2\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"taskmanager2:45454\",\"NodeHTTPAddress\":\"taskmanager2:8042\",\"LastHealthUpdate\":1519568502259,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024},{\"HostName\":\"datanode1\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"datanode1:45454\",\"NodeHTTPAddress\":\"datanode1:8042\",\"LastHealthUpdate\":1519260875647,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024}]";

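    public static void main(String[] args) {
        // Parse the whole string as a JSON array (requires the org.json library on the classpath).
        JSONArray array = new JSONArray(sample);
        // Each element is one JSON object; printing it yields its JSON text.
        for (int i = 0; i < array.length(); i++) {
            System.out.println(array.get(i));
        }
    }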

}

OUTPUT (org.json does not preserve key order, so the keys appear in a different order than in the input):

{"NodeManagerVersion":"2.8.3","Rack":"/default-rack","LastHealthUpdate":1519568501615,"HealthReport":"","State":"RUNNING","AvailableMemoryMB":1024,"NodeId":"taskmanager1:45454","UsedMemoryMB":0,"NodeHTTPAddress":"taskmanager1:8042","HostName":"taskmanager1","NumContainers":0}
{"NodeManagerVersion":"2.8.3","Rack":"/default-rack","LastHealthUpdate":1519260876106,"HealthReport":"","State":"RUNNING","AvailableMemoryMB":1024,"NodeId":"datanode2:45454","UsedMemoryMB":0,"NodeHTTPAddress":"datanode2:8042","HostName":"datanode2","NumContainers":0}
{"NodeManagerVersion":"2.8.3","Rack":"/default-rack","LastHealthUpdate":1519568502251,"HealthReport":"","State":"RUNNING","AvailableMemoryMB":1024,"NodeId":"taskmanager3:45454","UsedMemoryMB":0,"NodeHTTPAddress":"taskmanager3:8042","HostName":"taskmanager3","NumContainers":0}
{"NodeManagerVersion":"2.8.3","Rack":"/default-rack","LastHealthUpdate":1519260871527,"HealthReport":"","State":"RUNNING","AvailableMemoryMB":1024,"NodeId":"datanode3:45454","UsedMemoryMB":0,"NodeHTTPAddress":"datanode3:8042","HostName":"datanode3","NumContainers":0}
{"NodeManagerVersion":"2.8.3","Rack":"/default-rack","LastHealthUpdate":1519568502259,"HealthReport":"","State":"RUNNING","AvailableMemoryMB":1024,"NodeId":"taskmanager2:45454","UsedMemoryMB":0,"NodeHTTPAddress":"taskmanager2:8042","HostName":"taskmanager2","NumContainers":0}
{"NodeManagerVersion":"2.8.3","Rack":"/default-rack","LastHealthUpdate":1519260875647,"HealthReport":"","State":"RUNNING","AvailableMemoryMB":1024,"NodeId":"datanode1:45454","UsedMemoryMB":0,"NodeHTTPAddress":"datanode1:8042","HostName":"datanode1","NumContainers":0}

EDIT:

First, I removed the JSONTokener from the above code. Second, for completeness I'm adding the following code that shows how to find the individual JSON objects within the sample string using a regex as originally asked.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class orgJson1Main {
    private static final String sample = "[{\"HostName\":\"taskmanager1\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"taskmanager1:45454\",\"NodeHTTPAddress\":\"taskmanager1:8042\",\"LastHealthUpdate\":1519568501615,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024},{\"HostName\":\"datanode2\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"datanode2:45454\",\"NodeHTTPAddress\":\"datanode2:8042\",\"LastHealthUpdate\":1519260876106,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024},{\"HostName\":\"taskmanager3\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"taskmanager3:45454\",\"NodeHTTPAddress\":\"taskmanager3:8042\",\"LastHealthUpdate\":1519568502251,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024},{\"HostName\":\"datanode3\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"datanode3:45454\",\"NodeHTTPAddress\":\"datanode3:8042\",\"LastHealthUpdate\":1519260871527,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024},{\"HostName\":\"taskmanager2\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"taskmanager2:45454\",\"NodeHTTPAddress\":\"taskmanager2:8042\",\"LastHealthUpdate\":1519568502259,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024},{\"HostName\":\"datanode1\",\"Rack\":\"/default-rack\",\"State\":\"RUNNING\",\"NodeId\":\"datanode1:45454\",\"NodeHTTPAddress\":\"datanode1:8042\",\"LastHealthUpdate\":1519260875647,\"HealthReport\":\"\",\"NodeManagerVersion\":\"2.8.3\",\"NumContainers\":0,\"UsedMemoryMB\":0,\"AvailableMemoryMB\":1024}]";

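    public static void main(String[] args) {
        // Match '{', then any characters other than '}', then the closing '}'.
        // This assumes flat objects with no '}' inside string values.
        Matcher matcher = Pattern.compile("\\{[^}]*\\}").matcher(sample);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }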

}
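
If you need the objects in a List<String> rather than printed, the same matcher loop can collect them. The following is only a minimal sketch: the class name JsonObjectFinder, the method findObjects, and the shortened sample string are made up for illustration. It still assumes flat objects with no } inside string values, so nested objects would be cut short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class JsonObjectFinder {
    // Matches one flat object: '{', a run of non-'}' characters, then '}'.
    private static final Pattern FLAT_OBJECT = Pattern.compile("\\{[^}]*\\}");

    // Returns every flat {...} block found in the input, in order of appearance.
    static List<String> findObjects(String jsonArray) {
        List<String> objects = new ArrayList<>();
        Matcher matcher = FLAT_OBJECT.matcher(jsonArray);
        while (matcher.find()) {
            objects.add(matcher.group());
        }
        return objects;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the string in the question.
        String sample = "[{\"HostName\":\"taskmanager1\",\"NumContainers\":0},"
                + "{\"HostName\":\"datanode2\",\"NumContainers\":0}]";
        for (String object : findObjects(sample)) {
            System.out.println(object);
        }
    }
}
```

Unlike the JSONArray approach, this keeps each object's text exactly as it appears in the input, including key order.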

D.B. answered Dec 07 '25 20:12


To split on the comma between } and {, but retain the curly brackets in the tokens, split on this regex:

"(?<=\\}),(?=\\{)"

This uses a look-behind and a look-ahead to assert that curly brackets precede and follow the comma, without consuming them in the split. There is no space in the pattern because the sample string has none after the comma.

The whole line then becomes:

List<String> res = Arrays.asList(temp.replaceAll("^.|.$", "").split("(?<=\\}),(?=\\{)"));

Note also the simplified trimming of the leading [ and trailing ], done by removing the first and last characters in one operation.
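
For reference, here is a small self-contained sketch of the lookaround split. The class name LookaroundSplit, the method splitObjects, and the shortened sample string are invented for illustration. As a variation of my own, the pattern uses ,\\s* so that it tolerates optional whitespace after the comma; the string in the question has none.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any library.
class LookaroundSplit {
    static List<String> splitObjects(String jsonArray) {
        // Drop the first and last character (the enclosing '[' and ']').
        String body = jsonArray.replaceAll("^.|.$", "");
        // Split on a comma that has '}' before it and '{' after it.
        // The lookarounds only assert; the brackets stay in the tokens.
        return Arrays.asList(body.split("(?<=\\}),\\s*(?=\\{)"));
    }

    public static void main(String[] args) {
        // Shortened stand-in for the string in the question.
        String temp = "[{\"HostName\":\"taskmanager1\",\"NumContainers\":0},"
                + "{\"HostName\":\"datanode2\",\"NumContainers\":0}]";
        for (String object : splitObjects(temp)) {
            System.out.println(object);
        }
    }
}
```

Commas inside an object are left alone because they are not surrounded by } and {. The split would still go wrong if a string value itself contained the sequence },{.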

Bohemian answered Dec 07 '25 20:12