
Extract JSON from string in .NET

The input string is a mix of arbitrary text and valid JSON:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<TITLE>Title</TITLE>

<META http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<META HTTP-EQUIV="Content-language" CONTENT="en">
<META HTTP-EQUIV="keywords" CONTENT="search words">
<META HTTP-EQUIV="Expires" CONTENT="0">

<script SRC="include/datepicker.js" LANGUAGE="JavaScript" TYPE="text/javascript"></script>
<script SRC="include/jsfunctions.js" LANGUAGE="JavaScript" TYPE="text/javascript"></script>

<link REL="stylesheet" TYPE="text/css" HREF="css/datepicker.css">

<script language="javascript" type="text/javascript">
function limitText(limitField, limitCount, limitNum) {
    if (limitField.value.length > limitNum) {
        limitField.value = limitField.value.substring(0, limitNum);
    } else {
        limitCount.value = limitNum - limitField.value.length;
    }
}
</script>
{"List":[{"ID":"175114","Number":"28992"}]}

The task is to deserialize the JSON part of it into an object. The string may begin with arbitrary text, but it is guaranteed to contain valid JSON. I tried to use a JSON-validation regex, but .NET could not parse the pattern.
In the end I want to get only:

{
    "List": [{
        "ID": "175114",
        "Number": "28992"
    }]
}

Clarification 1:
There is only a single JSON object in the whole messy string, but the surrounding text can also contain { and } (it is actually HTML and can contain JavaScript such as <script> function(){..... )

0x49D1 asked Nov 01 '16 13:11



2 Answers

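You can use this method:

    using Newtonsoft.Json;

    public object ExtractJsonObject(string mixedString)
    {
        // Try every '{' as a possible start of the JSON object, left to right.
        for (var i = mixedString.IndexOf('{'); i > -1; i = mixedString.IndexOf('{', i + 1))
        {
            // Try every '}' after that start as a possible end, right to left.
            // The j > i guard keeps the Substring length positive.
            for (var j = mixedString.LastIndexOf('}'); j > i; j = mixedString.LastIndexOf('}', j - 1))
            {
                var jsonProbe = mixedString.Substring(i, j - i + 1);
                try
                {
                    return JsonConvert.DeserializeObject(jsonProbe);
                }
                catch (JsonException)
                {
                    // Not valid JSON; move on to the next candidate.
                }
            }
        }
        return null;
    }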

The key idea is to try every pair of { and } and probe whether the substring they delimit, braces included, is valid JSON. The first substring that parses is converted to an object and returned.
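
If the target type is known, the same probing loop can deserialize straight into it. The following is only a sketch and not part of the method above: it swaps Json.NET for the built-in System.Text.Json, and the names Item, Payload, JsonExtractor and ExtractFirst are hypothetical, chosen to match the sample JSON from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical types mirroring the sample JSON in the question.
public sealed class Item
{
    public string ID { get; set; }
    public string Number { get; set; }
}

public sealed class Payload
{
    public List<Item> List { get; set; }
}

public static class JsonExtractor
{
    // Returns the first "{...}" substring that deserializes into T,
    // or null when no candidate parses.
    public static T ExtractFirst<T>(string mixed) where T : class
    {
        for (int i = mixed.IndexOf('{'); i > -1; i = mixed.IndexOf('{', i + 1))
        {
            // Only consider closing braces located after the opening one.
            for (int j = mixed.LastIndexOf('}'); j > i; j = mixed.LastIndexOf('}', j - 1))
            {
                try
                {
                    return JsonSerializer.Deserialize<T>(mixed.Substring(i, j - i + 1));
                }
                catch (JsonException)
                {
                    // Candidate is not valid JSON for T; keep probing.
                }
            }
        }
        return null;
    }
}
```

Calling JsonExtractor.ExtractFirst<Payload>(html) on the question's input should give a Payload whose List holds one item. One caveat: an empty {} earlier in the text also deserializes successfully (into a Payload with a null List), so it may be worth checking the result before accepting it.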

Ralf Bönning answered Oct 23 '22 22:10



Use regex to find all possible JSON structures:

\{(.|\s)*\}


Then iterate over these matches until you find one that deserializes without throwing an exception:

JsonConvert.DeserializeObject(match.Value);

If you know the format of the JSON structure, use JsonSchema.
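
One way to get several candidate matches out of a single pass is a .NET balancing-group pattern, which matches each top-level run of balanced braces separately. The sketch below rests on some assumptions: the pattern is a substitute for the one shown above, System.Text.Json stands in for Json.NET, the names RegexJsonExtractor and FindFirstJsonObject are hypothetical, and the JSON object is assumed not to sit inside another balanced { } block.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

public static class RegexJsonExtractor
{
    // Matches one top-level "{ ... }" run with balanced braces.
    // (?<open>\{) pushes on each '{', (?<-open>\}) pops on each '}',
    // and (?(open)(?!)) rejects the match if any '{' is left unclosed.
    private static readonly Regex BalancedBraces = new Regex(
        @"\{(?>[^{}]+|(?<open>\{)|(?<-open>\}))*(?(open)(?!))\}");

    // Returns the text of the first balanced block that parses as JSON,
    // or null when none does.
    public static string FindFirstJsonObject(string mixed)
    {
        foreach (Match match in BalancedBraces.Matches(mixed))
        {
            try
            {
                // Parse only to validate; the document is disposed right away.
                using (JsonDocument.Parse(match.Value))
                {
                    return match.Value;
                }
            }
            catch (JsonException)
            {
                // Balanced braces but not JSON (for example a script body).
            }
        }
        return null;
    }
}
```

Braces inside string literals, in the script or in the JSON itself, can throw the balance off, so this is best treated as a heuristic for input shaped like the question's rather than a general parser.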

Chrille answered Oct 23 '22 20:10
