 

Most efficient way to fix an invalid JSON

I am stuck in an impossible situation. I have JSON from outer space, and there is no way they are going to change it. Here it is:

{
    user:'180111',
    title:'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n\'',
    date:'2007/01/10 19:48:38',
    "id":"3322121",
    "previd":112211,
    "body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \\*/ :\/",
    "from":"112221",
    "username":"mikethunder",
    "creationdate":"2007\/01\/10 14:04:49"
}

"It is nowhere near valid JSON," I said. Their response was "emmm! but JavaScript can read it without complaint":

<html>
<script type="text/javascript">
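    var obj = {
        user:'180111',
        title:'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n\'',
        date:'2007/01/10 19:48:38',
        "id":"3322121",
        "previd":112211,
        "body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \\*/ :\/",
        "from":"112221",
        "username":"mikethunder",
        "creationdate":"2007\/01\/10 14:04:49"
    }; // the object from above, pasted verbatim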

    document.write(obj.title);
    document.write("<br />");
    document.write(obj.creationdate + " " + obj.date);
    document.write("<br />");
    document.write(obj.body);
    document.write("<br />");
</script>
<body>
</body>
</html>
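Here is a minimal Node.js sketch that shows the difference directly. It is an illustration added here, not the asker's code, and it assumes Node is available. It first gives a shortened copy of the payload to JSON.parse, which rejects it. It then evaluates the same text as a JavaScript expression through the Function constructor, which accepts it. The variable names are hypothetical. Evaluating third-party text runs it as code, so treat this as a demonstration only.

```javascript
// A shortened copy of the payload from the question: unquoted keys,
// single-quoted strings, and escapes (\' and \/) that JSON does not allow.
const alienText = String.raw`{
    user:'180111',
    title:'I\'m sure "E pluribus unum" means \'Out of Many, One.\'',
    "body":"\'You\' can \"read\" more here. Cheers :\/"
}`;

// 1. A strict JSON parser rejects the text.
let jsonError = null;
try {
  JSON.parse(alienText);
} catch (e) {
  jsonError = e; // a SyntaxError
}

// 2. The same text is a valid JavaScript object literal.
// WARNING: this executes the text as code; never do this with untrusted input.
const obj = new Function("return (" + alienText + ");")();

console.log(jsonError instanceof SyntaxError);
console.log(obj.title);
console.log(obj.body);
```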

Problem

I am supposed to read and parse this string in .NET 4, and it broke 3 of the 14 libraries listed in the C# section of json.org (I didn't try the rest). To work around the problem, I wrote the following function to fix the single- and double-quote issues:

using System.Text;

public static string JSONBeautify(string InStr){
    bool inSingleQuote = false;
    bool inDoubleQuote = false;
    bool escaped = false;

    StringBuilder sb = new StringBuilder(InStr);
    sb = sb.Replace("`", "<°)))><"); // Replace every grave accent with a "fish" so the grave accent can be used as a marker later.
                                        // Hopefully there is no "fish" in our JSON.
    for (int i = 0; i < sb.Length; i++) {
        switch (sb[i]) {

            case '\\':
                escaped = !escaped;         // A backslash escapes the next character unless it is itself escaped
                break;
            case '\'':
                if (!inSingleQuote && !inDoubleQuote) {
                    sb[i] = '"';            // Change an opening single-quote string marker to a double quote
                    inSingleQuote = true;
                } else if (inSingleQuote && !escaped) {
                    sb[i] = '"';            // Change a closing single-quote string marker to a double quote
                    inSingleQuote = false;
                } else if (escaped) {
                    escaped = false;
                }
                break;
            case '"':
                if (!inSingleQuote && !inDoubleQuote) {
                    inDoubleQuote = true;   // This is an opening double-quote string marker
                } else if (inSingleQuote && !escaped) {
                    sb[i] = '`';            // Change an unescaped double quote to a grave accent
                } else if (inDoubleQuote && !escaped) {
                    inDoubleQuote = false;  // This is a closing double-quote string marker
                } else if (escaped) {
                    escaped = false;
                }
                break;
            default:
                escaped = false;
                break;
        }
    }
    return sb.ToString()
        .Replace("\\/", "/")        // Unescape every \/ to /. Hopefully there are no smileys in the strings
        .Replace("`", "\\\"")       // Change every grave accent to an escaped double quote \"
        .Replace("<°)))><", "`")    // Change every fish back to a grave accent
        .Replace("\\'", "'");       // Change every escaped single quote to a plain single quote
}

Now JSONLint only complains about the unquoted attribute names, and I can use both the JSON.NET and SimpleJSON libraries to parse the JSON above.
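If the unquoted attribute names also need fixing, one option is a single-pass converter. The sketch below is written in JavaScript for illustration. It is a hypothetical helper, not the asker's function, and it uses a different technique from the placeholder swapping above.

- Each string literal is decoded using JavaScript's escape rules and then re-encoded with JSON.stringify. Because escapes are interpreted rather than pattern-replaced, no "fish" placeholder is needed, and a sequence such as \\/ keeps its meaning.
- Any bare identifier followed by a colon is quoted as a key.

The name toStrictJson is made up. The sketch assumes keys are plain identifiers and does not handle octal escapes, line continuations or comments.

```javascript
// Hypothetical helper: converts JavaScript-style object text into strict JSON.
//  - every '...' or "..." literal is decoded with JavaScript's escape rules
//    and re-encoded with JSON.stringify;
//  - every bare identifier followed by ':' (an unquoted key) is quoted.
// Assumptions: keys are plain identifiers; octal escapes, line continuations
// and comments are not handled.
function toStrictJson(text) {
  const simpleEscapes = { n: "\n", r: "\r", t: "\t", b: "\b", f: "\f", v: "\v", "0": "\0" };
  let out = "";
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (ch === "'" || ch === '"') {
      // Decode a string literal up to the matching, unescaped quote.
      let value = "";
      i++; // skip the opening quote
      while (i < text.length && text[i] !== ch) {
        if (text[i] !== "\\") {
          value += text[i++];
          continue;
        }
        const next = text[i + 1];
        if (next === "u") {
          value += String.fromCharCode(parseInt(text.slice(i + 2, i + 6), 16));
          i += 6;
        } else if (next === "x") {
          value += String.fromCharCode(parseInt(text.slice(i + 2, i + 4), 16));
          i += 4;
        } else {
          // \n -> newline etc.; any other escaped character stands for itself
          // (\' -> ', \/ -> /, \\ -> \).
          value += next in simpleEscapes ? simpleEscapes[next] : next;
          i += 2;
        }
      }
      if (i >= text.length) throw new SyntaxError("Unterminated string literal");
      i++; // skip the closing quote
      out += JSON.stringify(value); // re-encode using JSON's own rules
    } else if (/[A-Za-z_$]/.test(ch)) {
      // Read an identifier; quote it only if the next non-space character is ':'.
      let j = i;
      while (j < text.length && /[\w$]/.test(text[j])) j++;
      const name = text.slice(i, j);
      let k = j;
      while (k < text.length && /\s/.test(text[k])) k++;
      out += text[k] === ":" ? JSON.stringify(name) : name;
      i = j;
    } else {
      out += ch; // punctuation, whitespace and digits are copied unchanged
      i++;
    }
  }
  return out;
}

// The payload from the question, kept byte-for-byte with String.raw.
const alien = String.raw`{
    user:'180111',
    title:'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n\'',
    date:'2007/01/10 19:48:38',
    "id":"3322121",
    "previd":112211,
    "body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \\*/ :\/",
    "from":"112221",
    "username":"mikethunder",
    "creationdate":"2007\/01\/10 14:04:49"
}`;

console.log(toStrictJson(alien));
```

The output can be passed to any strict JSON parser. The cost is that the scanner has to know JavaScript's escape rules, which is exactly the knowledge the Replace calls try to approximate.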

Question

I am sure my code is not the best way to fix this JSON. Is there any scenario in which my code might break? Is there a better way of doing this?

asked by AaA, Feb 07 '15 14:02



2 Answers

You need to run this through JavaScript. Fire up a JavaScript engine from .NET, give it the string as input, and use JavaScript's native JSON.stringify to convert it:

var obj = {
    "user":'180111',
    "title":'I\'m sure "E pluribus unum" means \'Out of Many, One.\' \n\nhttp://en.wikipedia.org/wiki/E_pluribus_unum.\n\n',
    "date":'2007/01/10 19:48:38',
    "id":"3322121",
    "previd":"112211",
    "body":"\'You\' can \"read\" more here [url=http:\/\/en.wikipedia.org\/?search=E_pluribus_unum]E pluribus unum[\/url]'s. Cheers \\*/ :\/",
    "from":"112221",
    "username":"mikethunder",
    "creationdate":"2007\/01\/10 14:04:49"
}

console.log(JSON.stringify(obj));
document.write(JSON.stringify(obj));

Keep in mind that the string (or rather object) you've got isn't valid JSON, so a strict JSON parser can't read it; it has to be converted to valid JSON first. It is, however, valid JavaScript.
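The answer suggests letting a JavaScript engine do the parsing. That idea can be sketched in Node.js with the built-in vm module. Node is an assumption here; the answer itself is about hosting an engine from .NET.

- The text is wrapped in parentheses so the engine reads the leading { as an object literal rather than a block.
- It is evaluated in a fresh, empty context with a timeout.
- The resulting object is handed to JSON.stringify.

vm is not a security sandbox, so this still means running the third party's text as code. The name alienToJson is hypothetical.

```javascript
const vm = require("vm");

// Hypothetical helper: parse the alien text as a JavaScript expression in an
// empty context (vm is NOT a security boundary), then serialize it as JSON.
function alienToJson(alienText) {
  // Parentheses force "{...}" to be read as an object literal, not a block.
  const value = vm.runInNewContext("(" + alienText + ")", {}, { timeout: 1000 });
  return JSON.stringify(value);
}

// A short sample in the same style as the question's payload.
const sample = String.raw`{user:'180111', date:'2007/01/10 19:48:38', "previd":112211, "body":"\'You\' can \"read\" more. Cheers \\*/ :\/"}`;

console.log(alienToJson(sample));
```

The engine already implements every JavaScript escape and quoting rule, so nothing has to be reimplemented by hand. The price is executing the input.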

To complete this answer: you can use JavaScriptSerializer in .NET. For this solution you'll need the following namespaces:

  • System.Net (for WebClient)
  • System.Web.Script.Serialization (for JavaScriptSerializer; this namespace lives in System.Web.Extensions.dll)

    using System.Collections.Generic;
    using System.Net;
    using System.Web.Script.Serialization;

    var webClient = new WebClient();
    string readHtml = webClient.DownloadString("uri to your source (extraterrestrial)");
    var serializer = new JavaScriptSerializer();

    Dictionary<string, object> results = serializer.Deserialize<Dictionary<string, object>>(readHtml);
    
answered by Mouser, Sep 28 '22 09:09



How about this:

 string AlienJSON = "your alien JSON";
 JavaScriptSerializer js = new JavaScriptSerializer();
 string ProperJSON = js.Serialize(js.DeserializeObject(AlienJSON));

Or just consume the deserialized object directly, instead of serializing it back to a string and handing it to yet another JSON parser for extra headache.

As Mouser also mentioned, you need System.Web.Script.Serialization, which becomes available once you reference System.Web.Extensions.dll in your project. For that reference to be available, the Target framework in the project properties must be .NET Framework 4 (the full framework, not the .NET Framework 4 Client Profile).

EDIT

The trick to consuming the deserialized object is to use dynamic:

JavaScriptSerializer js = new JavaScriptSerializer();
dynamic obj = js.DeserializeObject(AlienJSON);

For the JSON in your question, simply use:

string body = obj["body"];

or, if your JSON is an array:

if (obj is Array) {
    foreach (dynamic o in obj) {
        string body = o["body"];   // each element is a Dictionary<string, object>
        // ... do something with it
    }
}
answered by Bistro, Sep 28 '22 09:09
