Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I parse JSON sprinkled unpredictably into a string?

Suppose that I've got a node.js application that receives input in a weird format: strings with JSON arbitrarily sprinkled into them, like so:

This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text

I have a couple guarantees about this input text:

  1. The bits of literal text in between the JSON objects are always free from curly braces.

  2. The top level JSON objects shoved into the text are always object literals, never arrays.

My goal is to split this into an array, with the literal text left alone and the JSON parsed out, like this:

[
    "This is a string ",
    {"with":"json","in":"it"},
    " followed by more text ",
    {"and":{"some":["more","json"]}},
    " and more text"
]

So far I've written a naive solution that simply counts curly braces to decide where the JSON starts and stops. But this wouldn't work if the JSON contains strings with curly braces in them {"like":"this one } right here"}. I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse's job. Is there a better way to solve this problem?

like image 909
R. Davidson Avatar asked Feb 09 '19 16:02

R. Davidson


People also ask

How can I convert JSON to string?

Use the JavaScript function JSON.stringify() to convert it into a string. const myJSON = JSON.stringify(obj); The result will be a string following the JSON notation.

How do you parse a string of JSON response?

Example - Parsing JSONUse the JavaScript function JSON.parse() to convert text into a JavaScript object: const obj = JSON.parse('{"name":"John", "age":30, "city":"New York"}'); Make sure the text is in JSON format, or else you will get a syntax error.

Can you parse a JSON object?

parse() JSON parsing is the process of converting a JSON object in text format to a Javascript object that can be used inside a program. In Javascript, the standard way to do this is by using the method JSON. parse() , as the Javascript standard specifies.

How do I access parsed JSON data?

To access the JSON object in JavaScript, parse it with JSON. parse() , and access it via “.” or “[]”.


2 Answers

You can check if JSON.parse throws an error to determine if the chunk is a valid JSON object or not. If it throws an error then the unquoted } are unbalanced:

const tests = [
  '{"just":"json }}{}{}{{[]}}}}","x":[1,2,3]}',
  'Just a string',
  'This string has a tricky case: {"like":"this one } right here"}',
  'This string {} has a tiny JSON object in it.',
  '.{}.',
  'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text',
];

tests.forEach( test => console.log( parse_json_interleaved_string( test ) ) );

function parse_json_interleaved_string ( str ) {
  const chunks = [ ];
  let last_json_end_index = -1;
  let json_index = str.indexOf( '{', last_json_end_index + 1 );
  for ( ; json_index !== -1; json_index = str.indexOf( '{', last_json_end_index + 1 ) ) {

    // Push the plain string before the JSON
    if ( json_index !== last_json_end_index + 1 )
        chunks.push( str.substring( last_json_end_index, json_index ) );

    let json_end_index = str.indexOf( '}', json_index + 1 );

    // Find the end of the JSON
    while ( true ) {
       try { 
         JSON.parse( str.substring( json_index, json_end_index + 1 ) );
         break;
       } catch ( e ) {
         json_end_index = str.indexOf( '}', json_end_index + 1 );
         if ( json_end_index === -1 )
           throw new Error( 'Unterminated JSON object in string' );
       }
    }

    // Push JSON
    chunks.push( str.substring( json_index, json_end_index + 1 ) );
    last_json_end_index = json_end_index + 1;
  }

  // Push final plain string if any
  if ( last_json_end_index === - 1 )
    chunks.push( str );
  else if ( str.length !== last_json_end_index )
    chunks.push( str.substr( last_json_end_index ) );

  return chunks;
}
like image 169
Paul Avatar answered Oct 11 '22 17:10

Paul


Here's a comparatively simple brute-force approach: split the whole input string on curly braces, then step through the array in order. Whenever you come across an open brace, find the longest chunk of the array from that starting point that successfully parses as JSON. Rinse and repeat.

This will not work if the input contains invalid JSON and/or unbalanced braces (see the last two test cases below.)

const tryJSON = input => {
  try {
    return JSON.parse(input);
  } catch (e) {
    return false;
  }
}

const parse = input => {
  let output = [];
  let chunks = input.split(/([{}])/);

  for (let i = 0; i < chunks.length; i++) {
    if (chunks[i] === '{') {
      // found some possible JSON; start at the last } and backtrack until it works.
      for (let j = chunks.lastIndexOf('}'); j > i; j--) {
        if (chunks[j] === '}') {
          // Does it blend?
          let parsed = tryJSON(chunks.slice(i, j + 1).join(""))
          if (parsed) {
            // it does! Grab the whole thing and skip ahead
            output.push(parsed);
            i = j;
          }
        }
      }
    } else if (chunks[i]) {
      // neither JSON nor empty
      output.push(chunks[i])
    }
  }

  console.log(output)
  return output
}

parse(`{"foo": "bar"}`)
parse(`test{"foo": "b}ar{{[[[{}}}}{}{}}"}`)
parse(`this {"is": "a st}ri{ng"} with {"json": ["in", "i{t"]}`)
parse(`{}`)
parse(`this {"i{s": invalid}`)
parse(`So is {this: "one"}`)
like image 40
Daniel Beck Avatar answered Oct 11 '22 16:10

Daniel Beck