Parse a substring as JSON using QJsonDocument

Question

I have a string which contains (not is) JSON-encoded data, like in this example:

foo([1, 2, 3], "some more stuff")
    |        |
  start     end   (of JSON-encoded data)

The complete language we use in our application nests JSON-encoded data, while the rest of the language is trivial (just recursive stuff). When parsing strings like this from left to right in a recursive parser, I know when I encounter a JSON-encoded value, like here the [1, 2, 3] starting at index 4. After parsing this substring, I need to know the end position to continue parsing the rest of the string.

I'd like to pass this substring to a well-tested JSON parser like QJsonDocument in Qt5. But from reading the documentation, there seems to be no way to parse only a prefix of a string as JSON, i.e. to have control return without reporting a parse error as soon as the JSON data ends (after consuming the ] here). I also need to know the end position to continue parsing my own stuff (here the remaining string is , "some more stuff")).

To do this, I used to use a custom JSON parser which takes the current position by reference and updates it after finishing parsing. But since it's a security-critical part of a business application, we don't want to stick to my self-crafted parser anymore. I mean there is QJsonDocument, so why not use it. (We already use Qt5.)

As a work-around, I'm thinking of this approach:

1. Let QJsonDocument parse the substring starting from the current position (which, as a whole, is not valid JSON).
2. The parse error reports an unexpected character at some position beyond the end of the JSON.
3. Let QJsonDocument parse again, but this time only the substring up to that end position.
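The work-around only needs two things from a whole-document parser: whether it succeeded, and the offset at which it gave up. Below is a minimal, Qt-independent sketch of that two-pass logic; `WholeParseResult`, `jsonPrefixLength` and the `parseWhole` callable are hypothetical names. With Qt5, `parseWhole` would be a small adapter around `QJsonDocument::fromJson(const QByteArray &, QJsonParseError *)` reporting `error == QJsonParseError::NoError` and `offset`. Whether Qt's offset lands exactly at the first character after the JSON is an assumption to verify, but the sketch fails safe: if the offset is anywhere else, the second pass rejects the prefix.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Outcome of validating a *complete* document. Hypothetical type, modelled
// loosely on QJsonParseError (success flag + offset of the error).
struct WholeParseResult {
    bool ok;                  // true if the entire input was valid JSON
    std::size_t errorOffset;  // where the parser gave up (meaningful if !ok)
};

// Two-pass prefix parse. `parseWhole` must reject trailing garbage and report
// the offset of the first character it could not accept.
// Returns the length of the JSON prefix of `text`, or std::nullopt.
template <typename WholeParser>
std::optional<std::size_t> jsonPrefixLength(std::string_view text,
                                            WholeParser parseWhole)
{
    // Pass 1: parse everything from the current position.
    const WholeParseResult first = parseWhole(text);
    if (first.ok)
        return text.size();                // JSON runs to the end of input
    if (first.errorOffset == 0 || first.errorOffset > text.size())
        return std::nullopt;               // nothing usable before the error

    // Pass 2: parse only the part before the reported error.
    const std::string_view prefix = text.substr(0, first.errorOffset);
    if (parseWhole(prefix).ok)
        return prefix.size();              // the error was just trailing input
    return std::nullopt;                   // a genuine syntax error inside the JSON
}
```

Note that Qt5's QJsonDocument expects an object or array at the top level, so bare scalar values would need separate handling with this approach.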

A second idea is to write a "JSON end scanner" which takes the whole string and a start position, and returns the end position of the JSON-encoded data. This also requires some parsing, as unmatched brackets / parentheses can appear in string values, but it should be much easier (and safer) to write (and use) such a class than a fully hand-crafted JSON parser.
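A minimal sketch of such an end scanner, assuming the JSON value is an object or array (bare scalars would need extra handling). It only tracks bracket nesting and skips over string literals, honouring backslash escapes; it does not validate anything else, so the range it returns would still be handed to QJsonDocument for the actual parsing. `findJsonEnd` is a hypothetical name. Since it only looks at ASCII delimiters, it also works on UTF-8 input.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Return the index one past the closing bracket of the JSON object/array that
// starts at `start` (after optional whitespace), or std::nullopt if the
// brackets are mismatched or the input ends inside the value.
std::optional<std::size_t> findJsonEnd(std::string_view text, std::size_t start)
{
    // Skip leading JSON whitespace; the value must begin with '[' or '{'.
    std::size_t i = start;
    while (i < text.size() && (text[i] == ' ' || text[i] == '\t' ||
                               text[i] == '\n' || text[i] == '\r'))
        ++i;
    if (i >= text.size() || (text[i] != '[' && text[i] != '{'))
        return std::nullopt;

    std::vector<char> expected;  // closing brackets still owed, innermost last
    bool inString = false;
    bool escaped = false;

    for (; i < text.size(); ++i) {
        const char c = text[i];
        if (inString) {
            if (escaped)        escaped = false;  // this char was escaped
            else if (c == '\\') escaped = true;
            else if (c == '"')  inString = false;
            continue;
        }
        switch (c) {
        case '"': inString = true; break;
        case '[': expected.push_back(']'); break;
        case '{': expected.push_back('}'); break;
        case ']':
        case '}':
            if (expected.empty() || expected.back() != c)
                return std::nullopt;              // mismatched bracket
            expected.pop_back();
            if (expected.empty())
                return i + 1;                     // outermost value closed
            break;
        default:
            break;                                // other chars don't affect nesting
        }
    }
    return std::nullopt;                          // input ended inside the JSON
}
```

In the recursive parser, the returned index serves both as the end of the substring to pass to QJsonDocument and as the position to resume at: for the example above, `findJsonEnd(input, 4)` yields 13, leaving `, "some more stuff")`.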

Does anybody have a better idea?

sehe · Accepted Answer

I rolled a quick parser[*] based on http://www.ietf.org/rfc/rfc4627.txt using Spirit Qi.

It doesn't actually parse into an AST, but it parses all of the JSON payload, which is actually a bit more than required here.

The sample here (http://liveworkspace.org/code/3k4Yor$2) outputs:

Non-JSON part of input starts after valid JSON: ', "some more stuff")'

Based on the test given by the OP:

const std::string input("foo([1, 2, 3], \"some more stuff\")");

// set to start of JSON
auto f(begin(input)), l(end(input));
std::advance(f, 4);

bool ok = tryParseAsJson(f, l); // updates f to point after the end of valid JSON

if (ok)
    std::cout << "Non-JSON part of input starts after valid JSON: '" << std::string(f, l) << "'\n";

I have tested it with several other, more involved JSON documents (including multi-line ones).

A few remarks:

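- I made the parser iterator-based, so it will likely work easily with Qt strings(?)
- If you want to disallow multi-line fragments, change the skipper from qi::space to qi::blank (both the qi::space_type template argument and the qi::space passed to phrase_parse)
- There is a conformance shortcut regarding number parsing (see the TODO in the code) that doesn't affect where the JSON ends, which is all that matters for this answer (see the comment there).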

[*] Technically, this is more of a parser stub, since it doesn't translate the input into anything else. It is basically a lexer taking on too much work :)

Full code of the sample:

// #define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <string>

namespace qi = boost::spirit::qi;

template <typename It, typename Skipper = qi::space_type>
struct parser : qi::grammar<It, Skipper>
{
    parser() : parser::base_type(json)
    {
        // 2.1 values
        value = qi::lit("false") | "null" | "true" | object | array | number | string;

        // 2.2 objects
        object = '{' >> -(member % ',') >> '}';
        member = string >> ':' >> value;

        // 2.3 Arrays
        array = '[' >> -(value % ',') >> ']';

        // 2.4.  Numbers
        // Note: our Spirit grammar takes a shortcut, as the RFC specification is more restrictive:
        //
        //    Numeric values that cannot be represented as sequences of digits
        //    (such as Infinity and NaN) are not permitted.
        //     number = [ minus ] int [ frac ] [ exp ]
        //     decimal-point = %x2E       ; .
        //     digit1-9 = %x31-39         ; 1-9
        //     e = %x65 / %x45            ; e E
        //     exp = e [ minus / plus ] 1*DIGIT
        //     frac = decimal-point 1*DIGIT
        //     int = zero / ( digit1-9 *DIGIT )
        //     minus = %x2D               ; -
        //     plus = %x2B                ; +
        //     zero = %x30                ; 0
        //
        // However, none of the above affects any structural characters (:,{}[] and double quotes), so it
        // doesn't matter for the current purpose. For full compliance, this remains TODO.
        number = qi::double_; // shortcut :)

        // 2.5 Strings
        string = qi::lexeme [ '"' >> *char_ >> '"' ];

        static const qi::uint_parser<uint32_t, 16, 4, 4> _4HEXDIG;

        char_ = ~qi::char_("\"\\") |        // unescaped: anything but the quote or backslash itself
               qi::char_("\x5C") >> (       // \ (reverse solidus)
                   qi::char_("\x22") |      // "    quotation mark  U+0022
                   qi::char_("\x5C") |      // \    reverse solidus U+005C
                   qi::char_("\x2F") |      // /    solidus         U+002F
                   qi::char_("\x62") |      // b    backspace       U+0008
                   qi::char_("\x66") |      // f    form feed       U+000C
                   qi::char_("\x6E") |      // n    line feed       U+000A
                   qi::char_("\x72") |      // r    carriage return U+000D
                   qi::char_("\x74") |      // t    tab             U+0009
                   qi::char_("\x75") >> _4HEXDIG )  // uXXXX                U+XXXX
               ;

        // entry point
        json = value;

        BOOST_SPIRIT_DEBUG_NODES(
                (json)(value)(object)(member)(array)(number)(string)(char_));
    }

  private:
    qi::rule<It, Skipper> json, value, object, member, array, number, string;
    qi::rule<It> char_;
};

template <typename It>
bool tryParseAsJson(It& f, It l) // note: first iterator gets updated
{
    static const parser<It, qi::space_type> p;

    try
    {
        return qi::phrase_parse(f,l,p,qi::space);
    } catch(const qi::expectation_failure<It>& e)
    {
        // expectation points not currently used, but we could tidy up the grammar to bail on unexpected tokens
        std::string frag(e.first, e.last);
        std::cerr << e.what() << "'" << frag << "'\n";
        return false;
    }
}

int main()
{
#if 0
    // read full stdin
    std::cin.unsetf(std::ios::skipws);
    std::istream_iterator<char> it(std::cin), pte;
    const std::string input(it, pte);

    // set up parse iterators
    auto f(begin(input)), l(end(input));
#else
    const std::string input("foo([1, 2, 3], \"some more stuff\")");

    // set to start of JSON
    auto f(begin(input)), l(end(input));
    std::advance(f, 4);
#endif

    bool ok = tryParseAsJson(f, l); // updates f to point after the end of valid JSON

    if (ok) 
        std::cout << "Non-JSON part of input starts after valid JSON: '" << std::string(f, l) << "'\n";
    return ok? 0 : 255;
}
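To illustrate what closing the number TODO in the grammar above would mean, here is a small standalone matcher for the stricter RFC rule `number = [ minus ] int [ frac ] [ exp ]`, written in plain C++ rather than Spirit. `matchJsonNumber` is a hypothetical helper, not part of the answer's code. Unlike `qi::double_`, which with its default policies also accepts forms such as `.5`, `1.` or `+1`, it only accepts RFC-conforming numbers.

```cpp
#include <cstddef>
#include <string_view>

namespace {
bool isDigit(char c) { return c >= '0' && c <= '9'; }
}

// Length of the longest RFC 4627 number at the start of s, or 0 if none:
//   number = [ minus ] int [ frac ] [ exp ]
//   int    = zero / ( digit1-9 *DIGIT )
//   frac   = decimal-point 1*DIGIT
//   exp    = e [ minus / plus ] 1*DIGIT
std::size_t matchJsonNumber(std::string_view s)
{
    std::size_t i = 0;
    if (i < s.size() && s[i] == '-') ++i;              // [ minus ]

    // int: a single '0', or a non-zero digit followed by any digits
    if (i < s.size() && s[i] == '0') {
        ++i;
    } else if (i < s.size() && s[i] >= '1' && s[i] <= '9') {
        while (i < s.size() && isDigit(s[i])) ++i;
    } else {
        return 0;                                      // int is mandatory
    }

    // [ frac ]: only consumed if at least one digit follows the '.'
    if (i + 1 < s.size() && s[i] == '.' && isDigit(s[i + 1])) {
        i += 2;
        while (i < s.size() && isDigit(s[i])) ++i;
    }

    // [ exp ]: 'e' or 'E', optional sign, then at least one digit
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
        std::size_t j = i + 1;
        if (j < s.size() && (s[j] == '+' || s[j] == '-')) ++j;
        if (j < s.size() && isDigit(s[j])) {
            while (j < s.size() && isDigit(s[j])) ++j;
            i = j;
        }
    }
    return i;
}
```

It matches the longest valid prefix, so for `01` it returns 1 and leaves the stray `1` for the surrounding grammar to reject, just as a PEG-style Spirit rule would.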

Tags: c++, json, parsing, qt, qt5

leemes
