
How to ignore wrong fields when parsing a text-format protobuf message

I am simulating, in C++, the parsing of a text-format file that contains a wrong field.

My simple test .proto file:

$ cat settings.proto
package settings;
message Settings {
   optional int32  param1 = 1;
   optional string param2 = 2;
   optional bytes  param3 = 3;
}

My text-format file:

$ cat settings.txt
param1: 123
param: "some string"
param3: "another string"

I am parsing the file with google::protobuf::TextFormat::Parser:

#include <iostream>
#include <fcntl.h>
#include <unistd.h>
#include <fstream>
#include <google/protobuf/text_format.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

#include <settings.pb.h>

using namespace std;

int main( int argc, char* argv[] )
{
    GOOGLE_PROTOBUF_VERIFY_VERSION;

    settings::Settings settings;

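    if( argc < 2 )
    {
        cerr << "Usage: " << argv[0] << " <settings.txt>" << endl;
        return 1;
    }

    int fd = open( argv[1], O_RDONLY );
    if( fd < 0 )
    {
        cerr << "Error opening the file" << endl;
        return 1;  // non-zero exit status signals failure
    }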

    google::protobuf::io::FileInputStream finput( fd );
    finput.SetCloseOnDelete( true );

    google::protobuf::TextFormat::Parser parser;
    parser.AllowPartialMessage( true );

    if ( !parser.Parse( &finput, &settings ) )
    {
        cerr << "Failed to parse file!" << endl;
    }

    cout << settings.DebugString() << endl;

    google::protobuf::ShutdownProtobufLibrary();

    std::cout << "Exit" << std::endl;
    return 0;
}

I set AllowPartialMessage to true on the parser, and all fields are optional. But Parse currently stops at the first wrong field, and after parsing, "settings" contains only the first field.

Is there a way to report the failure and continue parsing the remaining correct fields?

dmiry asked Oct 19 '22 17:10


1 Answer

The text-format parser does not permit unknown fields. Text-format is intended for communications with humans, and humans make typos. It's important to detect these typos rather than silently ignore them.

Usually, the reason to ignore unknown fields is for forwards-compatibility: then your program can (partially) understand messages written against future versions of the protocol with new fields. There are two particular use cases of this that I see a lot:

  • Systems that do machine-to-machine communication in text format. I recommend against this. Instead, use binary format, or if you really want your machine-to-machine communication to be textual, use JSON.

  • Systems where a human writes a text-format config file then distributes it to possibly-old servers in production. In this case, I recommend "pre-compiling" the text-format protobuf to binary using a tool run on the human's desktop, and then only ship the binary message to production servers. The local tool can easily be kept up-to-date and will be able to tell the human user if they misspelled a field name.

Kenton Varda answered Nov 15 '22 11:11